For the few who may not have already seen it, Dr. Michael Jordan was interviewed by IEEE Spectrum recently. He offers commentary on a number of topics including computer vision, deep learning, and big data.
Overall I found the article to be an interesting read, though it seems to offer little beyond what he said in his AMA on Reddit.
Ultimately I find myself agreeing with his position on computer vision. Even given the major strides we have made of late with convnets and the like, we are still far from having a system as capable as we are at vision tasks. After all, the state-of-the-art challenge is the classification of just 1,000 classes of objects in high resolution images. This is a hard problem, but it is something that we humans, and many other animals, do trivially.
I am a bit torn about his perspective on deep learning, notably because of the statement "it’s largely a rebranding of neural networks." I have encountered this idea a couple of times now, but I would argue that it is not accurate. It is true that neural networks are a favored tool among those in the deep learning community and that the strides made in the DL community have come while using NNs. But as Bengio et al. note in their forthcoming text, Deep Learning, it "involves learning multiple levels of representation, corresponding to different levels of abstraction." Neural networks have been shown to do this, but it has not been shown that they are required to perform such a task. On the flip side, they are outperforming the other methods that could be used.
Another thing that stood out to me was his comments on the singularity. I find myself waffling on this topic and his comments help highlight the reason. Specifically, he points out that discussions of the singularity are more philosophical in nature. I rather enjoy philosophy. I often say that if I had another life I would be a mathematician, but if I had another one beyond that I would be a philosopher. More so than I am now anyway. I meet so many AI/ML people who think the singularity folks are just crackpots. And if we are being honest, there does seem to be more than a reasonable proportion of crackpots in the community. However, that does not prevent us from approaching the topic with sound and valid argumentation. We just have to be prepared to encounter those who cannot or choose not to.
Edit 2014-10-23: It appears Dr. Jordan was a bit displeased with the IEEE Spectrum interview, as he explains in Big Data, Hype, the Media and Other Provocative Words to Put in a Title. The long and short of it appears to be that he believes his perspective was intentionally distorted, for the reason that many of my colleagues have been discussing: the title, and arguably the intro, imply much stronger claims than his subsequent comments in the article seem to support. As such he felt the need to clarify his perspectives.
On the one hand, I thought that a careful critical read of the interview allowed one to pick out his perspective fairly well. But in reading his response, there appear to be some things that came across just plain wrong. Take, for instance, his opinion about whether we should be collecting and exploring these large data sets. In the interview he makes the great point that we must be cognizant of bad correlations that can and likely will arise. But in that context I did get the impression that he was arguing against doing it at all, i.e. collecting and analyzing such data sets, whereas in his response he argues that doing it can be a good thing because it can contribute to the development of principles that are currently missing.
As a side note, I find it interesting that he did not link to the interview but instead gave a link to it. As if to say, let's not lend any more credibility to this article than is absolutely necessary.
Musings on artificial intelligence, machine learning, robotics, research, and just about anything else that comes to mind.
20141021
20141018
A First Experiment with Pylearn2
Vincent Dumoulin recently wrote a great blog post titled Your models in Pylearn2 that shows how to quickly implement a new model idea in Pylearn2. However, Pylearn2 has a fair number of models already implemented. This post is meant to complement his post by explaining how to set up and run a basic experiment using existing components in Pylearn2.
In this tutorial we will train a very simple single layer softmax regression model on MNIST,
a database of handwritten digits. Softmax is a generalization of a
binary predictor called logistic regression to the prediction of one of
many classes. The task will be to identify which digit was written, i.e.
classify the image into the classes 0-9.
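To make the link between softmax and logistic regression concrete, here is a minimal sketch in plain Python. It is not Pylearn2 code; the function name and the example scores are made up for illustration. It turns one score per class into a probability per class, and checks that with only two classes (one score fixed at zero) the result matches the logistic sigmoid.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Ten hypothetical scores, one per digit class 0-9.
scores = [0.1, 2.0, -1.0, 0.3, 0.0, 0.5, -0.2, 1.0, 0.7, -0.5]
probs = softmax(scores)
predicted_digit = max(range(len(probs)), key=probs.__getitem__)

# Two classes with one score fixed at 0: softmax reduces to the sigmoid.
z = 1.5
two_class = softmax([z, 0.0])
sigmoid = 1.0 / (1.0 + math.exp(-z))
```

The predicted class is simply the index with the largest probability, which is how a trained model would pick a digit.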
YAML Syntax
A main goal of Pylearn2 is to make managing experiments quick and easy. To that end a basic experiment can be executed by writing a description of the experiment in YAML (originally Yet Another Markup Language, now officially YAML Ain't Markup Language) and running the train script (pylearn2/scripts/train.py) on it.
YAML is a markup language intended to be very sparse compared to other markup languages such as XML. A rundown of useful features for use with Pylearn2 can be found in the document YAML for Pylearn2, and the full specification can be found on yaml.org in case you need to do something particularly out of the ordinary like defining a tuple.
A Pylearn2 YAML configuration file identifies the object that will actually perform the training and the parameters it takes. I believe there is only one type of training object at the moment so it's kind of redundant but it allows for easy incorporation of special training procedures. The existing training object takes a specification of the model to be trained, the dataset on which the model should be trained, and the object representing the algorithm that will actually perform the training.
Basic YAML syntax is extremely straightforward and the only special syntax that is really needed for the simplest of experiments is the !obj: tag. This is a Pylearn2 custom tag that instructs Pylearn2 to instantiate a Python object as specified immediately following the tag. For example the statement:
!obj:pylearn2.datasets.mnist.MNIST { which_set: 'train' }
results in the instantiation of the MNIST dataset class, found in the file mnist.py among the various Pylearn2 datasets in pylearn2.datasets. The dictionary following the class name supplies the value 'train' for a parameter called which_set that identifies the portion (e.g. training, validation, or test) of the dataset that should be loaded.
Note that the quotes around the value 'train' are required as they indicate that the value is a string, which is the required data type for the which_set parameter.
It's important to note that any parameters required for the instantiation of a class must be provided in the associated dictionary. Check the Pylearn2 documentation for the class you need to instantiate to understand the available parameters and specifically which are required for the task you are attempting to perform.
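As a rough mental model of what the !obj: tag does, the following sketch imports a class from a dotted path and calls it with a dictionary of keyword arguments. This is an illustration only, not Pylearn2's actual implementation; the helper name instantiate is hypothetical, and standard library classes stand in for Pylearn2 classes so the example runs without Pylearn2 installed.

```python
import importlib

def instantiate(dotted_path, params):
    """Import `package.module.ClassName` and call it with keyword arguments."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**params)

# Stand-in for a tag such as: !obj:fractions.Fraction { numerator: 3, denominator: 4 }
value = instantiate("fractions.Fraction", {"numerator": 3, "denominator": 4})

# Leaving out a required parameter fails, just as a direct call would.
try:
    instantiate("datetime.date", {"year": 2014, "month": 10})  # 'day' is missing
    missing_param_error = None
except TypeError as err:
    missing_param_error = err
```

Seen this way, the dictionary after the class name is just the set of constructor arguments, which is why every required parameter has to appear in it.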
Defining an Experiment
To define an experiment we need to define a train object and provide it a dataset object, a model object, and an algorithm object via its parameters dictionary.
We have already seen how to instantiate the MNIST dataset class so let's look next at the algorithm. The Pylearn2 algorithm classes are found in the training_algorithms sub-directory. In this example we are going to use stochastic gradient descent (SGD) because it is arguably the most commonly used algorithm for training neural networks. It requires only one parameter, namely learning_rate, and is instantiated as follows:
!obj:pylearn2.training_algorithms.sgd.SGD { learning_rate: 0.05 }
The final thing we need to do before we can put it all together is to define a model. The Pylearn2 model classes are located in the models sub-directory. The class we want is called SoftmaxRegression and is found in softmax_regression. In its most basic form we only need to supply four parameters:
- nvis: the number of visible units in the network, i.e. the dimensionality of the input.
- n_classes: the number of output units in the network, i.e. the number of classes to be learned.
- irange: the range from which the initial weights should be randomly selected. This is a symmetric range about zero and as such it is only necessary to supply the upper bound.
- batch_size: the number of samples to be used simultaneously during training. Setting this to 1 results in pure stochastic gradient descent whereas setting it to the size of the training set effectively results in batch gradient descent. Any value in between yields stochastic gradient descent with mini-batches of the size specified.
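The following plain Python sketch illustrates what irange and batch_size control. It is not how Pylearn2 implements them; the helper names init_weights and minibatches and the 100-sample toy dataset are assumptions made for illustration. The value 784 corresponds to the 28 x 28 pixels of an MNIST image.

```python
import random

def init_weights(nvis, n_classes, irange, seed=0):
    """Draw an nvis x n_classes weight matrix uniformly from [-irange, irange]."""
    rng = random.Random(seed)
    return [[rng.uniform(-irange, irange) for _ in range(n_classes)]
            for _ in range(nvis)]

def minibatches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

# One weight per (input pixel, output class) pair.
weights = init_weights(nvis=784, n_classes=10, irange=0.01)

toy_examples = list(range(100))  # stand-in for 100 training samples
pure_sgd = list(minibatches(toy_examples, 1))      # one sample per update
mini_batch = list(minibatches(toy_examples, 20))   # 20 samples per update
full_batch = list(minibatches(toy_examples, 100))  # whole set per update
```

Each slice corresponds to one parameter update, so a smaller batch_size means more (and noisier) updates per pass over the data.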
Using what we know, we can now construct the train object and in effect the full YAML file as follows:
!obj:pylearn2.train.Train {
    dataset: !obj:pylearn2.datasets.mnist.MNIST { which_set: 'train' },
    model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
        batch_size: 20,
        n_classes: 10,
        nvis: 784,
        irange: 0.01
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD { learning_rate: 0.05 }
}
Note that a Pylearn2 YAML file can contain definitions for multiple experiments simultaneously. Simply stack them one after the other and they will be executed in order from top to bottom in the file.
Executing an Experiment
The final step is to run the experiment. Assuming the scripts sub-directory is in your path, we simply call train.py and supply the YAML file created above. Assuming that file is called basic_example.yaml and your current working directory contains it, the command would be:
train.py basic_example.yaml
Pylearn2 will load the YAML, instantiate the specified objects, and run the training algorithm on the model using the specified dataset. An example of the output from this YAML looks like:
dustin@Cortex ~/pylearn2_tutorials $ train.py basic_example.yaml
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.013530 seconds
Monitored channels:
Compiling accum...
Compiling accum done. Time elapsed: 0.000070 seconds
Monitoring step:
    Epochs seen: 0
    Batches seen: 0
    Examples seen: 0
Time this epoch: 0:02:18.271934
Monitoring step:
    Epochs seen: 1
    Batches seen: 1875
    Examples seen: 60000
Time this epoch: 0:02:18.341147
...
Note we have not told the training algorithm under what criteria it should stop so it will run forever!
Under the hood Pylearn2 uses Theano to construct and train many of the models it supports. The four lines of output related to begin_record_entry and accum are a consequence of this and can be disregarded for our purposes.
The rest of the output is related to Pylearn2's monitoring functionality. Since no channels (particular metrics or statistics about the training) have been specified, the rest of the output is rather sparse. There are no channels listed under the Monitored channels heading and the only things listed under the Monitoring step headings are those common to all experiments (e.g. epochs seen, batches seen, and examples seen). The only other output is a summary of the time it took to train each epoch.
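To see why a run with no stopping criterion never ends on its own, consider this toy training loop. It is a generic sketch, not Pylearn2's main loop; the class EpochLimit, its continue_learning method, and the main_loop function are hypothetical names chosen for illustration.

```python
class EpochLimit:
    """Hypothetical stopping rule: keep training until max_epochs is reached."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def continue_learning(self, epochs_seen):
        return epochs_seen < self.max_epochs


def main_loop(train_one_epoch, criterion=None):
    """Run epochs until the criterion says stop; with no criterion, never stop."""
    epochs_seen = 0
    while criterion is None or criterion.continue_learning(epochs_seen):
        train_one_epoch()
        epochs_seen += 1
    return epochs_seen


# A do-nothing "epoch" that only records that it was called.
calls = []
epochs_run = main_loop(lambda: calls.append("epoch"), EpochLimit(max_epochs=3))
```

Calling main_loop without a criterion would loop indefinitely, which mirrors the behavior of the experiment above until it is interrupted by hand.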
Conclusion
Pylearn2 makes specifying and training models easy and fast. This tutorial looked at the most basic of models. However, it does not discuss the myriad training and monitoring options provided by Pylearn2. Nor does it show how to build more complicated models, such as those with multiple layers as in multilayer perceptrons or those with special connectivity patterns as in convolutional neural networks. My inclination is to continue in the next post by discussing the types of stopping criteria and how to use them. From there I would proceed to discussing the various training options and work my way towards more complicated models. However, I'm amenable to changing this order if there is something of particular interest, so let me know what you would like to see next.
20141013
Harvard Librarians Advise Open Access Publishing
Excellent. The Harvard University librarians have written a letter to the Harvard faculty and staff encouraging them to start publishing in journals that make content free to the public, known as open access journals, as opposed to journals that hide it behind a pay wall. I have been watching this debate for some time as a number of the UofU CS professors have been arguing for exactly this change.
I quite like the policy at the Machine Learning Lab here in Montreal which requires us to publish our articles on Arxiv.org, a database for freely publishing and accessing scholarly works. It’s not without its challenges. For instance, you never know the quality of a given paper that you find on Arxiv until you have invested time in reading it. Many of those arguing for the open access model have been actively trying to devise strategies for such problems. Regardless, I believe it’s preferable to not having access to a paper that should probably be cited.
From a grad student's perspective it is nice because I don’t have to spend time submitting special requests for access to articles and then waiting to receive them. It could end up meaning that I have to pay to have my articles published but I personally prefer this because I want my work available to others to hopefully build upon.