2014-10-18

A First Experiment with Pylearn2

Vincent Dumoulin recently wrote a great blog post titled Your models in Pylearn2 that shows how to quickly implement a new model idea in Pylearn2. However, Pylearn2 already has a fair number of models implemented. This post is meant to complement his by explaining how to set up and run a basic experiment using existing components in Pylearn2.

In this tutorial we will train a very simple single-layer softmax regression model on MNIST, a database of handwritten digits. Softmax regression is a generalization of logistic regression, a binary predictor, to the prediction of one of many classes. The task will be to identify which digit was written, i.e. to classify each image into one of the classes 0-9.
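
Concretely, for an input image x (flattened to a vector of 784 pixel values) the model computes one score per class and turns the scores into probabilities:

p(y = k | x) = exp(w_k · x + b_k) / sum_j exp(w_j · x + b_j)

The predicted digit is simply the class with the highest probability, and training adjusts the weights w and biases b so that the correct class is assigned as much probability as possible.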

This same task is addressed in the Softmax regression Pylearn2 tutorial, and this post borrows from it. However, Pylearn2 is feature rich, allowing one to control everything from which model to train and which dataset to train it on, to fine-grained control over the training procedure, to monitoring and saving statistics about an experiment. For the sake of simplicity and understanding we will not be using most of those features, and as such this tutorial will be simpler.

YAML Syntax

A main goal of Pylearn2 is to make managing experiments quick and easy. To that end, a basic experiment can be executed by writing a description of the experiment in YAML (YAML Ain't Markup Language) and running the train script (pylearn2/scripts/train.py) on it.

YAML is a data serialization language intended to be very sparse compared to markup languages such as XML. A rundown of the features most useful with Pylearn2 can be found in the document YAML for Pylearn2, and the full specification can be found on yaml.org in case you need to do something particularly out of the ordinary like defining a tuple.

A Pylearn2 YAML configuration file identifies the object that will actually perform the training and the parameters it takes. I believe there is only one type of training object at the moment, so this step is somewhat redundant, but it allows for easy incorporation of special training procedures. The existing training object takes a specification of the model to be trained, the dataset on which the model should be trained, and the object representing the algorithm that will actually perform the training.

Basic YAML syntax is extremely straightforward, and the only special syntax really needed for the simplest of experiments is the !obj: tag. This is a Pylearn2 custom tag that instructs Pylearn2 to instantiate the Python object specified immediately following the tag. For example, the statement:
!obj:pylearn2.datasets.mnist.MNIST { which_set: 'train' }
results in the instantiation of the MNIST dataset class, found among the various Pylearn2 datasets in pylearn2.datasets in the file mnist.py. The Python dictionary following the class name supplies the value 'train' for a parameter called which_set, which identifies the portion of the dataset (e.g. training or test) that should be loaded.

Note that the quotes around the value 'train' are required, as they indicate that the value is a string, which is the required data type for the which_set parameter.

It's important to note that any parameters required for the instantiation of a class must be provided in the associated dictionary. Check the Pylearn2 documentation for the class you need to instantiate to understand the available parameters and, specifically, which are required for the task you are attempting to perform.

Defining an Experiment

To define an experiment we need to create a Train object and provide it with a dataset object, a model object, and an algorithm object via its parameter dictionary.

We have already seen how to instantiate the MNIST dataset class, so let's look next at the algorithm. The Pylearn2 algorithm classes are found in the training_algorithms sub-directory. In this example we are going to use stochastic gradient descent (SGD) because it is arguably the most commonly used algorithm for training neural networks. It requires only one parameter, namely learning_rate, and is instantiated as follows:
!obj:pylearn2.training_algorithms.sgd.SGD { learning_rate: 0.05 }
The final thing we need to do before we can put it all together is to define a model. The Pylearn2 model classes are located in the models sub-directory. The class we want is called SoftmaxRegression and is found in softmax_regression.py. In its most basic form we only need to supply four parameters:
  • nvis: the number of visible units in the network, i.e. the dimensionality of the input.
  • n_classes: the number of output units in the network, i.e. the number of classes to be learned.
  • irange: the range from which the initial weights should be randomly selected. This is a symmetric range about zero and as such it is only necessary to supply the upper bound.
  • batch_size: the number of samples to be used simultaneously during training. Setting this to 1 results in pure stochastic gradient descent whereas setting it to the size of the training set effectively results in batch gradient descent. Any value in between yields stochastic gradient descent with mini-batches of the size specified.

Using what we know, we can now construct the train object and in effect the full YAML file as follows:
!obj:pylearn2.train.Train {
    dataset: !obj:pylearn2.datasets.mnist.MNIST { which_set: 'train' },
    model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
        batch_size: 20,
        n_classes: 10,
        nvis: 784,
        irange: 0.01
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD { learning_rate: 0.05 }
}
Note that a Pylearn2 YAML file can contain definitions for multiple experiments simultaneously. Simply stack them one after the other and they will be executed in order from top to bottom in the file.

Executing an Experiment

The final step is to run the experiment. Assuming the Pylearn2 scripts sub-directory is in your path, we simply call train.py and supply the YAML file created above. Assuming that file is called basic_example.yaml and is in your current working directory, the command would be:
train.py basic_example.yaml
Pylearn2 will load the YAML, instantiate the specified objects and run the training algorithm on the model using the specified dataset. An example of the output from this YAML looks like:
dustin@Cortex ~/pylearn2_tutorials $ train.py basic_example.yaml
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.013530 seconds
Monitored channels: 
Compiling accum...
Compiling accum done. Time elapsed: 0.000070 seconds
Monitoring step:
 Epochs seen: 0
 Batches seen: 0
 Examples seen: 0
Time this epoch: 0:02:18.271934
Monitoring step:
 Epochs seen: 1
 Batches seen: 1875
 Examples seen: 60000
Time this epoch: 0:02:18.341147
... 
Note that we have not told the training algorithm when it should stop, so it will run forever!
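
If you do want training to stop on its own, I believe the SGD class accepts a termination criterion. As a minimal sketch (based on my reading of the pylearn2.termination_criteria module, so double check the class and parameter names against the documentation), the algorithm block could be changed to something like:

algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
    learning_rate: 0.05,
    # assumed usage: stop after 10 passes over the training set
    termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter { max_epochs: 10 }
}

Stopping criteria deserve their own discussion, which I plan to get to in a later post.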

Under the hood, Pylearn2 uses Theano to construct and train many of the models it supports. The first four lines of output, i.e. those related to begin_record_entry and accum, are related to this fact and can be disregarded for our purposes.

The rest of the output is related to Pylearn2's monitoring functionality. Since no channels, i.e. particular metrics or statistics about the training, have been specified, the output is rather sparse. There are no channels listed under the Monitored channels heading, and the only things listed under the Monitoring step headings are those common to all experiments (e.g. epochs seen, batches seen, and examples seen). The only other output is a summary of the time it took to train each epoch.
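
If you want the monitor to report more than this bookkeeping, you can hand the algorithm a dataset to monitor. The following is only a sketch of how I understand SGD's monitoring_dataset parameter to work (a dictionary mapping a name to a dataset), so verify it against the SGD documentation:

algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
    learning_rate: 0.05,
    # assumed usage: report the model's channels on the MNIST test split after each epoch
    monitoring_dataset: {
        'test': !obj:pylearn2.datasets.mnist.MNIST { which_set: 'test' }
    }
}

With a monitoring dataset attached, the model's channels, such as its objective and, for classifiers, a misclassification rate, should appear under the Monitoring step headings, each prefixed with the name given in the dictionary ('test' here).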

Conclusion

Pylearn2 makes specifying and training models easy and fast. This tutorial looked at the most basic of models. It did not discuss the myriad training and monitoring options provided by Pylearn2, nor did it show how to build more complicated models, such as those with multiple layers, as in multilayer perceptrons, or those with special connectivity patterns, as in convolutional neural networks. My inclination is to continue in the next post by discussing the types of stopping criteria and how to use them. From there I would proceed to the various training options and work my way towards more complicated models. However, I'm amenable to changing this order if there is something of particular interest, so let me know what you would like to see next.

2 comments:

  1. I thought of a question. It seems like the most basic ingredients are the _dataset_, the _algorithm_, and the _model_, is that correct? How does the algorithm know what to optimize within the model?

    Also, is it possible to stack models, for example put a convolutional layer and a perceptron layer in the same model?

  2. Yes, you're right. The most basic ingredients are the dataset, algorithm, and model. I sort of say as much in the third paragraph of the YAML syntax section, but I realize now it's not stated outright, so I will update.

    Regarding how the algorithm knows what parameters to optimize, the short answer is that the model exposes a list of the appropriate parameters via a member variable called _params. The inner workings of this part are really interesting but require going into details about how Pylearn2 uses Theano, at least for the most part, so I will avoid that for now.

    As for whether models can be stacked, of course! Well, sort of. :) I don't yet know how to explicitly send the output from one model to the input of another; for instance, I don't know how to send the output of an MLP to an autoencoder. However, in pylearn2.models.mlp there is a class called MLP that takes a list of layers and a specification of how to wire them together. That same module contains a number of different layer types, allowing for the construction of fairly complicated models. I still need to learn a bit about the inner workings with respect to convolutions, but I believe these are actually handled via something called spaces, which are descriptions of how the data should be transformed from layer to layer.
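
    To make that concrete, my current (untested) understanding is that a two-layer model would be described with a model block roughly like the one below, where the layer classes and their parameters are taken from my reading of pylearn2.models.mlp and should be checked against the documentation:

    model: !obj:pylearn2.models.mlp.MLP {
        nvis: 784,
        layers: [
            # assumed: a hidden layer of 500 sigmoid units
            !obj:pylearn2.models.mlp.Sigmoid {
                layer_name: 'h0',
                dim: 500,
                irange: 0.05
            },
            # assumed: a softmax output layer over the 10 digit classes
            !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 10,
                irange: 0.05
            }
        ]
    }

    The MLP class then takes care of wiring the output of each layer to the input of the next.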
