In this tutorial we will look at two forms of monitoring: the basic form, which is always performed, and a new approach for real-time remote monitoring.
Basic Monitoring
We will build upon the bare-bones example from the previous tutorial, which means we will be using the MNIST dataset. Most datasets have two or three parts. At a minimum they have a part for training and a part for testing. If a dataset has a third part, it is used for validation, that is, for measuring the performance of our learner without unduly biasing the learner towards the test data.
Pylearn2 performs monitoring at the end of each epoch, and it can monitor any combination of the parts of the dataset. When using Stochastic Gradient Descent (SGD) as the training algorithm, one uses the monitoring_dataset parameter to specify which parts of the dataset are to be monitored. For example, if we are only interested in monitoring the training set, we would add the following entry to the SGD parameter dictionary:
monitoring_dataset: { 'train': !obj:pylearn2.datasets.mnist.MNIST { which_set: 'train' } }
This will instruct Pylearn2 to calculate statistics about the performance of our learner using the training part of the dataset at the end of each epoch. This will change the default output after each epoch from:
    Monitoring step:
        Epochs seen: 1
        Batches seen: 1875
        Examples seen: 60000
    Time this epoch: 2.875921 seconds
to:
    Monitoring step:
        Epochs seen: 0
        Batches seen: 0
        Examples seen: 0
        learning_rate: 0.0499996989965
        total_seconds_last_epoch: 0.0
        train_objective: 2.29713964462
        train_y_col_norms_max: 0.164925798774
        train_y_col_norms_mean: 0.161361783743
        train_y_col_norms_min: 0.158035755157
        train_y_max_max_class: 0.118635632098
        train_y_mean_max_class: 0.109155222774
        train_y_min_max_class: 0.103917405009
        train_y_misclass: 0.910533130169
        train_y_nll: 2.29713964462
        train_y_row_norms_max: 0.0255156457424
        train_y_row_norms_mean: 0.018013747409
        train_y_row_norms_min: 0.00823106430471
        training_seconds_this_epoch: 0.0
    Time this epoch: 2.823628 seconds
Each of the entries in the output (e.g. learning_rate, train_objective) is called a channel. Channels give one insight into what the learner is doing. The two most frequently used are train_objective and train_y_nll. The channel train_objective reports the cost being optimized by training, while train_y_nll reports the negative log likelihood of the training data under the current parameter values. In this particular example these two channels are monitoring the same thing, but this will not always be the case.
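As a rough sanity check on the numbers above, the sketch below computes the negative log likelihood and error rate of a classifier that assigns equal probability to all 10 MNIST classes. This is an assumption about what a softmax model with near-zero initial weights does before any training. The helper names softmax and nll are illustrative and are not part of Pylearn2.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nll(probs, target):
    """Negative log likelihood of the correct class for one example."""
    return -math.log(probs[target])


n_classes = 10
# Assumption: an untrained model scores every class equally.
probs = softmax([0.0] * n_classes)

uniform_nll = nll(probs, target=3)       # the target index is arbitrary here
chance_misclass = 1.0 - 1.0 / n_classes  # error rate of a uniform guess

print("%.4f %.1f" % (uniform_nll, chance_misclass))  # 2.3026 0.9
```

These values, ln(10) ≈ 2.3026 and 0.9, sit close to the train_y_nll of about 2.297 and the train_y_misclass of about 0.911 reported before the first epoch, which is what one would expect from a model that has not yet learned anything.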
Monitoring the train part of the dataset is useful for debugging purposes. However, on its own it is not enough to evaluate the performance of our learner, because performance on the training data will usually keep improving even after the learner has begun to overfit. In other words, the learner will find parameters that work well on the data used to train it but not on data it has not seen during training. To detect this we use a validation set. MNIST does not explicitly reserve a part of the data for validation, but it has become a de facto standard to use the last 10,000 samples from the train part. To specify this, one uses the start and stop parameters when instantiating MNIST. If we were only monitoring the validation set, our monitoring_dataset parameter to SGD would be:
monitoring_dataset: { 'valid': !obj:pylearn2.datasets.mnist.MNIST { which_set: 'train', start: 50000, stop: 60000 } }
Note that the key to the dictionary, 'valid' in this case, is merely a label. It can be whatever we choose. The name of each channel monitored for the associated dataset is prefixed with this label.
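Because every dataset-specific channel name starts with its label, the text output can be grouped by label with a few lines of code. The following is a minimal sketch, assuming the report is printed with one "name: value" entry per line; parse_channels and group_by_label are hypothetical helpers, not Pylearn2 functions, and the sample values are copied from the output shown earlier.

```python
def parse_channels(report):
    """Turn 'name: value' lines of a monitoring report into a dict of floats."""
    channels = {}
    for line in report.splitlines():
        name, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # skip lines that are not 'name: value' pairs
        try:
            channels[name] = float(value)
        except ValueError:
            continue  # skip entries whose value is not a single number
    return channels


def group_by_label(channels, labels):
    """Group channels by the dataset label that prefixes their names."""
    groups = {label: {} for label in labels}
    for name, value in channels.items():
        for label in labels:
            prefix = label + "_"
            if name.startswith(prefix):
                # Store the channel under its name without the label prefix.
                groups[label][name[len(prefix):]] = value
    return groups


# Excerpt of the monitoring output shown earlier, assumed one entry per line.
report = """\
Monitoring step:
    Epochs seen: 0
    learning_rate: 0.0499996989965
    train_objective: 2.29713964462
    train_y_misclass: 0.910533130169
Time this epoch: 2.823628 seconds
"""

parsed = parse_channels(report)
grouped = group_by_label(parsed, ["train"])
print(grouped["train"])
```

Channels that do not belong to a dataset, such as learning_rate, carry no label prefix and therefore stay out of every group.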
It's also worth noting that we are not limited to monitoring just one part of the dataset. It is usually helpful to monitor both the train and validation parts of a dataset. This is done as follows:
    monitoring_dataset: {
        'train': !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: 50000
        },
        'valid': !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 50000,
            stop: 60000
        }
    }
Note that here we use the start and stop parameters when loading both the train and valid parts to appropriately partition the dataset. We do not want the learner to validate on data it was trained on; otherwise we will not be able to identify overfitting.
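The partition can be checked with a short sketch. It assumes that start is inclusive and stop is exclusive, as in a Python slice, which is consistent with stop: 60000 selecting the last 10,000 of MNIST's 60,000 training samples. The split_indices helper is hypothetical and not part of Pylearn2.

```python
def split_indices(n_examples, n_valid):
    """Return (start, stop) pairs for the train and valid splits of one set.

    The last n_valid examples are held out for validation.
    """
    boundary = n_examples - n_valid
    return {"train": (0, boundary), "valid": (boundary, n_examples)}


splits = split_indices(n_examples=60000, n_valid=10000)
train_start, train_stop = splits["train"]
valid_start, valid_stop = splits["valid"]

# Treat each (start, stop) pair as a half-open range of example indices.
train_ids = set(range(train_start, train_stop))
valid_ids = set(range(valid_start, valid_stop))

# No example may appear in both parts, and none may be dropped.
no_overlap = train_ids.isdisjoint(valid_ids)
n_covered = len(train_ids) + len(valid_ids)

print(splits["train"], splits["valid"], no_overlap, n_covered)
```

The resulting pairs, (0, 50000) and (50000, 60000), match the start and stop values used in the YAML.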
Putting it all together, our complete YAML now looks like:
    !obj:pylearn2.train.Train {
        dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: 50000
        },
        model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
            batch_size: 20,
            n_classes: 10,
            nvis: 784,
            irange: 0.01
        },
        algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
            learning_rate: 0.05,
            monitoring_dataset: {
                'train': *train,
                'valid': !obj:pylearn2.datasets.mnist.MNIST {
                    which_set: 'train',
                    start: 50000,
                    stop: 60000
                }
            }
        }
    }
Note that here we have used a YAML trick to reference a previously instantiated object to save ourselves typing. Specifically, the dataset has been tagged "&train", and when specifying monitoring_dataset the reference "*train" is used to identify the previously instantiated object.
Live Monitoring
There are two problems with the basic monitoring mechanism in Pylearn2. First, the output is raw text, which makes it difficult to see how the values of the various channels evolve over time, especially when tracking multiple channels simultaneously. Second, due in part to the ability to add channels for monitoring, the amount of output after each epoch can and frequently does grow quickly. Combined, these problems make the basic monitoring mechanism difficult to use.
An alternative approach is to use a new mechanism called live monitoring. To be completely forthright, the live monitoring mechanism is something that I developed to combat the aforementioned problems. I am also interested in feedback regarding its user interface and what additional functionality people would like. Please feel free to send an email to the Pylearn2 users mailing list or leave a comment below with feedback.
The live monitoring mechanism has two parts. The first part is a training extension, i.e. an optional plug-in that modifies the way training is performed. The second part is a utility class that can query the training extension for data about channels being monitored.
Training extensions can be selected using the extensions parameter to the train object. In other words, add the following to the parameters dictionary for the train object in any YAML:
extensions: [ !obj:pylearn2.train_extensions.live_monitoring.LiveMonitoring {} ]
The full YAML would look like:
    !obj:pylearn2.train.Train {
        dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: 50000
        },
        model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
            batch_size: 20,
            n_classes: 10,
            nvis: 784,
            irange: 0.01
        },
        algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
            learning_rate: 0.05,
            monitoring_dataset: {
                'train': *train,
                'valid': !obj:pylearn2.datasets.mnist.MNIST {
                    which_set: 'train',
                    start: 50000,
                    stop: 60000
                }
            }
        },
        extensions: [
            !obj:pylearn2.train_extensions.live_monitoring.LiveMonitoring {}
        ]
    }
The LiveMonitoring training extension listens for queries about channels being monitored. To perform queries, one need only instantiate LiveMonitor and use its methods to request data. Currently it has three methods:
- list_channels: Returns a list of channels being monitored.
- update_channels: Retrieves data about the list of specified channels.
- follow_channels: Plots the data for the specified channels. This command blocks other commands from being executed because it repeatedly requests the latest data for the specified channels and redraws the plot as new data arrives.
A LiveMonitor instance is created as follows:
    from pylearn2.train_extensions.live_monitoring import LiveMonitor
    lm = LiveMonitor()
Each of the methods listed above returns a different message object. The data of interest is contained in the data member of that object. As such, given an instance of LiveMonitor, one would view the channels being monitored as follows:
    print(lm.list_channels().data)
If we are running the experiment specified by the YAML above, this yields:
['train_objective', 'train_y_col_norms_max', 'train_y_row_norms_min', 'train_y_nll', 'train_y_col_norms_mean', 'train_y_max_max_class', 'train_y_min_max_class', 'train_y_row_norms_max', 'train_y_misclass', 'train_y_col_norms_min', 'train_y_row_norms_mean', 'train_y_mean_max_class', 'valid_objective', 'valid_y_col_norms_max', 'valid_y_row_norms_min', 'valid_y_nll', 'valid_y_col_norms_mean', 'valid_y_max_max_class', 'valid_y_min_max_class', 'valid_y_row_norms_max', 'valid_y_misclass', 'valid_y_col_norms_min', 'valid_y_row_norms_mean', 'valid_y_mean_max_class', 'learning_rate', 'training_seconds_this_epoch', 'total_seconds_last_epoch']
From this we can pick channels to plot using follow_channels:
lm.follow_channels(['train_objective', 'valid_objective'])
This command displays a graph like the one in Figure 1 and redraws the plot at the end of each epoch.
Figure 1: Example output from the follow_channels method of the LiveMonitor utility object.
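Rather than typing channel names by hand, the list returned by list_channels can be filtered before it is passed to follow_channels. The sketch below uses shell-style patterns from Python's standard fnmatch module. The select_channels helper is hypothetical, and the list of names is a subset of the channels shown above.

```python
from fnmatch import fnmatchcase


def select_channels(available, patterns):
    """Return the channels matching any shell-style pattern, in listed order."""
    return [name for name in available
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


# A few of the channel names returned by list_channels() above.
available = [
    'train_objective', 'train_y_nll', 'train_y_misclass',
    'valid_objective', 'valid_y_nll', 'valid_y_misclass',
    'learning_rate',
]

to_plot = select_channels(available, ['*_y_misclass'])
print(to_plot)  # ['train_y_misclass', 'valid_y_misclass']
```

With a running experiment, the resulting list could then be handed to lm.follow_channels(to_plot) to compare the misclassification rate on the train and validation parts.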
The live monitoring mechanism is network aware, and by default it answers queries on port 5555 of any network interface on the computer where the experiment is running. However, a user does not need to know anything about networking to use live monitoring. By default the live monitoring mechanism assumes the experiment of interest is running on the same computer as the LiveMonitor utility class. If that is not the case and one knows the IP address of the computer on which the experiment is running, then one need only specify the address when instantiating LiveMonitor. The live monitoring mechanism will take care of the networking automatically.
Live monitoring is also efficient. It only ever requests data it does not already have, and the underlying networking utility waits for new data without taking unnecessary CPU time.
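The idea of requesting only missing data can be illustrated with a small, self-contained sketch. It does not use Pylearn2 or any networking: FakeExperiment and CachingClient are hypothetical stand-ins, and the channel values are placeholders chosen for the illustration rather than real results.

```python
class FakeExperiment(object):
    """Stand-in for a running experiment; records one value per epoch."""

    def __init__(self):
        self.history = {"train_objective": []}
        self.values_sent = 0  # counts how many values were ever transferred

    def finish_epoch(self, value):
        self.history["train_objective"].append(value)

    def query(self, channel, start):
        """Return only the values recorded from epoch `start` onwards."""
        new_values = self.history[channel][start:]
        self.values_sent += len(new_values)
        return new_values


class CachingClient(object):
    """Keeps what it has already received and asks only for the rest."""

    def __init__(self, experiment):
        self.experiment = experiment
        self.cache = {}

    def update(self, channel):
        known = self.cache.setdefault(channel, [])
        # Ask only for epochs beyond those already held in the cache.
        known.extend(self.experiment.query(channel, start=len(known)))
        return known


experiment = FakeExperiment()
client = CachingClient(experiment)

experiment.finish_epoch(2.30)  # placeholder values, not real results
experiment.finish_epoch(1.10)
client.update("train_objective")           # transfers two values

experiment.finish_epoch(0.80)
values = client.update("train_objective")  # transfers only the new value

print(values)
print(experiment.values_sent)
```

After three epochs and two updates, only three values have been transferred in total, because the second update asks for nothing the client already holds.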
The live monitoring mechanism has many benefits including:
- The ability to filter the channels being monitored.
- The ability to plot data for any given set of channels being monitored.
- The ability to retrieve data from an experiment in real-time*
- The ability to query for data from an experiment running on a remote machine.
- The ability to change which channels are being followed or plotted without restarting an experiment.