During training, you can monitor your experiments with TensorBoard.
We also try to provide some useful functionality to quickly evaluate and compare the results of your experiments.
You can use `evaluate_experiment.py` to get a quick first impression of a finished experiment run.
You can use the plotting pipeline with your customized settings (as shown in the usage examples). Alternatively, you can use the script to export your data to a .csv file and process it to your own needs.
In this scenario, set `evaluation.output_types: [csv]` (no plotting, just the data) in your experiment yaml.
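For orientation, here is a minimal sketch of what such an experiment yaml could look like; the `optimizer` and `task` entries are placeholders and the nesting of the `evaluation` key is assumed from the dotted name above:

```yaml
# hypothetical experiment yaml -- only the evaluation key matters here
optimizer:
  name: adamw   # placeholder
task:
  name: mnist   # placeholder
evaluation:
  output_types: [csv]  # no plotting, just the data
```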
In the following, you can find four example use cases for experiments and how to visualize the results as heatmaps:
- testing an optimizer on a task
- comparing two optimizers on the same task
- comparing multiple optimizers on different tasks
- comparing the influence of a single hyperparameter
Here we want to focus on the plotting. For instructions on how to run experiments, refer to the main README. To get started right away, we provide the data for this example. If you want to reproduce it, refer to this section.
By default, calling `run_experiment.py` will plot the experiment after training and testing. To disable this behavior, set `engine.plot=false`.
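If you prefer to keep this flag in the config file instead, the dotted key presumably maps to nested yaml along these lines (a sketch, not taken from the repository defaults):

```yaml
engine:
  plot: false  # skip the automatic plotting after training and testing
```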
To plot your experiment afterwards, call `evaluate_experiment.py` with the same experiment yaml. To adjust the plotting, change the values under the `evaluation` key of the experiment. Take a look at `evaluation/default.yaml` to see which settings are available. Some of these keys are explained in the examples below to give the reader a first impression. Note that some default parameters are set in the respective tasks (e.g. in `tasks/mnist/default.yaml`).
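As a rough orientation (not the full list of options, and the exact nesting of the dotted keys is not shown), an `evaluation` block could combine several of the settings used in the examples below:

```yaml
evaluation:
  output_types: [png, pdf]   # file formats to write
  checkpoints: [last, best]  # which checkpoints to include
  # further keys used in the examples below:
  # plot.x_axis, plot.std, plot.aggfunc,
  # column_split_key, split_groups, aggregate_groups
```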
Here are some example scenarios to give you an understanding of how our plotting works. Run the commands from the root of the repository. Take a look at the yaml files used in the commands to see what is going on.
This example is a good starting point; it shows the performance of a single default optimizer on one of the tasks. Experiment file: examples/plotting/1_mnist-adamw.yaml
python -m pytorch_fob.evaluate_experiment examples/plotting/1_mnist-adamw.yaml
This example uses only the final model performance and only creates the plot as a png.
Helpful settings:
checkpoints: [last]  # you could use [last, best] to additionally plot the model with the best validation
output_types: [png]  # you could use [pdf, png] to also create a pdf
You can compare two different optimizers.
Experiment file: examples/plotting/2_adamw-vs-sgd.yaml
python -m pytorch_fob.evaluate_experiment examples/plotting/2_adamw-vs-sgd.yaml
Helpful settings:
plot.x_axis: [optimizer.weight_decay, optimizer.kappa_init_param]  # the values given here are used for the x-axis; the order in the list is used from left to right for the plot columns
column_split_key: optimizer.name
This creates a column for each different optimizer (default behavior). You can set this to `null` to disable columns or choose a different key.
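For illustration, the three variants described above might look like this (a sketch; only one line would be active at a time):

```yaml
column_split_key: optimizer.name  # default: one column per optimizer
# column_split_key: null          # disable columns
# column_split_key: task.name     # split columns on a different key instead
```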
There are multiple tasks in the benchmark; this example shows how to get a quick overview of multiple tasks at the same time.
Experiment file: examples/plotting/3_mnist-and-tabular_adamw-vs-sgd.yaml
python -m pytorch_fob.evaluate_experiment examples/plotting/3_mnist-and-tabular_adamw-vs-sgd.yaml
Helpful settings:
split_groups: ["task.name"]
Every non-unique value for each parameter name in `split_groups` will create its own subplot. Instead of a list, you can set this to `false` to disable splitting, or to `true` to split on every parameter that differs between runs (except those already in `column_split_key` or `aggregate_groups`). Using a list is useful if there are just a few parameters you want to split on.
Any parameter that is neither on the x-axis nor y-axis will either be aggregated over or split into subplots.
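A sketch of the three variants described above (pick one; the alternatives are shown commented out):

```yaml
split_groups: ["task.name"]  # one subplot per task
# split_groups: false        # disable splitting
# split_groups: true         # split on every parameter that differs between runs
```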
Any individual square of a heatmap shows the mean and std over multiple runs (as seen in the previous plots). Here we show how to choose the runs to aggregate.
Experiment file: examples/plotting/4_adamw-vs-sgd_seeds.yaml
python -m pytorch_fob.evaluate_experiment examples/plotting/4_adamw-vs-sgd_seeds.yaml
Helpful settings:
- Control the std with
  plot.std  # toggle off with False
  plot.aggfunc: std  # also try var
- Control the rows with
  split_groups: ["engine.seed"]
  aggregate_groups: []
By default, the plot will display the mean and std calculated over the seeds. We need to remove the seed from the `aggregate_groups` list (by giving an empty list instead). This list is useful if there are additional parameters you want to aggregate over.
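In other words (a sketch; the default value of `aggregate_groups` containing the seed is inferred from the description above, not copied from the repository defaults):

```yaml
# assumed default: runs that differ only in their seed are aggregated into mean and std
# aggregate_groups: ["engine.seed"]

# this example: one row per seed instead of aggregating
split_groups: ["engine.seed"]
aggregate_groups: []
```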
Let's create some data that we can plot; run the following commands from the root directory.
First, we make sure the data is downloaded beforehand:
python -m pytorch_fob.dataset_setup examples/plotting/3_mnist-and-tabular_adamw-vs-sgd.yaml
This will download the mnist data (required for examples 1-4) and the tabular data (required for example 3) into the `examples/data` directory. The path can be changed in the corresponding yaml you want to use (e.g. `examples/plotting/1_mnist-adamw.yaml`) if you have already set up your benchmark.
Estimated disk usage for the data: ~65M
The 2 tasks will be run on a 2x2 hyperparameter grid with 2 different seeds per optimizer, for a total of 32 runs (2 optimizers x 2 tasks x 4 hyperparameter combinations x 2 seeds = 32).
python -m pytorch_fob.run_experiment examples/plotting/3_mnist-and-tabular_adamw-vs-sgd.yaml
After training has finished, you should find 32 run directories in `examples/plotting/outputs`. All parameters that differ from their default values are noted in the directory names.