This is a repository for my semester project (Research in Computer Science: 263-0600-00L) under the Data Analytics group at ETH Zürich in the fall semester of 2021. For a detailed explanation of the idea, the approach and the findings, please check the project report.
Supervisors:

- Antonio Orvieto
- Jonas Kohler
Poetry is used for conveniently installing and managing dependencies. pre-commit is used for managing hooks that run before each commit, to ensure code quality and run some basic tests.
1. [Optional] Create and activate a virtual environment with Python >= 3.8.5.

2. Install Poetry globally (recommended), or in a virtual environment. Please refer to Poetry's installation guide for recommended installation options. You can use pip to install it:

    ```sh
    pip install poetry
    ```

3. Install all dependencies, including extra dependencies for development, with Poetry:

    ```sh
    poetry install
    ```

    To avoid installing development dependencies, run:

    ```sh
    poetry install --no-dev
    ```

    If you didn't create and activate a virtual environment in step 1, Poetry creates one for you and installs all dependencies there. To use this virtual environment, run:

    ```sh
    poetry shell
    ```

4. Install pre-commit hooks:

    ```sh
    pre-commit install
    ```
NOTE: You need to be inside the virtual environment where you installed the above dependencies every time you commit. However, this is not required if you have installed pre-commit globally.
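If you want to check the codebase without making a commit, pre-commit's standard `run` command can execute all hooks manually (run this from inside the environment where pre-commit is installed, unless it is installed globally):

```sh
pre-commit run --all-files
```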
The optimizers are tested on multiple tasks. Each task involves training a certain model in a certain manner (supervised, unsupervised, etc.) on a certain dataset. Every task is given a task ID, which is used when running scripts.
The tasks implemented, along with their IDs, are:
Task | Task ID | Description |
---|---|---|
CIFAR-10 | `cifar` | A ResNet-18 on the CIFAR-10 dataset. |
All scripts use argparse to parse command-line arguments. Each Python script takes the task ID as a positional argument. To view the list of all positional and optional arguments for a script `script.py`, run:

```sh
./script.py --help
```
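For example, assuming `train.py` is the training script described below and using the `cifar` task ID from the table above, a training run with default settings would presumably be launched as:

```sh
./train.py cifar
```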
Hyper-parameters can be specified through YAML configs. For example, to specify a batch size of 32 and a learning rate of 0.001, use the following config:

```yaml
lr: 0.001
batch_size: 32
```
You can store configs in a directory named `configs` located in the root of this repository. It has an entry in the `.gitignore` file so that custom configs aren't picked up by git.

The available hyper-parameters, their documentation and default values are specified in the `Config` class in the file `src/config.py`.
NOTE: You do not need to specify every single hyper-parameter in a config; any missing ones will use their default values.
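As a small illustration (the file name `my-config.yaml` is arbitrary), the example config above could be stored as follows:

```sh
mkdir -p configs
cat > configs/my-config.yaml << EOF
lr: 0.001
batch_size: 32
EOF
```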
Support for tuning hyper-parameters for the optimizers is available in the training script.
It has the `-m` or the `--mode` flag to set the mode of operation. This has the following values:

- `train`: This simply trains a model. This is the default mode.
- `tune`: This tunes the hyper-parameters using Hyperopt.
Thus, to tune hyper-parameters for models on a certain task, run the training script as follows:

```sh
./train.py --mode tune
```
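For example, to tune hyper-parameters on the CIFAR-10 task, the command would presumably include the task ID as its positional argument:

```sh
./train.py cifar --mode tune
```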
Logs are stored with certain directory structures. For training, this is:

```
project root
|_ root log directory
   |_ experiment name
      |_ timestamped run directory
```

For tuning, this is:

```
project root
|_ root log directory
   |_ experiment name
      |_ timestamped tuning run directory
         |_ training run 0 directory
         |_ training run 1 directory
         ...
```
The timestamp uses the ISO 8601 convention along with the local timezone. The root log directory can be specified with the `--log-dir` argument. By default, this is `logs`.
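For example, to write logs for a CIFAR-10 run under a custom root directory (the directory name `experiments` here is just an illustration):

```sh
./train.py cifar --log-dir experiments
```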
The sub-directory for each training run will contain:

- The latest checkpoint of the trained model, within the `checkpoints` sub-directory
- Training logs, as a file with the prefix `events.out.tfevents.`
- The hyper-parameter config (including defaults), as a YAML file named `hparams.yaml`
The sub-directory for a tuning run will contain:

- Sub-directories for each training run
- The best hyper-parameter config (including defaults), as a YAML file named `best-hparams.yaml`
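Since the training logs are written as `events.out.tfevents.*` files (the TensorBoard event format), they can presumably be viewed with TensorBoard, assuming it is installed in your environment:

```sh
tensorboard --logdir logs
```

Here `logs` is the default root log directory; point `--logdir` at your custom directory if you changed it.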
For choosing how many GPUs to train on, use the `-g` or the `--num-gpus` flag when running a script as follows:

```sh
./script.py --num-gpus 3
```

This selects three available GPUs for training. By default, only one GPU is chosen.
This implementation supports mixed-precision training, which is enabled by default. To manually set the floating-point precision, use the `-p` or the `--precision` flag when running a script as follows:

```sh
./script.py --precision 32
```
Note that mixed-precision training will only provide significant speed-ups if your GPUs have special support for mixed-precision compute.
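Putting the flags together, a full-precision run of the CIFAR-10 task across two GPUs would presumably look like this (the particular combination is just an example):

```sh
./train.py cifar --num-gpus 2 --precision 32
```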