Ready-to-run scripts for Transformers and Adapters on >50 NLP tasks.
This repository contains `itrain`, a small library that provides a simple interface for configuring training runs of Transformers and Adapters across a wide range of NLP tasks.
The code is based on the research code of the paper "What to Pre-Train on? Efficient Intermediate Task Selection", which can be found here.
The `itrain` package provides:
- easy downloading and preprocessing of datasets via HuggingFace datasets
- integration of a wide range of standard NLP tasks (list)
- training run setup & configuration via Python or command-line
- automatic checkpointing, WandB logging, resuming & random restarts for score distributions
- automatic notification on training start and results via mail or Telegram chat
Before getting started with this repository, make sure you have a recent Python version (> 3.6) and PyTorch (see here) set up, ideally in a virtual environment such as conda.
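A fresh environment could, for example, be set up roughly as follows (the environment name and Python version below are just illustrative choices):

```bash
# Create and activate a dedicated conda environment (name and Python version are examples)
conda create -n itrain python=3.9
conda activate itrain
# Install PyTorch in the way that fits your system, e.g. following the instructions on pytorch.org
pip install torch
```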
All additional requirements together with the `itrain` package can be installed by cloning this repository and then installing from source:
```bash
git clone https://github.com/calpt/itrain.git
cd itrain
pip install -e .
```
Alternatively, you can directly install via pip:
```bash
pip install git+https://github.com/calpt/itrain.git
```
`itrain` can be invoked from the command line by passing a run configuration file in YAML or JSON format.
Example configurations for all currently supported tasks can be found in the `run_configs` folder.
All supported configuration keys are defined in `arguments.py`.
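For illustration, a run config might look roughly like the following sketch. The keys shown here are those that also appear as command-line overrides further below; the values are made up, and the real files may organize the keys differently, so refer to `run_configs/sst2.yaml` and `arguments.py` for the actual layout:

```yaml
# Illustrative sketch only; see run_configs/sst2.yaml and arguments.py for the real structure
model_name_or_path: roberta-base
train_adapter: true        # train an adapter instead of fully fine-tuning the model
learning_rate: 1e-4        # example value
num_train_epochs: 15       # example value
patience: 2                # early stopping patience; example value
```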
Running a setup from the command line can look like this:
```bash
itrain run --id 42 run_configs/sst2.yaml
```
This will train an adapter on the SST-2 task using `roberta-base` as the base model (as specified in the config file).
Besides modifying configuration keys directly in the YAML file, they can be overridden using command-line parameters.
For example, we can modify the previous training run to fully fine-tune a `bert-base-uncased` model instead:
```bash
itrain run --id 42 \
  --model_name_or_path bert-base-uncased \
  --train_adapter false \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --patience 0 \
  run_configs/sst2.yaml
```
Alternatively, training setups can be configured directly in Python by using the `Setup` class of `itrain`. An example for this is given in `example.py`.
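As a rough sketch of what this could look like (the argument classes and `Setup` methods used below are assumptions made for illustration; see `example.py` and `arguments.py` for the actual API):

```python
# Hypothetical sketch; the argument classes and Setup methods shown here are assumed,
# refer to example.py for the real interface.
from itrain import DatasetArguments, ModelArguments, RunArguments, Setup

setup = Setup(id=42)
setup.dataset(DatasetArguments(dataset_name="sst2"))          # task/dataset configuration
setup.model(ModelArguments(model_name_or_path="roberta-base",
                           train_adapter=True))               # base model & adapter setup
setup.training(RunArguments(learning_rate=1e-4,
                            num_train_epochs=15,
                            patience=2))                       # training hyperparameters
setup.start()                                                  # launch the training run
```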
This repository builds on the following projects:
- huggingface/transformers for the Transformers implementations, the Trainer class and the training scripts on which this repository is based
- huggingface/datasets for dataset downloading and preprocessing
- Adapter-Hub/adapter-transformers for the adapter implementation