LoNAS

Official implementation of LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models.

This repo contains the code for LoNAS, a pioneering method that leverages Neural Architecture Search (NAS) to explore a space of elastic low-rank adapters. LoNAS effectively compresses large language models while maintaining or even enhancing performance, facilitating their use in resource-constrained environments. Please refer to our paper for more details.
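
To build intuition for what "elastic" means here, below is a minimal, self-contained sketch in plain PyTorch of a LoRA layer whose active rank can be shrunk at runtime. It is an illustration only, not the repo's NNCF/BootstrapNAS-based implementation, and all names in it are hypothetical.

# Illustrative sketch only (not the LoNAS/NNCF implementation): a LoRA layer
# whose active rank can be reduced at runtime, which is the kind of elastic
# dimension LoNAS searches over.
import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, max_rank=32, lora_alpha=64):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.scaling = lora_alpha / max_rank
        self.active_rank = max_rank                     # changed by the search/NAS controller

    def set_active_rank(self, r):
        self.active_rank = r

    def forward(self, x):
        r = self.active_rank
        # Only the first r rank components are used by the current sub-adapter.
        delta = (x @ self.lora_A[:r].T) @ self.lora_B[:, :r].T
        return self.base(x) + self.scaling * delta

layer = ElasticLoRALinear(4096, 4096, max_rank=32)
layer.set_active_rank(8)                                # activate a smaller sub-adapter
out = layer(torch.randn(2, 4096))
print(out.shape)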

Setup

Use the following commands to set up a LoNAS environment from scratch.

pip install virtualenv
virtualenv lonas-env
source lonas-env/bin/activate

# install pytorch
pip install torch==2.1.2

# install dependencies
bash install.sh
# Note: please ignore the whitespace issues when applying the patch and running `install.sh`.
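
As an optional sanity check, the following Python snippet confirms that the core packages import correctly. The exact dependency set is determined by install.sh; listing peft here is an assumption.

# Optional sanity check after running install.sh.
# peft is assumed to be installed by install.sh.
import torch
import transformers
import peft
import nncf

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("nncf:", nncf.__version__)
print("CUDA available:", torch.cuda.is_available())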

Quick Start

Training

Taking the unified commonsense reasoning training as an example, please download the 15K instruction-following commonsense reasoning training data from LLM-Adapters.

Example command to train a super-adapter of LLaMA-7B using LoNAS:

python run_commonsense.py \
    --dataset_path commonsense_15k.json \
    --model_name_or_path yahma/llama-7b-hf \
    --do_train \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 6 \
    --warmup_steps 100 \
    --optim adamw_torch \
    --fp16 \
    --output_dir <path to super-adapter> \
    --logging_steps 20 \
    --save_strategy epoch \
    --save_total_limit 2 \
    --lora \
    --lora_r 32 \
    --lora_alpha 64 \
    --lora_dropout 0.1 \
    --target_modules q_proj,k_proj,v_proj,up_proj,gate_proj,down_proj \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json

The nncf_config argument points to the NNCF configuration, which defines the search space over the elastic adapters and the elastic modules of the base model (e.g., q_proj). The elastic modules are implemented on top of the BootstrapNAS feature of OpenVINO™ NNCF. We use the stage LR scheduler from NNCF, so the learning rate schedule is specified in the NNCF configuration file rather than in the arguments of TrainingArguments. For instance,

"schedule": {
    "list_stage_descriptions": [
        {"train_dims": ["width"], "epochs": 6, "depth_indicator": 1, "width_indicator": 5, "init_lr": 3e-4, "epochs_lr": 6, "sample_rate": 1}
    ]
},

For more details on the stage scheduler, see BootstrapNAS.md. After training, the weights of the trained super-adapter are saved in the output_dir directory.
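
If you only want to load and inspect the saved super-adapter weights (the maximal configuration), a minimal sketch using the standard Hugging Face PEFT API might look like the following. This assumes the checkpoint follows the usual PEFT adapter layout; activating the heuristic subnetwork is still done through run_commonsense.py with the NNCF config.

# Sketch: attach the saved super-adapter to the base model for inspection.
# Assumes the adapter directory follows the standard PEFT layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
model = PeftModel.from_pretrained(base, "<path to super-adapter>")  # the --output_dir used above
model.eval()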

Evaluation

All evaluation datasets can be downloaded from LLM-Adapters. Place them into the directory datasets.

git clone https://github.com/AGI-Edgerunners/LLM-Adapters.git
mv LLM-Adapters/dataset datasets 

Example command to evaluate the trained super-adapter (heuristic subnetwork):

python run_commonsense.py \
    --dataset_path None \
    --model_name_or_path yahma/llama-7b-hf \
    --lora \
    --lora_weights <path to super-adapter> \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json \
    --do_test \
    --output_dir <path to results>

This command evaluates the performance of the heuristic subnetwork across eight commonsense reasoning tasks: BoolQ, PIQA, SIQA, HellaSwag, WinoG, Arc-e, Arc-c, and OBQA.

Search

To discover more optimized subnetworks within the trained super-network, LoNAS applies search algorithms that explore the super-network further. For this we leverage OpenVINO™ NNCF, which supports several search algorithms; the search settings are configured in the search field of the NNCF config, for example:

"search": {
    "algorithm": "NSGA2",
    "batchnorm_adaptation": {
        "num_bn_adaptation_samples": 0
    },
    "num_evals": 200,
    "population": 5,
    "ref_acc": 0.45,
    "acc_delta": 0.01
}

Further details can be found in BootstrapNAS.md. The following is an example command to run the search on the trained super-adapter:

python run_commonsense.py \
    --dataset_path commonsense_15k.json \
    --model_name_or_path yahma/llama-7b-hf \
    --lora \
    --lora_weights <path to super-adapter> \
    --val_set_size 1000 \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json \
    --do_search \
    --output_dir <path to search results>

The --val_set_size 1000 argument means that 1,000 validation samples are used to evaluate each discovered subnetwork. After running this command, the results for the 200 evaluated subnetworks ("num_evals": 200 in the search field of the NNCF config) are written to the --output_dir folder, including search_progression.png and search_progression.csv. From these results, we can select the subnetwork configurations that best meet different requirements.
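
To pick a configuration from the search output programmatically, a small pandas sketch such as the one below can help. The column names of search_progression.csv are assumptions and should be checked against the actual file header.

# Sketch: inspect search_progression.csv and list the top subnetworks.
# Column names are assumptions; print df.columns first and adjust.
import pandas as pd

df = pd.read_csv("<path to search results>/search_progression.csv")
print(df.columns.tolist())                     # check the real column names

acc_col = df.columns[-1]                       # assumed: an accuracy-like metric in the last column
top = df.sort_values(acc_col, ascending=False).head(5)
print(top)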

Released Models

| Name | Tasks | Base Model |
|------|-------|------------|
| lonas-bert-base-glue | RTE, MRPC, STS-B, CoLA, SST2, QNLI, QQP, MNLI | bert-base-uncased |
| lonas-llama-7b-commonsense | Commonsense Reasoning | yahma/llama-7b-hf |
| lonas-bloomz-7b-math | Math Reasoning | bigscience/bloomz-7b1 |

Reproduce Results

Please refer to running_commands for all commands related to reproducing the paper's results.

• GLUE benchmark

| Method | Trainable Parameter Ratio | GFLOPs | RTE | MRPC | STS-B | CoLA | SST-2 | QNLI | QQP | MNLI | AVG |
|--------|---------------------------|--------|-----|------|-------|------|-------|------|-----|------|-----|
| LoRA  | 0.27% | 11.2 | 65.85 | 84.46 | 88.73 | 57.58 | 92.06 | 90.62 | 89.41 | 83.00 | 81.46 |
| LoNAS | 0.27% | 8.0  | 70.76 | 88.97 | 88.28 | 61.12 | 93.23 | 91.21 | 88.55 | 82.00 | 83.02 |
• Commonsense Reasoning

| Method | Total Params. | TFLOPs | BoolQ | PIQA | SIQA | HellaSwag | WinoG | Arc-e | Arc-c | OBQA | Average |
|--------|---------------|--------|-------|------|------|-----------|-------|-------|-------|------|---------|
| LoRA  | 6.7B | 1.7 | 62.6 | 75.3 | 67.9 | 52.9 | 58.6 | 79.2 | 58.3 | 71.2 | 65.8 |
| LoNAS | 5.6B | 1.4 | 62.9 | 73.0 | 68.7 | 51.4 | 63.9 | 72.3 | 58.5 | 71.0 | 65.2 |
• Math Reasoning

| Method | Total Params. | TFLOPs | GSM8K | AQuA | MAWPS | SVAMP | Average |
|--------|---------------|--------|-------|------|-------|-------|---------|
| LoRA  | 7.1B | 1.8 | 17.4 | 21.3 | 70.2 | 41.0 | 37.5 |
| LoNAS | 6.1B | 1.5 | 18.6 | 22.0 | 76.5 | 31.8 | 37.2 |

Citation

@inproceedings{munoz-etal-2024-lonas,
    title = "{L}o{NAS}: Elastic Low-Rank Adapters for Efficient Large Language Models",
    author = "Munoz, Juan Pablo  and
      Yuan, Jinjie  and
      Zheng, Yi  and
      Jain, Nilesh",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.940",
    pages = "10760--10776",
}

Acknowledgement

This work benefits from the following repositories: