LQER: Low-Rank Quantization Error Reconstruction for LLMs

[ Paper ] [ Code ]

"Big-little" Llama

LQER runs a high-rank low-precision GEMM and a group of low-rank high-precision GEMMs in parallel to push the limits of lossless LLM PTQ.
DeepWok Lab

The DeepWok Lab is an ML research group led by Dr. Aaron Zhao. Its members are mainly from Imperial College London and the University of Cambridge.

News

🎉🎉🎉 Our work has been accepted at ICML 2024.

Introduction

LQER is a post-training quantization (PTQ) method that

shapes the singular value distribution of the approximated quantization error;
enjoys a static compute pattern and unified memory/compute number formats;
eliminates the need for grid search, knowledge distillation, or other forms of iterative optimization;
achieves near-lossless W4A8 LLM PTQ while using $1.36\times$ fewer hardware resources than SoTA methods.
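The idea behind the name can be sketched in a few lines of NumPy: quantize the weight, take the quantization error, and keep only its top-k singular directions as a pair of small high-precision matrices. This is a minimal illustration, not the code in src/lqer. The `fake_quantize` helper is an assumed stand-in (plain symmetric per-tensor integer quantization) for the MXINT/INT formats used in the experiments, and the names `lqer_decompose`, `a_k`, and `b_k` are chosen here for illustration.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Emulate symmetric per-tensor integer quantization in float (assumed stand-in format)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def lqer_decompose(w, rank, bits=4):
    """Split w into a quantized part plus a rank-`rank` approximation of the quantization error."""
    w_q = fake_quantize(w, bits)
    err = w - w_q                                   # quantization error
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    a_k = u[:, :rank]                               # (in_features, rank)
    b_k = s[:rank, None] * vt[:rank]                # (rank, out_features)
    return w_q, a_k, b_k

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 48))                   # (in_features, out_features)
x = rng.standard_normal((8, 64))                    # a batch of activations

w_q, a_k, b_k = lqer_decompose(w, rank=8)

# One large low-precision GEMM plus two small high-precision GEMMs.
y_lqer = x @ w_q + (x @ a_k) @ b_k

err_plain = np.linalg.norm(w - w_q)
err_lqer = np.linalg.norm(w - (w_q + a_k @ b_k))
print(f"weight error without correction: {err_plain:.4f}, with rank-8 correction: {err_lqer:.4f}")
```

The two small GEMMs never depend on the input data at decomposition time, which is what keeps the compute pattern static.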

Installation

Anaconda is recommended. Run the following commands to create a conda environment named lqer.

conda env create -f environment.yml
conda run -n lqer python -m pip install -r requirements.txt

Note that this lqer env is only for running LQER experiments. The baseline methods included in the paper, such as AWQ, GPTQ, and LLM.int4(), require a separate environment. Please follow the HuggingFace Transformers quantization guide to replicate the baseline results.

Experiments

Entry point

The entry point is experiments/pipeline/pipeline.py. It performs data calibration, approximation, software-emulated quantization, perplexity evaluation, and downstream task evaluation for $\text{LQER}$/$\text{L}^2\text{QER}$/no-$\text{LQER}$ quantization.
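To make the calibration and approximation stages more concrete, here is a hedged sketch of how calibration activations could shape the error before the SVD, in the spirit of $\text{L}^2\text{QER}$. It assumes a diagonal scale built from the mean absolute activation per input channel, which is a simplification and not necessarily what pipeline.py computes. The quantizer is again an assumed stand-in, and all function and variable names are illustrative.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Emulate symmetric per-tensor integer quantization in float (assumed stand-in format)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def scaled_lqer_decompose(w, calib_x, rank, bits=4, eps=1e-6):
    """Low-rank error reconstruction with an activation-derived row scaling (assumed form)."""
    w_q = fake_quantize(w, bits)
    err = w - w_q
    # Assumed scale: larger for input channels with larger calibration activations.
    s = np.abs(calib_x).mean(axis=0) + eps          # (in_features,)
    u, sig, vt = np.linalg.svd(s[:, None] * err, full_matrices=False)
    a_k = u[:, :rank] / s[:, None]                  # undo the scaling on the left factor
    b_k = sig[:rank, None] * vt[:rank]
    return w_q, a_k, b_k, s

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 48))
# Synthetic calibration batch in which a few channels are much larger than the rest.
calib_x = rng.standard_normal((256, 64)) * np.linspace(0.1, 5.0, 64)

w_q, a_k, b_k, s = scaled_lqer_decompose(w, calib_x, rank=8)

# Error measured in the scaled space, before and after the low-rank correction.
scaled_err_before = np.linalg.norm(s[:, None] * (w - w_q))
scaled_err_after = np.linalg.norm(s[:, None] * (w - w_q - a_k @ b_k))
print(f"scaled error: {scaled_err_before:.4f} -> {scaled_err_after:.4f}")
```

Scaling the rows before the SVD makes the decomposition spend its limited rank on the input channels that the calibration data marks as important.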

pipeline.py requires one argument, CONFIG, which should be a toml file that specifies the experiment settings:

cd experiments/pipeline
conda run -n lqer python pipeline.py CONFIG

Please refer to the toml templates in ./experiments/configs/template for the configuration file format.

Scripts

Scripts for replicating paper results:

Script	Note
experiments/pipeline/sweep_lqer_act.sh	W4A8 $\text{L}^2\text{QER}-\texttt{MXINT}$
experiments/pipeline/sweep_lqer_act_int.sh	W4A8 $\text{L}^2\text{QER}-\texttt{INT}$
experiments/pipeline/sweep_lqer_svd.sh	Baseline W4A8 $\text{LQER}-\texttt{MXINT}$
experiments/pipeline/sweep_baseline_no_lqer.sh	Baseline W4A8 MXINT w/o $\text{LQER}$

Under the hood, these scripts call pipeline.py and override key-value pairs in the passed config template to generate the corresponding experiment setups.
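The override pattern can be illustrated with a small Python sketch. The sweep scripts themselves are shell scripts, so this only shows the idea of deriving several experiment setups from one template. The `override` helper and the keys `model.name` and `lqer.rank` are hypothetical and do not reflect the actual schema of the templates in ./experiments/configs/template.

```python
import copy

def override(config, dotted_key, value):
    """Return a copy of `config` with the nested entry at `dotted_key` replaced."""
    new_config = copy.deepcopy(config)
    node = new_config
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})            # create missing tables on the way down
    node[leaf] = value
    return new_config

# Hypothetical template, as it might look after parsing a toml file into a dict.
template = {"model": {"name": "llama-7b"}, "lqer": {"rank": 32}}

# One derived config per swept value; the template itself is left untouched.
sweep = [override(template, "lqer.rank", rank) for rank in (8, 16, 32)]

for cfg in sweep:
    print(cfg)
```

Copying the template for each setup keeps the runs independent, so a change made for one experiment cannot leak into the next.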

Script Usage

Each script requires one argument CONFIG and one argument TAG. The CONFIG should be a config template toml file. The TAG is a string that will be used to name the output directory.

cd experiments/pipeline
./sweep_xxx.sh CONFIG TAG

For example, to replicate the W4A8 $\text{L}^2\text{QER}-\texttt{MXINT}$ results on LLaMA-7B:

cd experiments/pipeline
./sweep_lqer_act.sh ../configs/template/llama-7b.toml my-llama-7b-tag

Citation

If you find this work helpful, please consider citing:

@article{zhang2024lqer,
  title={LQER: Low-Rank Quantization Error Reconstruction for LLMs},
  author={Zhang, Cheng and Cheng, Jianyi and Constantinides, George A and Zhao, Yiren},
  journal={arXiv preprint arXiv:2402.02446},
  year={2024}
}
