If you are evaluating LLaMA models with recent versions of Transformers, please remove `@use_kernel_forward_from_hub("RMSNorm")` in `modeling_llama.py` and enable `add_bos_token` (set as the default in AutoRound) in lm-eval to stabilize the accuracy. These adjustments affect the quantized model but not the BF16 model for the tasks evaluated in the AutoRound v2 paper.

All other settings follow the default configurations of AutoRound and lm-eval.
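Under those settings, an lm-eval run with `add_bos_token` enabled might look like the sketch below. The model path is a placeholder for your quantized checkpoint, and the task list mirrors the tables that follow; check flag availability against your installed lm-eval version.

```shell
# Hypothetical invocation; the pretrained path is a placeholder.
# add_bos_token=True matches AutoRound's default tokenization setting.
lm_eval --model hf \
    --model_args pretrained=./qwen3-8b-w2g64-autoround,add_bos_token=True \
    --tasks arc_challenge,hellaswag,gsm8k,lambada_openai,mmlu,truthfulqa_mc1,winogrande \
    --batch_size 16
```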

| Qwen3-8B W2G64 | Avg. | arc_challenge | hellaswag | gsm8k | lambada_openai | mmlu | mmlupro | truthfulqa_mc1 | winogrande |
|---|---|---|---|---|---|---|---|---|---|
| AutoRound | 0.4373 | 0.4019 | 0.4437 | 0.4215 | 0.4826 | 0.5474 | 0.2630 | 0.3072 | 0.6314 |
| AutoRound+alg_ext | 0.4787 | 0.4275 | 0.4516 | 0.5944 | 0.5181 | 0.5773 | 0.2807 | 0.3305 | 0.6496 |
| AutoRoundBest+alg_ext (lr 2e-3) | 0.4937 | 0.4505 | 0.4740 | 0.5906 | 0.5556 | 0.6028 | 0.3127 | 0.3109 | 0.6527 |
| Llama3.1-8B-Instruct W2G64 | Avg. | arc_challenge | hellaswag | gsm8k | lambada_openai | mmlu | mmlupro | truthfulqa_mc1 | winogrande |
|---|---|---|---|---|---|---|---|---|---|
| AutoRound | 0.3820 | 0.3635 | 0.4562 | 0.1622 | 0.5069 | 0.4411 | 0.1661 | 0.3207 | 0.6393 |
| AutoRound+alg_ext | 0.4166 | 0.3712 | 0.4729 | 0.2039 | 0.5946 | 0.4981 | 0.2163 | 0.3011 | 0.6748 |
| AutoRoundBest+alg_ext (lr 2e-3) | 0.4539 | 0.4138 | 0.4999 | 0.3071 | 0.6233 | 0.5279 | 0.2364 | 0.3231 | 0.6993 |
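The Avg. column is the plain arithmetic mean of the eight per-task accuracies, which can be verified directly from the tables. A minimal check, using the AutoRound rows from each table:

```python
# Per-task scores copied from the AutoRound rows of the two tables above,
# in column order: arc_challenge, hellaswag, gsm8k, lambada_openai,
# mmlu, mmlupro, truthfulqa_mc1, winogrande.
qwen3_autoround = [0.4019, 0.4437, 0.4215, 0.4826, 0.5474, 0.2630, 0.3072, 0.6314]
llama31_autoround = [0.3635, 0.4562, 0.1622, 0.5069, 0.4411, 0.1661, 0.3207, 0.6393]

def avg(scores):
    """Unweighted mean, rounded to the 4 decimals used in the tables."""
    return round(sum(scores) / len(scores), 4)

print(avg(qwen3_autoround))    # matches the reported 0.4373
print(avg(llama31_autoround))  # matches the reported 0.3820
```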