[BUG] Quantized models silently fail on GSM8K-CoT #1618

Closed
Eijnewgnaw opened this issue May 20, 2025 · 0 comments

@Qubitium
And if I use the chat template, the score improves, but it is still far from the reported result:
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO:root:Running evaluation on LM_EVAL.GSM8K_COT...
INFO Eval: loading using backend = auto
from_quantized: adapter: None
INFO Loader: Auto dtype (native bfloat16): torch.bfloat16
INFO Estimated Quantization BPW (bits per weight): 4.85 bpw, based on [bits: 4, group_size: 32]
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Kernel: candidates -> [TorchQuantLinear]
INFO Kernel: selected -> TorchQuantLinear.
WARNING:accelerate.utils.modeling:The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
INFO Format: Converting checkpoint_format from FORMAT.GPTQ to internal FORMAT.GPTQ_V2.
INFO Format: Converting GPTQ v1 to v2
INFO Format: Conversion complete: 0.01409006118774414s
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Optimize: TorchQuantLinear compilation triggered.
INFO:tokenicer.tokenicer:Tokenicer: Auto fixed pad_token_id=128004 (token='<|finetune_right_pad_id|>').
INFO Model: Loaded generation_config: GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
]
}

INFO Model: Auto-fixed generation_config mismatch between model and generation_config.json.
INFO Model: Updated generation_config: GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128008,
128009
],
"temperature": 0.6,
"top_p": 0.9
}

INFO Kernel: loaded -> [TorchQuantLinear]
INFO 05-19 10:36:00 [__init__.py:248] Automatically detected platform cuda.
WARNING:lm_eval.models.huggingface:pretrained model kwarg is not of type str. Many other model arguments may be ignored. Please do not launch via accelerate or use parallelize=True if passing an existing model this way.
WARNING:lm_eval.models.huggingface:Passed an already-initialized model through pretrained, assuming single-process call to evaluate() or custom distributed integration
INFO LM-EVAL: gen_kwargs = do_sample=True,temperature=0.6,top_k=50,top_p=0.9
INFO LM-EVAL: apply_chat_template = True
INFO:lm_eval.evaluator:Setting random seed to 1234 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
WARNING:lm_eval.evaluator:generation_kwargs specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
INFO:lm_eval.evaluator:Using pre-initialized model
WARNING:lm_eval.evaluator:Chat template formatting change affects loglikelihood and multiple-choice tasks. See docs/chat-template-readme.md for details.
INFO:lm_eval.api.task:Building contexts for gsm8k_cot on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:08<00:00, 158.38it/s]
INFO:lm_eval.evaluator:Running generate_until requests
Running generate_until requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [47:07<00:00, 2.14s/it]
--------lm_eval Eval Result---------

| Tasks     | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-----------|---------|------------------|--------|-------------|--------|---|--------|
| gsm8k_cot | 3       | flexible-extract | 8      | exact_match | 0.1895 | ± | 0.0108 |
|           |         | strict-match     | 8      | exact_match | 0.0751 | ± | 0.0073 |

--------lm_eval Result End---------
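
For reference, here is a minimal sketch of how a run like this can be reproduced through lm-eval's Python API. The model path, the use of GPTQModel.load, and the HFLM wrapper are my assumptions; the log above only confirms the task, the seeds, the gen_kwargs, and apply_chat_template = True:

```python
# Repro sketch. Assumptions are marked; only the task name, n-shot, seed,
# gen_kwargs, and chat-template flag are taken from the log above.
from gptqmodel import GPTQModel                      # assumed loader
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

quant = GPTQModel.load("path/to/quantized-model")    # hypothetical path

# Passing an already-initialized model is what triggers the
# "Passed an already-initialized model through `pretrained`" warning above.
lm = HFLM(pretrained=quant.model, tokenizer=quant.tokenizer)

results = simple_evaluate(
    model=lm,
    tasks=["gsm8k_cot"],
    num_fewshot=8,                                   # "n-shot 8" in the table
    apply_chat_template=True,                        # "apply_chat_template = True"
    gen_kwargs="do_sample=True,temperature=0.6,top_k=50,top_p=0.9",
    random_seed=1234,                                # seeds logged as 1234
)
print(results["results"]["gsm8k_cot"])
```

Note that with do_sample=True the decoding is stochastic, so scores will vary between runs even with the seeds fixed; reference GSM8K-CoT numbers are often reported with greedy decoding, which may account for part of the gap.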

Originally posted by @Eijnewgnaw in #1560
