[BUG] Quantized models silently fail on GSM8K-CoT #1618

Closed
Eijnewgnaw opened this issue May 20, 2025 · 0 comments

@Qubitium
And if I use the chat template, the score improves, but it is still far from the reported result:
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO:root:Running evaluation on LM_EVAL.GSM8K_COT...
INFO Eval: loading using backend = auto
from_quantized: adapter: None
INFO Loader: Auto dtype (native bfloat16): torch.bfloat16
INFO Estimated Quantization BPW (bits per weight): 4.85 bpw, based on [bits: 4, group_size: 32]
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Kernel: candidates -> [TorchQuantLinear]
INFO Kernel: selected -> TorchQuantLinear.
WARNING:accelerate.utils.modeling:The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
INFO Format: Converting checkpoint_format from FORMAT.GPTQ to internal FORMAT.GPTQ_V2.
INFO Format: Converting GPTQ v1 to v2
INFO Format: Conversion complete: 0.01409006118774414s
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Optimize: TorchQuantLinear compilation triggered.
INFO:tokenicer.tokenicer:Tokenicer: Auto fixed pad_token_id=128004 (token='<|finetune_right_pad_id|>').
INFO Model: Loaded generation_config: GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
]
}

INFO Model: Auto-fixed generation_config mismatch between model and generation_config.json.
INFO Model: Updated generation_config: GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128008,
128009
],
"temperature": 0.6,
"top_p": 0.9
}

INFO Kernel: loaded -> [TorchQuantLinear]
INFO 05-19 10:36:00 [__init__.py:248] Automatically detected platform cuda.
WARNING:lm_eval.models.huggingface:pretrained model kwarg is not of type str. Many other model arguments may be ignored. Please do not launch via accelerate or use parallelize=True if passing an existing model this way.
WARNING:lm_eval.models.huggingface:Passed an already-initialized model through pretrained, assuming single-process call to evaluate() or custom distributed integration
INFO LM-EVAL: gen_kwargs = do_sample=True,temperature=0.6,top_k=50,top_p=0.9
INFO LM-EVAL: apply_chat_template = True
INFO:lm_eval.evaluator:Setting random seed to 1234 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
WARNING:lm_eval.evaluator:generation_kwargs specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
INFO:lm_eval.evaluator:Using pre-initialized model
WARNING:lm_eval.evaluator:Chat template formatting change affects loglikelihood and multiple-choice tasks. See docs/chat-template-readme.md for details.
INFO:lm_eval.api.task:Building contexts for gsm8k_cot on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:08<00:00, 158.38it/s]
INFO:lm_eval.evaluator:Running generate_until requests
Running generate_until requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [47:07<00:00, 2.14s/it]
--------lm_eval Eval Result---------

| Tasks     | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-----------|---------|------------------|--------|-------------|--------|---|--------|
| gsm8k_cot | 3       | flexible-extract | 8      | exact_match | 0.1895 | ± | 0.0108 |
|           |         | strict-match     | 8      | exact_match | 0.0751 | ± | 0.0073 |

--------lm_eval Result End---------
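
For reference, here is a minimal sketch of how a run like this can be reproduced through lm-eval's Python API. The model path, the use of GPTQModel.load, and the HFLM wrapper are my assumptions; the log above only confirms the task, the seeds, the gen_kwargs, and apply_chat_template = True:

```python
# Repro sketch. Assumptions are marked; only the task name, n-shot, seed,
# gen_kwargs, and chat-template flag are taken from the log above.
from gptqmodel import GPTQModel                      # assumed loader
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

quant = GPTQModel.load("path/to/quantized-model")    # hypothetical path

# Passing an already-initialized model is what triggers the
# "Passed an already-initialized model through `pretrained`" warning above.
lm = HFLM(pretrained=quant.model, tokenizer=quant.tokenizer)

results = simple_evaluate(
    model=lm,
    tasks=["gsm8k_cot"],
    num_fewshot=8,                                   # "n-shot 8" in the table
    apply_chat_template=True,                        # "apply_chat_template = True"
    gen_kwargs="do_sample=True,temperature=0.6,top_k=50,top_p=0.9",
    random_seed=1234,                                # seeds logged as 1234
)
print(results["results"]["gsm8k_cot"])
```

Note that with do_sample=True the decoding is stochastic, so scores will vary between runs even with the seeds fixed; reference GSM8K-CoT numbers are often reported with greedy decoding, which may account for part of the gap.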

Originally posted by @Eijnewgnaw in #1560
