@Qubitium
And if I use the chat template, the result is better, but still far from the score in the report:
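For context on why `apply_chat_template = True` changes the score: the prompt is wrapped in the model's chat markup before generation. A minimal sketch of a Llama-3-style template (illustrative only — the real template ships with the tokenizer and is applied via `tokenizer.apply_chat_template`):

```python
# Hypothetical, simplified Llama-3-style chat template.
# The actual template is defined in the tokenizer config, not here.
def apply_chat_template(messages, add_generation_prompt=True):
    out = "<|begin_of_text|>"
    for m in messages:
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Cue the model to answer as the assistant.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = apply_chat_template([{"role": "user", "content": "Q: 2+2?"}])
```

With raw (non-chat) prompts, an instruction-tuned model never sees the markup it was fine-tuned on, which is one common reason few-shot scores land below the reported numbers.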
```text
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO:root:Running evaluation on LM_EVAL.GSM8K_COT...
INFO Eval: loading using backend = auto
from_quantized: adapter: None
INFO Loader: Auto dtype (native bfloat16): torch.bfloat16
INFO Estimated Quantization BPW (bits per weight): 4.85 bpw, based on [bits: 4, group_size: 32]
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Kernel: candidates -> [TorchQuantLinear]
INFO Kernel: selected -> TorchQuantLinear.
WARNING:accelerate.utils.modeling:The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
INFO Format: Converting checkpoint_format from FORMAT.GPTQ to internal FORMAT.GPTQ_V2.
INFO Format: Converting GPTQ v1 to v2
INFO Format: Conversion complete: 0.01409006118774414s
INFO Kernel: Auto-selection: adding candidate TorchQuantLinear
INFO Optimize: TorchQuantLinear compilation triggered.
INFO:tokenicer.tokenicer:Tokenicer: Auto fixed pad_token_id=128004 (token='<|finetune_right_pad_id|>').
INFO Model: Loaded generation_config: GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
]
}
INFO Model: Auto-fixed generation_config mismatch between model and generation_config.json.
INFO Model: Updated generation_config: GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128008,
128009
],
"temperature": 0.6,
"top_p": 0.9
}
INFO Kernel: loaded -> [TorchQuantLinear]
INFO 05-19 10:36:00 [__init__.py:248] Automatically detected platform cuda.
WARNING:lm_eval.models.huggingface:pretrained model kwarg is not of type str. Many other model arguments may be ignored. Please do not launch via accelerate or use parallelize=True if passing an existing model this way.
WARNING:lm_eval.models.huggingface:Passed an already-initialized model through pretrained, assuming single-process call to evaluate() or custom distributed integration
INFO LM-EVAL: gen_kwargs = do_sample=True,temperature=0.6,top_k=50,top_p=0.9
INFO LM-EVAL: apply_chat_template = True
INFO:lm_eval.evaluator:Setting random seed to 1234 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
WARNING:lm_eval.evaluator:generation_kwargs specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
INFO:lm_eval.evaluator:Using pre-initialized model
WARNING:lm_eval.evaluator:Chat template formatting change affects loglikelihood and multiple-choice tasks. See docs/chat-template-readme.md for details.
INFO:lm_eval.api.task:Building contexts for gsm8k_cot on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:08<00:00, 158.38it/s]
INFO:lm_eval.evaluator:Running generate_until requests
Running generate_until requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [47:07<00:00, 2.14s/it]
--------lm_eval Eval Result---------
```
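One thing worth noting from the log: `gen_kwargs = do_sample=True,temperature=0.6,top_k=50,top_p=0.9` means the benchmark is *sampled*, not greedy, so GSM8K scores will vary run to run and need not match a report produced with different decoding settings. A minimal sketch of what temperature plus nucleus (top-p) sampling does to one decoding step (illustrative only, not GPTQModel/lm-eval internals):

```python
import math
import random

def sample_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token id from raw logits with temperature + top-p filtering."""
    # Temperature scaling: values < 1.0 sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a low temperature the distribution is sharp, but any token surviving the top-p cut can still be picked, which is why repeated runs (and different seeds or hardware) give slightly different accuracy; greedy decoding (`do_sample=False`) is the usual choice when trying to reproduce a fixed reported number.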
Originally posted by @Eijnewgnaw in #1560