trainer.train() raises KeyError: 'eval_loss' #1823

chenxinxi · 2024-11-16T01:34:41Z

Describe the bug (Mandatory)
Calling trainer.train() raises KeyError: 'eval_loss', whereas the equivalent PyTorch code runs without error.

Hardware Environment (Ascend/GPU/CPU):
Ascend
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx): 2.3.1
-- mindnlp version: 0.4.1
-- Python version (e.g., Python 3.7.5):
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Execute Mode (Mandatory) (PyNative/Graph):
graph

To Reproduce (Mandatory)

from mindnlp.engine import Trainer, TrainingArguments
training_args = TrainingArguments(
  output_dir="./vit-base-food101",
  per_device_train_batch_size=16,
  evaluation_strategy="steps",
  num_train_epochs=4,
  fp16=True,
  save_steps=100,
  eval_steps=100,
  logging_steps=10,
  learning_rate=2e-4,
  save_total_limit=2,
  remove_unused_columns=True,
  load_best_model_at_end=True,
)
import numpy as np
import evaluate
metric = evaluate.load("accuracy")
# the compute_metrics function takes a Named Tuple as input:
# predictions, which are the logits of the model as Numpy arrays,
# and label_ids, which are the ground-truth labels as Numpy arrays.
def compute_metrics(eval_pred):
    """Computes accuracy on a batch of predictions"""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)

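# lora_model, train_ds, val_ds and image_processor are defined elsewhere in the full script (not shown here).
trainer = Trainer(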
    model=lora_model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
)

Running train_results = trainer.train() then raises the error.

Expected behavior (Mandatory)
Training should run to completion, but it stops at epoch 0.36.

Screenshots / Logs (Mandatory)

Additional context (Optional)
Add any other context about the problem here.


lvyufeng · 2024-11-16T07:59:27Z

Please upload the complete code as an attachment.

chenxinxi · 2024-11-16T08:18:47Z

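111601.zip (attachment). Hi, after setting load_best_model_at_end=False the training runs through. However, eval_accuracy is now missing from the evaluation results, and I am still looking for a way to fix that.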
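
Why turning off load_best_model_at_end avoids the exception can be sketched as follows. This is a minimal illustration modeled on how the Hugging Face Transformers Trainer picks its best-model metric; it assumes mindnlp's Trainer follows the same logic, which this thread does not confirm. The functions resolve_best_metric_key and best_metric_value are hypothetical names used only for this example, not mindnlp APIs.

```python
# Hypothetical stand-in for a trainer's best-model bookkeeping; not mindnlp code.

def resolve_best_metric_key(load_best_model_at_end, metric_for_best_model=None):
    """Return the metrics key that would be looked up at checkpoint time, or None."""
    if load_best_model_at_end and metric_for_best_model is None:
        # Assumed default, as in Hugging Face Transformers.
        metric_for_best_model = "loss"
    if metric_for_best_model is None:
        return None
    if not metric_for_best_model.startswith("eval_"):
        metric_for_best_model = f"eval_{metric_for_best_model}"
    return metric_for_best_model


def best_metric_value(metrics, load_best_model_at_end, metric_for_best_model=None):
    """Look up the tracked metric in the evaluation results."""
    key = resolve_best_metric_key(load_best_model_at_end, metric_for_best_model)
    if key is None:
        return None  # nothing is tracked, so no lookup happens
    return metrics[key]  # raises KeyError when evaluation produced no such key


# Evaluation results that contain neither a loss nor any user metric.
eval_metrics = {"eval_runtime": 12.3, "eval_samples_per_second": 81.0}

try:
    best_metric_value(eval_metrics, load_best_model_at_end=True)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'eval_loss'

print(best_metric_value(eval_metrics, load_best_model_at_end=False))  # None
```

Under this assumption, the flag only hides the symptom: the evaluation results still lack eval_loss (and here eval_accuracy as well), so the underlying question is why evaluation produces neither.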

TracyGuo2001 · 2024-11-22T08:22:15Z

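> 111601.zip (attachment). Hi, after setting load_best_model_at_end=False the training runs through. However, eval_accuracy is now missing from the evaluation results, and I am still looking for a way to fix that.

Hello, I ran into the same problem: the eval metrics do not contain the loss. How did you solve it?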
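
One possible reason for evaluation metrics without a loss is that the labels never reach the evaluation step. The sketch below is a simplified model of an evaluation loop in the style of the Hugging Face Transformers Trainer; whether mindnlp 0.4.1 behaves the same way is an assumption, and the thread does not establish the actual cause. The names assemble_eval_metrics and accuracy, and the batch layout (a dict with "logits" plus optional "labels" and "loss"), are hypothetical.

```python
import numpy as np


def accuracy(logits, labels):
    """Toy compute_metrics: fraction of argmax predictions that match the labels."""
    return {"accuracy": float((np.argmax(logits, axis=1) == labels).mean())}


def assemble_eval_metrics(batches, label_names=("labels",), compute_metrics=None, prefix="eval"):
    """Hypothetical simplification of an evaluation loop; not mindnlp code."""
    losses, preds, labels = [], [], []
    for batch in batches:
        # A batch counts as labelled only if every expected label key is present.
        has_labels = len(label_names) > 0 and all(
            batch.get(name) is not None for name in label_names
        )
        if has_labels:
            losses.append(batch["loss"])  # a loss exists only when labels were seen
            labels.append(batch[label_names[0]])
        preds.append(batch["logits"])

    metrics = {}
    # User metrics need both predictions and labels.
    if compute_metrics is not None and preds and labels:
        metrics = dict(compute_metrics(np.concatenate(preds), np.concatenate(labels)))
    if losses:
        metrics["loss"] = float(np.mean(losses))
    # Every key gets the "eval_" prefix unless it already has it.
    return {k if k.startswith(f"{prefix}_") else f"{prefix}_{k}": v for k, v in metrics.items()}


logits = np.array([[0.1, 0.9], [0.8, 0.2]])
with_labels = [{"logits": logits, "labels": np.array([1, 0]), "loss": 0.25}]
without_labels = [{"logits": logits}]  # e.g. the label column was dropped upstream

print(assemble_eval_metrics(with_labels, compute_metrics=accuracy))
print(assemble_eval_metrics(without_labels, compute_metrics=accuracy))  # {}
```

If this model applies, it may be worth checking that the evaluation batches still carry the label column, for example by inspecting one batch from the evaluation dataset, and whether remove_unused_columns=True or the label key name affects a LoRA-wrapped model.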

chenxinxi added the bug label on Nov 16, 2024
