When fine-tuning LLaMA3-8B, eval_loss keeps rising. I tried mixing multiple datasets, but it still didn't help. How can this be solved? #4566
Unanswered
MemoryOldTime asked this question in Q&A
Reminder
System Info
8x Ascend 910A NPUs. The datasets are alpaca_en (21.7 MB) and alpaca_gpt4_en (41.3 MB), mixed together and fine-tuned with LoRA.
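For reference, mixing the two corpora in LLaMA-Factory is normally expressed by listing both dataset names under the `dataset` key of the training YAML. The sketch below is hypothetical: the model path, LoRA target, and cutoff length are placeholder values, and the key names follow the repository's example configs, which may differ slightly between versions.

```yaml
### model (placeholder path)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all          # example value; some configs list q_proj,v_proj instead

### dataset: both corpora are listed here, so they are mixed during training
dataset: alpaca_en,alpaca_gpt4_en
template: llama3
cutoff_len: 1024
```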
Reproduction
```bash
#!/bin/bash
NPROC_PER_NODE=8
NNODES=1
RANK=0

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
  --nproc_per_node $NPROC_PER_NODE \
  --nnodes $NNODES \
  --node_rank $RANK \
  src/train.py examples/train_lora/llama3_lora_sft_ds0.yaml
```
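The launch command reads examples/train_lora/llama3_lora_sft_ds0.yaml, whose contents are not included in this report. The eval_loss curve in question comes from the evaluation split configured in that file; a minimal sketch of the relevant section, with assumed placeholder values and key names taken from LLaMA-Factory's example configs, might look like this:

```yaml
### train (placeholder values)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval: a held-out fraction of the mixed dataset, evaluated every eval_steps
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```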
Expected behavior
It does not really seem like the dataset is too small, and the model parameters clearly cannot be changed either, so this does not look like normal behavior. Is there any other way to solve this problem?

Others
No response