When fine-tuning LLaMA3-8B, eval_loss keeps rising. I tried mixing multiple datasets, but it still didn't help. How can this be solved? #4566
Unanswered
MemoryOldTime asked this question in Q&A
Reminder
System Info
8x Ascend 910A NPUs. The datasets are alpaca_en (21.7 MB) and alpaca_gpt4_en (41.3 MB), mixed together and fine-tuned with LoRA.
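For reference, mixing the two corpora in LLaMA-Factory is normally expressed by listing both dataset names under the `dataset` key of the training YAML. The sketch below is hypothetical: the model path, LoRA target, and cutoff length are placeholder values, and the key names follow the repository's example configs, which may differ slightly between versions.

```yaml
### model (placeholder path)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all          # example value; some configs list q_proj,v_proj instead

### dataset: both corpora are listed here, so they are mixed during training
dataset: alpaca_en,alpaca_gpt4_en
template: llama3
cutoff_len: 1024
```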
Reproduction
```bash
#!/bin/bash
NPROC_PER_NODE=8
NNODES=1
RANK=0

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
  --nproc_per_node $NPROC_PER_NODE \
  --nnodes $NNODES \
  --node_rank $RANK \
  src/train.py examples/train_lora/llama3_lora_sft_ds0.yaml
```
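The launch command reads examples/train_lora/llama3_lora_sft_ds0.yaml, whose contents are not included in this report. The eval_loss curve in question comes from the evaluation split configured in that file; a minimal sketch of the relevant section, with assumed placeholder values and key names taken from LLaMA-Factory's example configs, might look like this:

```yaml
### train (placeholder values)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval: a held-out fraction of the mixed dataset, evaluated every eval_steps
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```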
Expected behavior
It does not really seem like the dataset is too small, and the model parameters clearly cannot be changed either, so this does not look like normal behavior. Is there any other way to solve this problem?

Others
No response