Replies: 1 comment

For now I have changed num_train_epochs to 2 and halved the learning_rate of 1.0e-6; the model no longer outputs the extra content.
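For reference, a minimal sketch of the adjustment described above, assuming "halved" means 1.0e-6 → 5.0e-7 (all other settings from the config below unchanged):

```yaml
### train (adjusted)
learning_rate: 5.0e-7   # halved from 1.0e-6
num_train_epochs: 2     # reduced from 5
```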
Original post:

I ran SimPO full-parameter fine-tuning on a Qwen2.5-7B model that had already been through SFT. The resulting outputs either pick up extra tails, or echo the entire prompt back.

The training parameters are as follows:
```yaml
### model
model_name_or_path:

### simpo
stage: dpo
pref_loss: simpo
pref_beta: 2.0
simpo_gamma: 0.2
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: simpo_data
template: qwen
cutoff_len: 2048
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir:
logging_steps: 10
save_steps: 300
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-6
num_train_epochs: 5
lr_scheduler_type: cosine
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 10
```
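For context, `pref_beta` and `simpo_gamma` above correspond, as I understand the LLaMA-Factory options, to β and the target reward margin γ in the SimPO objective (Meng et al., 2024), which drops the reference model and length-normalizes the policy log-probabilities:

$$
\mathcal{L}_{\text{SimPO}} = -\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma\right)
$$

so this run trains with β = 2.0 and γ = 0.2.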
After training, the model's outputs contained answers like the following (the stray non-English fragments are the model's own garbage output, reproduced verbatim):

1. Does the MINISO x Disney co-branded towel shed lint? opportunitàת חיפוש מקוונת
2. Is Kiehl's high-moisture face cream suitable for winter use? opportunità de utilizare: 1. Since the user's question shifts from moisturizing duration to seasonal suitability, ask directly about the cream's seasonal suitability. 2. Ensure the question is logical and complete. 3. Handle it by the proximity principle.
3. Are the bubbles in Genki Forest's calamansi flavor fine and smooth? opportunitàה:
user
<Role>:
······
······
In the third output, everything that follows is the prompt text echoed back verbatim; to keep this from getting too long, I have replaced it with ······.

Could anyone tell me why this problem occurs, and from what angles I should investigate to rule it out?
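One angle worth checking first is whether generation actually stops on the template's EOS token or only halts at the token limit. A minimal diagnostic sketch using the standard Hugging Face transformers API (`model_path` is a placeholder, not a path from the original post):

```python
# Sketch: reproduce the garbage-tail symptom and check EOS termination.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/simpo_checkpoint"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the same qwen chat template used at training time,
# so inference sees exactly the format the model was trained on.
messages = [{"role": "user", "content": "名创优品迪士尼联名毛巾掉毛吗?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
new_tokens = out[0][inputs["input_ids"].shape[1]:]

# If the reply carries a garbage tail or echoes the prompt, check whether
# generation stopped on EOS (<|im_end|> for the qwen template) or simply
# ran until max_new_tokens.
print(tokenizer.decode(new_tokens, skip_special_tokens=False))
print("stopped on eos:", new_tokens[-1].item() == tokenizer.eos_token_id)
```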