How should epochs and steps be set when fine-tuning on a multimodal dataset in streaming mode? #8273
Unanswered · Juvenilecris asked this question in Q&A
Replies: 2 comments
-
Same question here. Normally, when a streaming dataset is exhausted it raises StopIteration and training moves on to the next epoch. But does it instead restart automatically here and stay in the first epoch forever? Has anyone actually tested this?
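To make the distinction concrete, here is a minimal sketch (using plain `datasets` streaming rather than LLaMA-Factory's trainer; the data file is a placeholder) of what "entering the next epoch" means in a hand-rolled loop: each `for` pass absorbs the StopIteration that ends the stream, and the next pass re-opens it.

```python
from datasets import load_dataset

# Placeholder data file -- any streaming-capable source behaves the same way.
ds = load_dataset("json", data_files="train.jsonl", split="train", streaming=True)

for epoch in range(2):
    # Each pass builds a fresh iterator over the stream; the for-loop
    # absorbs the StopIteration raised when the stream runs dry.
    for example in ds:
        pass  # one training step per example/batch would go here
    print(f"epoch {epoch}: stream exhausted, starting the next pass")
```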
-
See the official HuggingFace documentation on max_steps: https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Seq2SeqTrainer. When max_steps is set, it overrides the epoch setting, and if the dataset is exhausted, training restarts from the beginning of the data within the same epoch. In other words, training stays in epoch 0 until max_steps is reached.
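A minimal sketch of that interaction (the argument values here are illustrative, not taken from the config below): when both fields are set, max_steps wins, and the Trainer keeps drawing from the restarted stream until the step budget is spent.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    max_steps=1000,          # overrides num_train_epochs
    num_train_epochs=2.0,    # ignored once max_steps > 0
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
)
# Training is bounded by optimizer steps, not by passes over the data.
print(args.max_steps)  # 1000
```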
-
Reminder
System Info
```yaml
model_name_or_path: /fs-computility/llm_code_collab/liujiaheng/wangnn/models/Qwen/Qwen2.5-VL-7B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true
freeze_multi_modal_projector: true
freeze_language_model: false
deepspeed: examples/deepspeed/ds_z3_config.json
dataset: shartgptvideo_train_300k
buffer_size: 128
preprocessing_batch_size: 128
streaming: true
accelerator_config:
  dispatch_batches: false
template: qwen2_vl
cutoff_len: 32768
overwrite_cache: false
preprocessing_num_workers: 32
dataloader_num_workers: 16
max_steps: 1000000
output_dir: saves/qwen2_5vl-7b/sft/621441
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
```
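For scale, a quick back-of-the-envelope check of what `max_steps: 1000000` means for this config. The GPU count below is an assumption (it is not part of the config), and the dataset size is only inferred from the name `shartgptvideo_train_300k`:

```python
# Samples consumed per optimizer step for the config above.
# world_size is an assumption (not in the config); adjust to your setup.
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
world_size = 8  # assumed number of GPUs

samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * world_size
total_samples = 1_000_000 * samples_per_step  # max_steps from the config

print(samples_per_step)  # 16
print(total_samples)     # 16000000, i.e. roughly 53 passes over a 300k-sample set
```

If the dataset name does reflect its size, this max_steps value implies dozens of passes over the data, all logged as epoch 0 per the comment above, and `num_train_epochs: 2.0` has no effect once max_steps is set.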
Reproduction
Others
No response