I ran a small fine-tuning job and the process finishes without errors. However, the output model is far too small and does not appear to contain the full weights. These are the files it produced:
config.json
generation_config.json
model.safetensors (around 250 MiB)
runs/
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
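To confirm that the saved weights are incomplete, the checkpoint can be inspected along these lines (a minimal sketch; the path is a placeholder for my --output_dir and it assumes the safetensors package is installed):

```python
from safetensors import safe_open

# Placeholder path: the model.safetensors file inside --output_dir
checkpoint_path = "output_dir/model.safetensors"

with safe_open(checkpoint_path, framework="pt") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors stored in the checkpoint")
    # Print the first few tensor names and shapes without fully loading them
    for name in keys[:5]:
        print(name, f.get_slice(name).get_shape())
```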
I'm using the same command that you suggest:
```
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```
Could you also give an example of how to use the output model?
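For context, this is roughly what I had in mind (a minimal sketch, assuming the checkpoint in --output_dir can be loaded directly with transformers; the path and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: the --output_dir used during fine-tuning
output_path = "output_dir"

tokenizer = AutoTokenizer.from_pretrained(output_path)
model = AutoModelForCausalLM.from_pretrained(
    output_path,
    torch_dtype=torch.bfloat16,  # training used --bf16 True
    device_map="auto",
)

prompt = "# Write a function that checks whether a number is prime\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is something like this the intended way to use it, or is an extra consolidation step needed first?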