Description
I ran a small fine-tuning job and the process finishes without errors. However, the output model is far too small and appears to contain no weights. The output directory contains these files:
```
config.json
generation_config.json
model.safetensors (around 250 MiB)
runs/
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
```
I'm using the same command that you suggest:
```shell
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```
Could you also give an example on how to use the output model?
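For reference, this is how I would expect to load and run the output model (a minimal sketch, assuming the output directory holds a standard `transformers` causal-LM checkpoint; the `"./output"` path in the usage comment is a placeholder for `$OUTPUT_PATH`):

```python
def generate_completion(model_dir: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Load the fine-tuned checkpoint from model_dir and greedily complete prompt."""
    # Imports are local so this sketch only needs torch/transformers when run.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,  # matches the --bf16 True training flag
        device_map="auto",           # place the model on GPU if available
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Usage (hypothetical path):
# print(generate_completion("./output", "def quicksort(arr):"))
```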