
How to use fine-tuned model? #157

Open
@aldialimucaj

Description

I did a small fine-tuning run and the process finished correctly. The output model is too small, though, and the full weights appear to be missing. This is the list of output files:

config.json
generation_config.json
model.safetensors (around 250 MiB)
runs/
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
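
For what it is worth, the ~250 MiB file can be inspected directly to see which tensors it actually holds. A minimal sketch, assuming the safetensors Python package is installed:

# Minimal sketch: list what the saved model.safetensors actually contains.
from safetensors.torch import load_file

state = load_file("model.safetensors")  # path inside the output directory

total_params = sum(t.numel() for t in state.values())
print(f"{len(state)} tensors, {total_params / 1e9:.2f}B parameters in total")

# Peek at a few tensor names and shapes to see whether real weights are present
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)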

I'm using the same command that you suggest:
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True

Could you also give an example of how to use the output model?
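
For reference, this is roughly how I would expect to load the output directory with transformers once the weights are complete (a sketch only; OUTPUT_PATH is the same placeholder as in the command above, and device_map="auto" assumes accelerate is installed):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

output_path = "OUTPUT_PATH"  # placeholder: the --output_dir used for fine-tuning

tokenizer = AutoTokenizer.from_pretrained(output_path)
model = AutoModelForCausalLM.from_pretrained(
    output_path,
    torch_dtype=torch.bfloat16,  # matches the --bf16 True training setting
    device_map="auto",           # assumes accelerate is installed
)

prompt = "# write a function that adds two numbers\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))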
