User Story
Instead of always picking a base model from HF or Unsloth, allow loading an already-tuned model. This may sound odd, but in some use cases you want to first do instruction tuning with SFT for domain adaptation, and then run DPO for preference alignment; each stage requires a different dataset and a different setup. This is not possible today because a saved model cannot be loaded for further fine-tuning. A sketch of the second stage is below.
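As a rough illustration of the requested flow, here is a minimal sketch of the DPO stage picking up from a saved SFT checkpoint, assuming TRL's `DPOTrainer`. The checkpoint paths and the dataset name are placeholders, and argument names vary slightly across TRL versions:

```python
# Sketch: resume from a saved SFT checkpoint and run DPO on top of it.
# Paths/dataset are placeholders; TRL argument names vary by version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Load the already SFT-tuned model instead of a base HF/Unsloth checkpoint.
model = AutoModelForCausalLM.from_pretrained("./outputs/sft-checkpoint")  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained("./outputs/sft-checkpoint")

# Preference dataset with "prompt"/"chosen"/"rejected" columns, as DPO expects.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="./outputs/dpo", beta=0.1)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones `model` as the frozen reference when this is None
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
trainer.save_model("./outputs/dpo-final")
```

The point of the request is the first two lines: `from_pretrained` should accept the output of an earlier SFT run, not only a hub base model.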
Alternative
We could instead implement trainers like ORPO, which does instruction tuning and preference alignment in a single step and does not need a reference model:
https://huggingface.co/docs/trl/main/en/orpo_trainer
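For comparison, a minimal sketch of this alternative using TRL's `ORPOTrainer` from the linked docs. The model and dataset names are placeholders, and argument names may differ across TRL versions:

```python
# Sketch: ORPO combines instruction tuning and preference alignment in one
# trainer, with no reference model. Names below are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# A base model is fine here, since ORPO needs only a single training stage.
model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b")

# ORPO also trains on "prompt"/"chosen"/"rejected" preference pairs.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = ORPOConfig(output_dir="./outputs/orpo", beta=0.1)  # beta weights the odds-ratio loss term

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```

This sidesteps the checkpoint-loading limitation entirely, at the cost of needing a single dataset that carries both instruction and preference signal.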
Acceptance Criteria
Identify a concrete use case, test it end to end, and submit a validation demonstrating the performance improvement.