Commit ef45b97

add multimodal support (#3231)
1 parent b5e4f48 commit ef45b97

File tree

1 file changed: +6 −0 lines


hf-skills-training.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -128,6 +128,9 @@ The coding agent analyzes your request and prepares a training configuration. Fo
 >[!NOTE]
 > The `open-r1/codeforces-cots` dataset is a dataset of codeforces problems and solutions. It is a good dataset for instruction tuning a model to solve hard coding problems.
 
+>[!NOTE]
+> This works for vision language models too! You can simply run "Fine-tune Qwen/Qwen3-VL-2B-Instruct on llava-instruct-mix"
+
 ### Review Before Submitting
 
 Before your coding agent submits anything, you'll see the configuration:
```
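The added note says a vision language model can be fine-tuned by prompt alone. As background, vision-language SFT datasets in the llava-instruct-mix style typically pair images with chat-format messages. A minimal sketch of that row shape (the field names and text below are illustrative assumptions, not taken from the commit):

```python
# Hedged sketch: the conversational row shape commonly used for
# vision-language SFT data (llava-instruct-mix style).
# All concrete values here are illustrative, not from the commit.
example_row = {
    "images": ["<image bytes or PIL.Image>"],  # one entry per image placeholder
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # marks where the image is injected
                {"type": "text", "text": "What is shown in the image?"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "A cat on a windowsill."}],
        },
    ],
}

# Quick structural check: every image placeholder needs a backing image.
n_placeholders = sum(
    1
    for message in example_row["messages"]
    for part in message["content"]
    if part["type"] == "image"
)
assert n_placeholders == len(example_row["images"])
print(n_placeholders)  # 1
```

Rows shaped like this let the same chat-template machinery used for text-only instruction tuning handle interleaved image and text turns.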
```diff
@@ -226,6 +229,9 @@ The dataset has 'chosen' and 'rejected' columns.
 > [!WARNING]
 > DPO is sensitive to dataset format. It requires columns named exactly `chosen` and `rejected`, or a `prompt` column with the input. The agent validates this first and shows you how to map columns if your dataset uses different names.
 
+> [!NOTE]
+> You can run DPO using Skills on vision language models too! Try it out with [openbmb/RLAIF-V-Dataset](http://hf.co/datasets/openbmb/RLAIF-V-Dataset). Claude will apply minor modifications but will succeed in training.
+
 ### Group Relative Policy Optimization (GRPO)
 
 GRPO is a reinforcement learning task that is proven to be effective on verifiable tasks like solving math problems, writing code, or any task with a programmatic success criterion.
```

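The GRPO description above hinges on a programmatic success criterion. A minimal sketch of one such verifiable reward function; the function name and the final-token matching rule are illustrative assumptions, not anything specified in the commit:

```python
def exact_answer_reward(completion: str, reference: str) -> float:
    """Return 1.0 when the completion's final whitespace-separated token
    equals the reference answer, else 0.0 -- a programmatic success
    criterion of the kind GRPO optimizes against (illustrative rule)."""
    tokens = completion.strip().split()
    if not tokens:
        return 0.0
    return 1.0 if tokens[-1] == reference else 0.0

print(exact_answer_reward("The answer is 4", "4"))  # 1.0
print(exact_answer_reward("The answer is 5", "4"))  # 0.0
```

Because the reward is computed by code rather than by a judge model, any task with a checkable output (math answers, passing unit tests) can supply the training signal.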