
Commit 4799ba4

Capybara replaced with ultrafeedback_binarized (huggingface#2183)
1 parent d45c86e commit 4799ba4

File tree

3 files changed: +8 -8 lines

  README.md
  docs/source/cpo_trainer.mdx
  docs/source/dpo_trainer.mdx

README.md

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ from trl import DPOConfig, DPOTrainer

 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
-dataset = load_dataset("trl-lib/Capybara-Preferences", split="train")
+dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
 training_args = DPOConfig(output_dir="Qwen2.5-0.5B-DPO")
 trainer = DPOTrainer(model=model, args=training_args, train_dataset=dataset, tokenizer=tokenizer)
 trainer.train()
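
For context, this is how the README quick-start reads once the change lands. A minimal runnable sketch: the three import lines are assumptions not shown in the hunk (beyond the `from trl import DPOConfig, DPOTrainer` context line); every other statement is taken directly from the diff above, including the `tokenizer=` argument this TRL version still accepts.

```python
# Sketch of the updated README quick-start: DPO fine-tuning of Qwen2.5-0.5B-Instruct
# on the trl-lib/ultrafeedback_binarized preference dataset.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
training_args = DPOConfig(output_dir="Qwen2.5-0.5B-DPO")
trainer = DPOTrainer(model=model, args=training_args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```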

docs/source/cpo_trainer.mdx

Lines changed: 3 additions & 3 deletions
@@ -10,10 +10,10 @@ CPO aims to mitigate two fundamental shortcomings of SFT. First, SFT’s methodo

 ## Quick start

-This example demonstrates how to train a model using the CPO method. We use the [Qwen 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) as the base model. We use the preference data from the [Capybara dataset](https://huggingface.co/datasets/openbmb/UltraFeedback). You can view the data in the dataset here:
+This example demonstrates how to train a model using the CPO method. We use the [Qwen 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) as the base model. We use the preference data from the [UltraFeedback dataset](https://huggingface.co/datasets/openbmb/UltraFeedback). You can view the data in the dataset here:

 <iframe
-  src="https://huggingface.co/datasets/trl-lib/Capybara-Preferences/embed/viewer/default/train?row=0"
+  src="https://huggingface.co/datasets/trl-lib/ultrafeedback_binarized/embed/viewer/default/train?row=0"
   frameborder="0"
   width="100%"
   height="560px"
@@ -29,7 +29,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer

 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
-train_dataset = load_dataset("trl-lib/Capybara-Preferences", split="train")
+train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

 training_args = CPOConfig(output_dir="Qwen2-0.5B-CPO", logging_steps=10)
 trainer = CPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=train_dataset)
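
For convenience, a minimal sketch of the CPO quick-start as it reads after this change. The imports and the closing `trainer.train()` call are assumptions not visible in the hunk above; the remaining lines are copied from the diff.

```python
# Sketch of the updated CPO quick-start: CPO fine-tuning of Qwen2-0.5B-Instruct
# on the trl-lib/ultrafeedback_binarized preference dataset.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = CPOConfig(output_dir="Qwen2-0.5B-CPO", logging_steps=10)
trainer = CPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=train_dataset)
trainer.train()  # assumed to follow, as in the README example; not part of the hunk shown here
```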

docs/source/dpo_trainer.mdx

Lines changed: 4 additions & 4 deletions
@@ -25,10 +25,10 @@ Read more about DPO algorithm in the [original paper](https://huggingface.co/pap

 ## Quick start

-This example demonstrates how to train a model using the DPO method. We use the [Qwen 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) as the base model. We use the preference data from the [Capybara dataset](https://huggingface.co/datasets/openbmb/UltraFeedback). You can view the data in the dataset here:
+This example demonstrates how to train a model using the DPO method. We use the [Qwen 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) as the base model. We use the preference data from the [UltraFeedback dataset](https://huggingface.co/datasets/openbmb/UltraFeedback). You can view the data in the dataset here:

 <iframe
-  src="https://huggingface.co/datasets/trl-lib/Capybara-Preferences/embed/viewer/default/train?row=0"
+  src="https://huggingface.co/datasets/trl-lib/ultrafeedback_binarized/embed/viewer/default/train?row=0"
   frameborder="0"
   width="100%"
   height="560px"
@@ -44,7 +44,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer

 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
-train_dataset = load_dataset("trl-lib/Capybara-Preferences", split="train")
+train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

 training_args = DPOConfig(output_dir="Qwen2-0.5B-DPO", logging_steps=10)
 trainer = DPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=train_dataset)
@@ -190,7 +190,7 @@ First install `unsloth` according to the [official documentation](https://github
 - tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
 + model, tokenizer = FastLanguageModel.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
 + model = FastLanguageModel.get_peft_model(model)
-train_dataset = load_dataset("trl-lib/Capybara-Preferences", split="train")
+train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

 - training_args = DPOConfig(output_dir="Qwen2-0.5B-DPO", logging_steps=10)
 + training_args = DPOConfig(output_dir="Qwen2-0.5B-DPO", logging_steps=10, bf16=True)
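
Putting the quick-start hunk and the unsloth hunk together, the unsloth variant of the DPO example would read roughly as below. This is a composite sketch built only from lines visible in this diff (the `FastLanguageModel` calls and the `bf16=True` flag come from the doc's own snippet), plus assumed imports; it is not an authoritative unsloth recipe.

```python
# Sketch combining the DPO quick-start with the unsloth substitutions shown above.
# Assumes unsloth, trl, transformers, and datasets are installed and that this
# TRL version accepts the tokenizer= argument used throughout this diff.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer
from unsloth import FastLanguageModel

# unsloth replaces the AutoModel/AutoTokenizer pair and wraps the model with PEFT adapters
model, tokenizer = FastLanguageModel.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
model = FastLanguageModel.get_peft_model(model)

train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="Qwen2-0.5B-DPO", logging_steps=10, bf16=True)
trainer = DPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=train_dataset)
trainer.train()
```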
