
Commit 8318d1f

d-kleine and rasbt authored
minor DPO fixes (#298)
* fixed issues, updated .gitignore
* added closing paren
* fixed CEL spelling
* fixed more minor issues
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

---------

Co-authored-by: Sebastian Raschka <[email protected]>
1 parent 36b9d5e commit 8318d1f

File tree

.gitignore
ch07/01_main-chapter-code/ch07.ipynb
ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

3 files changed: +16 -14 lines changed


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -85,6 +85,8 @@ ch07/01_main-chapter-code/instruction-data-with-response-alpaca52k.json
 ch07/01_main-chapter-code/instruction-data-with-response-lora.json
 ch07/01_main-chapter-code/instruction-data-with-response-phi3-prompt.json
 ch07/02_dataset-utilities/instruction-examples-modified.json
+ch07/04_preference-tuning-with-dpo/gpt2-medium355M-sft.pth
+ch07/04_preference-tuning-with-dpo/loss-plot.pdf
 
 # Temporary OS-related files
 .DS_Store

ch07/01_main-chapter-code/ch07.ipynb

Lines changed: 2 additions & 2 deletions
@@ -2722,7 +2722,7 @@
 "- I hope you enjoyed this journey of implementing an LLM from the ground up and coding the pretraining and finetuning functions\n",
 "- In my opinion, implementing an LLM from scratch is the best way to understand how LLMs work; I hope you gained a better understanding through this approach\n",
 "- While this book serves educational purposes, you may be interested in using different and more powerful LLMs for real-world applications\n",
-" - For this, you may consider popular tools such as axolotl ([https://github.com/OpenAccess-AI-Collective/axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)) or LitGPT ([https://github.com/Lightning-AI/litgpt](https://github.com/Lightning-AI/litgpt), which I help developing"
+" - For this, you may consider popular tools such as axolotl ([https://github.com/OpenAccess-AI-Collective/axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)) or LitGPT ([https://github.com/Lightning-AI/litgpt](https://github.com/Lightning-AI/litgpt)), which I help developing"
 ]
 },
 {
@@ -2762,7 +2762,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,

ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

Lines changed: 12 additions & 12 deletions
@@ -1554,11 +1554,11 @@
 "if not finetuned_model_path.exists():\n",
 "\n",
 " # Try finding the model checkpoint locally:\n",
-" relative_path = Path(\"..\") / \"ch07\" / finetuned_model_path\n",
+" relative_path = Path(\"..\") / \"01_main-chapter-code\" / finetuned_model_path\n",
 " if relative_path.exists():\n",
 " shutil.copy(relative_path, \".\")\n",
 "\n",
-" # If this notebook is run on Google Colab, get it from a Googe Drive folder\n",
+" # If this notebook is run on Google Colab, get it from a Google Drive folder\n",
 " elif \"COLAB_GPU\" in os.environ or \"COLAB_TPU_ADDR\" in os.environ:\n",
 " from google.colab import drive\n",
 " drive.mount(\"/content/drive\")\n",
@@ -1875,10 +1875,10 @@
 "- Keeping this in mind, let's go through some of the steps (we will calculate the `logprobs` using a separate function later)\n",
 "- Let's start with the lines\n",
 "\n",
-"```python\n",
-"model_logratios = model_chosen_logprobs - model_rejected_logprobs\n",
-"reference_logratios = reference_chosen_logprobs - reference_rejected_logprobs\n",
-"```\n",
+" ```python\n",
+" model_logratios = model_chosen_logprobs - model_rejected_logprobs\n",
+" reference_logratios = reference_chosen_logprobs - reference_rejected_logprobs\n",
+" ```\n",
 "\n",
 "- These lines above calculate the difference in log probabilities (logits) for the chosen and rejected samples for both the policy model and the reference model (this is due to $\\log\\left(\\frac{a}{b}\\right) = \\log a - \\log b$):\n",
 "\n",
@@ -1936,7 +1936,7 @@
 "\n",
 " Args:\n",
 " logits: Tensor of shape (batch_size, num_tokens, vocab_size)\n",
-" labels: Tensor of shape (batch_size, snum_tokens)\n",
+" labels: Tensor of shape (batch_size, num_tokens)\n",
 " selection_mask: Tensor for shape (batch_size, num_tokens)\n",
 "\n",
 " Returns:\n",
@@ -1981,7 +1981,7 @@
 "id": "cf6a71ac-3fcc-44a4-befc-1c56bbd378d7"
 },
 "source": [
-"- Note that this function above might look a bit intimidating at first due to the `torch.gather` function, but it's pretty similar to what happens under the hood in PyTorch's `cross_entropy` function\n",
+"- Note that this function above might look a bit intimidating at first due to the `torch.gather` function, but it's pretty similar to what happens under the hood in PyTorch's `cross_entropy` function\n",
 "- For example, consider the following example:"
 ]
 },
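To see the `torch.gather`/`cross_entropy` connection mentioned in the hunk above, here is a small standalone example; the shapes follow the docstring shown earlier, and the tensors are randomly generated for illustration only:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)
logits = torch.randn(2, 5, 10)          # (batch_size, num_tokens, vocab_size)
labels = torch.randint(0, 10, (2, 5))   # (batch_size, num_tokens)

log_probs = F.log_softmax(logits, dim=-1)
# Select the log-probability assigned to each label token
selected = torch.gather(
    log_probs, dim=-1, index=labels.unsqueeze(-1)
).squeeze(-1)                            # (batch_size, num_tokens)

# The negative mean of these selections matches PyTorch's cross_entropy
print(torch.allclose(
    -selected.mean(),
    F.cross_entropy(logits.flatten(0, 1), labels.flatten())
))  # True
```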
@@ -2264,7 +2264,7 @@
 "id": "852e4c09-d285-44d5-be12-d29769950cb6"
 },
 "source": [
-"- Why a specified `num_batches`? That's purely for efficiency reasons (because calculating the loss on the whole dataset each time would slow down the training significantly"
+"- Why a specified `num_batches`? That's purely for efficiency reasons (because calculating the loss on the whole dataset each time would slow down the training significantly)"
 ]
 },
 {
@@ -2354,7 +2354,7 @@
 "source": [
 "- After setting up the DPO loss functions in the previous section, we can now finally train the model\n",
 "- Note that this training function is the same one we used for pretraining and instruction finetuning, with minor differences:\n",
-" - we swap the cross entropy loss with our new DPO loss function\n",
+" - we swap the cross-entropy loss with our new DPO loss function\n",
 " - we also track the rewards and reward margins, which are commonly used in RLHF and DPO contexts to track the training progress\n"
 ]
 },
@@ -2394,7 +2394,7 @@
 "\n",
 " for batch_idx, batch in enumerate(train_loader):\n",
 "\n",
-" optimizer.zero_grad() # Reset loss gradients from previous epoch\n",
+" optimizer.zero_grad() # Reset loss gradients from previous batch iteration\n",
 "\n",
 " loss, chosen_rewards, rejected_rewards = compute_dpo_loss_batch(\n",
 " batch=batch,\n",
@@ -3088,7 +3088,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.11.9"
 }
 },
 "nbformat": 4,
