|
1554 | 1554 | "if not finetuned_model_path.exists():\n", |
1555 | 1555 | "\n", |
1556 | 1556 | " # Try finding the model checkpoint locally:\n", |
1557 | | - " relative_path = Path(\"..\") / \"ch07\" / finetuned_model_path\n", |
| 1557 | + " relative_path = Path(\"..\") / \"01_main-chapter-code\" / finetuned_model_path\n", |
1558 | 1558 | " if relative_path.exists():\n", |
1559 | 1559 | " shutil.copy(relative_path, \".\")\n", |
1560 | 1560 | "\n", |
1561 | | - " # If this notebook is run on Google Colab, get it from a Googe Drive folder\n", |
| 1561 | + " # If this notebook is run on Google Colab, get it from a Google Drive folder\n", |
1562 | 1562 | " elif \"COLAB_GPU\" in os.environ or \"COLAB_TPU_ADDR\" in os.environ:\n", |
1563 | 1563 | " from google.colab import drive\n", |
1564 | 1564 | " drive.mount(\"/content/drive\")\n", |
|
1875 | 1875 | "- Keeping this in mind, let's go through some of the steps (we will calculate the `logprobs` using a separate function later)\n", |
1876 | 1876 | "- Let's start with the lines\n", |
1877 | 1877 | "\n", |
1878 | | - "```python\n", |
1879 | | - "model_logratios = model_chosen_logprobs - model_rejected_logprobs\n", |
1880 | | - "reference_logratios = reference_chosen_logprobs - reference_rejected_logprobs\n", |
1881 | | - "```\n", |
| 1878 | + " ```python\n", |
| 1879 | + " model_logratios = model_chosen_logprobs - model_rejected_logprobs\n", |
| 1880 | + " reference_logratios = reference_chosen_logprobs - reference_rejected_logprobs\n", |
| 1881 | + " ```\n", |
1882 | 1882 | "\n", |
1883 | 1883 | "- These lines above calculate the difference in log probabilities (logits) for the chosen and rejected samples for both the policy model and the reference model (this is due to $\\log\\left(\\frac{a}{b}\\right) = \\log a - \\log b$):\n", |
1884 | 1884 | "\n", |
|
1936 | 1936 | "\n", |
1937 | 1937 | " Args:\n", |
1938 | 1938 | " logits: Tensor of shape (batch_size, num_tokens, vocab_size)\n", |
1939 | | - " labels: Tensor of shape (batch_size, snum_tokens)\n", |
| 1939 | + " labels: Tensor of shape (batch_size, num_tokens)\n", |
1940 | 1940 | " selection_mask: Tensor for shape (batch_size, num_tokens)\n", |
1941 | 1941 | "\n", |
1942 | 1942 | " Returns:\n", |
|
1981 | 1981 | "id": "cf6a71ac-3fcc-44a4-befc-1c56bbd378d7" |
1982 | 1982 | }, |
1983 | 1983 | "source": [ |
1984 | | - "- Note that this function above might look a bit intimidating at first due to the `torch.gather` function, but it's pretty similar to what happens under the hood in PyTorch's `cross_entropy` function\n", |
| 1984 | + "- Note that this function above might look a bit intimidating at first due to the `torch.gather` function, but it's pretty similar to what happens under the hood in PyTorch's `cross_entropy` function\n", |
1985 | 1985 | "- For example, consider the following example:" |
1986 | 1986 | ] |
1987 | 1987 | }, |
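The notebook's own example follows in the next cells; as an additional self-contained sanity check (not part of the notebook), the correspondence between the `torch.gather`-based selection and `F.cross_entropy` can be verified on random tensors:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)
logits = torch.randn(2, 4, 10)          # (batch_size, num_tokens, vocab_size)
labels = torch.randint(0, 10, (2, 4))   # (batch_size, num_tokens)

# Manual route: log-softmax over the vocabulary, then gather each label's log-probability
log_probs = F.log_softmax(logits, dim=-1)
selected_log_probs = torch.gather(
    log_probs, dim=-1, index=labels.unsqueeze(-1)
).squeeze(-1)                            # (batch_size, num_tokens)

# cross_entropy performs the same selection internally (up to the sign and the mean reduction)
ce_loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
print(torch.allclose(-selected_log_probs.mean(), ce_loss))  # True
```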
|
2264 | 2264 | "id": "852e4c09-d285-44d5-be12-d29769950cb6" |
2265 | 2265 | }, |
2266 | 2266 | "source": [ |
2267 | | - "- Why a specified `num_batches`? That's purely for efficiency reasons (because calculating the loss on the whole dataset each time would slow down the training significantly" |
| 2267 | + "- Why a specified `num_batches`? That's purely for efficiency reasons (because calculating the loss on the whole dataset each time would slow down the training significantly)" |
2268 | 2268 | ] |
2269 | 2269 | }, |
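As a rough sketch of what such a cap looks like in practice (the function and argument names below are placeholders, not the notebook's), evaluation simply stops after the first `num_batches` batches:

```python
# Placeholder sketch: evaluate on only the first `num_batches` batches for speed.
# `compute_batch_loss` stands in for whatever per-batch loss function is used.
def mean_loss_over_first_batches(compute_batch_loss, data_loader, num_batches=5):
    total_loss, counted = 0.0, 0
    for i, batch in enumerate(data_loader):
        if i >= num_batches:
            break  # skip the remaining batches to keep periodic evaluation cheap
        total_loss += compute_batch_loss(batch)
        counted += 1
    return total_loss / max(counted, 1)
```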
2270 | 2270 | { |
|
2354 | 2354 | "source": [ |
2355 | 2355 | "- After setting up the DPO loss functions in the previous section, we can now finally train the model\n", |
2356 | 2356 | "- Note that this training function is the same one we used for pretraining and instruction finetuning, with minor differences:\n", |
2357 | | - " - we swap the cross entropy loss with our new DPO loss function\n", |
| 2357 | + " - we swap the cross-entropy loss with our new DPO loss function\n", |
2358 | 2358 | " - we also track the rewards and reward margins, which are commonly used in RLHF and DPO contexts to track the training progress\n" |
2359 | 2359 | ] |
2360 | 2360 | }, |
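For the reward tracking mentioned above, a minimal sketch (assumed bookkeeping; the notebook's exact logging code may differ) is to average the per-batch rewards returned alongside the DPO loss and record their gap:

```python
# Assumed bookkeeping sketch: `chosen_rewards` and `rejected_rewards` are the
# per-example reward tensors returned alongside the DPO loss for one batch.
def track_reward_margin(chosen_rewards, rejected_rewards):
    chosen = chosen_rewards.mean().item()      # average reward for preferred responses
    rejected = rejected_rewards.mean().item()  # average reward for rejected responses
    return chosen, rejected, chosen - rejected  # the margin should grow during training
```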
|
2394 | 2394 | "\n", |
2395 | 2395 | " for batch_idx, batch in enumerate(train_loader):\n", |
2396 | 2396 | "\n", |
2397 | | - " optimizer.zero_grad() # Reset loss gradients from previous epoch\n", |
| 2397 | + " optimizer.zero_grad() # Reset loss gradients from previous batch iteration\n", |
2398 | 2398 | "\n", |
2399 | 2399 | " loss, chosen_rewards, rejected_rewards = compute_dpo_loss_batch(\n", |
2400 | 2400 | " batch=batch,\n", |
|
3088 | 3088 | "name": "python", |
3089 | 3089 | "nbconvert_exporter": "python", |
3090 | 3090 | "pygments_lexer": "ipython3", |
3091 | | - "version": "3.11.4" |
| 3091 | + "version": "3.11.9" |
3092 | 3092 | } |
3093 | 3093 | }, |
3094 | 3094 | "nbformat": 4, |
|