Add a suggestion using the One Cycle Policy with Gradient clipping. #261


Closed · wants to merge 1 commit

Conversation

@Shuyib (Contributor) commented Jul 6, 2024

I have added a section to the notebook that walks through the One Cycle Policy using the PyTorch method, combined with gradient clipping. I have also added a short description of the method, which was edited by Claude 3.5 Sonnet. It may not be the best, but I welcome corrections, and I hope this method can become part of the appendix.

Note: I did not run the code in the cell on this pull request.
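Since the notebook cell wasn't run, here is a minimal pure-Python sketch of the two pieces this PR combines — the one-cycle learning-rate shape and global-norm gradient clipping. In PyTorch these correspond to `torch.optim.lr_scheduler.OneCycleLR` and `torch.nn.utils.clip_grad_norm_`; the hyperparameter values below are illustrative, not taken from the notebook:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3,
                 div_factor=25.0, final_div_factor=1e4, pct_start=0.3):
    """One-cycle shape: ramp up from max_lr/div_factor to max_lr over the
    first pct_start fraction of training, then anneal down to
    max_lr/final_div_factor. (A linear ramp is used here for simplicity;
    PyTorch's OneCycleLR defaults to cosine annealing in both phases.)"""
    initial_lr = max_lr / div_factor
    min_lr = max_lr / final_div_factor
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        return initial_lr + (max_lr - initial_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (max_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

def clip_by_global_norm(grads, max_norm=1.0):
    """The math behind torch.nn.utils.clip_grad_norm_: if the combined
    L2 norm of all gradients exceeds max_norm, scale them all down so
    the combined norm equals max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```

In a training loop, the clipping step would run between `loss.backward()` and `optimizer.step()`, with the scheduler stepped once per batch.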


@rasbt (Owner) commented Jul 6, 2024

Thanks for the PR, but I currently can't accept additions to the chapters and the appendix because they have already been laid out by the publisher, and it would be confusing for readers if the code in the notebook differed from the code in the chapter. Thanks for contributing though!

Btw, in the code you mention the one-cycle policy, but the code in the appendix already implements this. So, may I ask how it's different and why it's needed?

@Shuyib (Contributor, Author) commented Jul 7, 2024

Thank you for your response to the pull request; I greatly appreciate it. The techniques indeed share similarities; however, I believe the core distinction lies in the balance between exploration and exploitation, reflected in the learning rate: it first increases from a low value to a peak and then decreases again.

@Shuyib Shuyib closed this Jul 7, 2024
@rasbt (Owner) commented Jul 10, 2024

Oh I see. At first I didn't see the difference because there wasn't a plot, and I assumed from the title that it was similar, but I think I understand the difference now. It's basically a full cycle, whereas in the appendix I have a half-cycle.

To be honest, I do prefer the current half-cycle implementation because that's how it is commonly done in LLM research (e.g., see https://arxiv.org/pdf/2403.08763). I haven't seen any LLM trained with a one-cycle (as opposed to half-cycle) policy, so I'd be a bit hesitant to recommend that. Thanks for the PR and the discussion though!
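For reference, the half-cycle schedule described here — a linear warmup followed by a single cosine decay, as is common in LLM training — can be sketched in plain Python (the hyperparameter values are illustrative, not taken from the appendix):

```python
import math

def warmup_cosine_lr(step, total_steps, peak_lr=1e-3, min_lr=1e-5,
                     warmup_steps=20):
    """Half-cycle schedule: linear warmup to peak_lr, then one cosine
    decay down to min_lr. The learning rate never ramps back up, which
    is why it is only "half" of a cycle."""
    if step < warmup_steps:
        # linear warmup from peak_lr/warmup_steps to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # single cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rise-then-fall shape looks similar to a full one-cycle policy, but there is no initial divisor phase and no final ramp below the starting rate: one warmup, one decay, done.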
