
Add the option couple to AdamW and set the default to match pytorch #188

Status: Open · wants to merge 4 commits into base: master

Conversation

CarloLucibello (Member)

See FluxML/Flux.jl#2433 for the details

Close FluxML/Flux.jl#2433

couple::Bool
end

function AdamW(η, β = (0.9, 0.999), λ = 0.0, ϵ = 1e-8; couple::Bool = true)

Suggested change:
- function AdamW(η, β = (0.9, 0.999), λ = 0.0, ϵ = 1e-8; couple::Bool = true)
+ function AdamW(η, β = (0.9, 0.999), λ = 0.0, ϵ = 1e-8; couple::Union{Nothing,Bool} = nothing)

Could we do something like this and then add a warning in the constructor if couple isn't set?
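A sketch of what that suggestion could look like, transliterated into Python for illustration (the real constructor is Julia; the warning text and the returned structure are assumptions, not the package's API):

```python
import warnings

def AdamW(eta=0.001, beta=(0.9, 0.999), lam=0.0, eps=1e-8, couple=None):
    """Illustrative stand-in for the Julia constructor, not Optimisers.jl itself."""
    if couple is None:
        # Warn when the user did not choose explicitly, since the default
        # behaviour is changing to match PyTorch.
        warnings.warn("AdamW: `couple` was not set; defaulting to couple=True "
                      "(PyTorch-style weight decay).")
        couple = True
    return {"eta": eta, "beta": beta, "lam": lam, "eps": eps, "couple": couple}
```

Passing `couple` explicitly would silence the warning, so only code relying on the old implicit default gets nudged.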


mcabbott commented Nov 6, 2024

I'm happy to change the default, but I wonder if this couple keyword is too easy to miss, and too obscure a name. If you read someone's code, would you ever guess what it means?

One extreme would be to have AdamWpaper and AdamWtorch, so that neither variant gets the neutral name.

@mcabbott mcabbott added this to the 0.4 milestone Nov 6, 2024
@CarloLucibello (Member, Author)

I would leave this PR as it is. 99% of users will write AdamW(lr), get what pytorch does, and be happy. We want couple not to appear in most codebases and not to create mental burden for most users. It will remain available and clearly discoverable through the docstring for the few people aware of the paper/pytorch discussion.
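For readers unfamiliar with the distinction being discussed, here is a minimal scalar sketch of the two conventions, written in Python rather than Julia. It assumes couple=true means the weight decay λ is scaled by the learning rate, as in torch.optim.AdamW, and that the decoupled form subtracts λ·θ directly; neither branch is the package's actual code.

```python
import math

def adamw_step(theta, grad, m, v, t,
               lr=1e-3, betas=(0.9, 0.999), lam=0.0, eps=1e-8, couple=True):
    """One scalar AdamW step, sketching the two weight-decay conventions."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    mhat = m / (1 - b1 ** t)          # bias-corrected first moment
    vhat = v / (1 - b2 ** t)          # bias-corrected second moment
    step = lr * mhat / (math.sqrt(vhat) + eps)
    if couple:
        # PyTorch-style: effective decay per step is lr * lam
        theta = theta * (1 - lr * lam) - step
    else:
        # Decay independent of lr: effective decay per step is lam itself
        theta = theta * (1 - lam) - step
    return theta, m, v
```

With grad = 0 the Adam part vanishes and only the decay differs: couple=True shrinks θ by a factor (1 − lr·λ), while couple=False shrinks it by (1 − λ), which is why the two implementations diverge unless λ is rescaled.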

Successfully merging this pull request may close these issues.

Implementation of AdamW differs from PyTorch
3 participants