New Gradients ruin everything #2580
Comments
To be clear, I think you might be looking for this:

```julia
model_pars = Flux.trainables(model)          # Vector{AbstractArray} containing the parameters
opt_state  = Flux.setup(Adam(), model_pars)  # momenta for Adam in matching form, Vector{<:Optimisers.Leaf}

# then you find this loss_gradients somewhere, at each step
@assert loss_gradients isa Vector{<:AbstractArray}
Flux.update!(opt_state, model_pars, loss_gradients)

# alternatively, with an explicit loop like yours
for (s, p, g) in zip(opt_state, model_pars, loss_gradients)
    Flux.update!(s, p, g)  # here update! acts on individual arrays
end
```

Here, there are no NamedTuples. Is this what you want? Or can you clarify why it doesn't work? This style has worked since about v0.13.6. The old-style optimisers (with IdDict) …
@mcabbott Thank you very much for taking the time to understand my situation. Your code helped me resolve my issue - now my code works exactly as intended. It's great that Flux still allows explicit use of gradients, which was very important for me. However, the usage still isn't completely clear. It seems that the issue now is not with the library itself but with its documentation: the differences between the various object types and their relationships aren't clear. Why does …? Anyway, these questions don't concern me much anymore since I've solved my problem. Thank you very much!
Dear Flux developers,
I'm not someone who usually leaves feedback or writes comments. However, this is a case where I cannot remain silent.
I want to discuss the changes that occurred after version 0.14.25. Now users are required to use gradients ONLY in the form of a NamedTuple structure. Previously, it was possible to use a weight update function approximately in this form:
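The original snippet did not survive in this thread; below is a minimal sketch of the kind of loop described, written against the pre-0.15 implicit-parameters API (Flux.params plus Flux.Optimise.update!). The names model and loss_gradients are assumptions for illustration.

```julia
using Flux

# Hypothetical reconstruction of the old-style update loop described below
# (the actual snippet is missing from the thread).
ps  = Flux.params(model)       # iterable collection of the model's weight arrays
opt = Flux.Optimise.Adam()

# loss_gradients: a vector of arrays, one per parameter, obtained elsewhere
for (p, g) in zip(ps, loss_gradients)
    Flux.Optimise.update!(opt, p, g)   # mutate each weight array in place
end
```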
In this function, weight updates happen in a loop that iterates directly over the model's weights. The gradient was represented as a vector of arrays whose sizes matched the corresponding layers. This is no longer possible, since Flux.Optimise.update! now expects a NamedTuple.
I believe this is a major strategic mistake. Now gradients are supposed to be calculated like this:

```julia
gradient(m -> loss(m, x, y), model)
```

However, this isn't always possible. Sometimes gradients for neural-network updates are obtained in completely different ways: for example, in statistical modelling, by accumulating gradients of various other functions at the points of interest and then operating on them. Performing these operations on NamedTuples is unbearably inconvenient. It also requires rewriting large sections of code and storing gradients in inefficient structures (I'm willing to bet that a plain array is much faster than a NamedTuple). I'm aware of the Flux.destructure function, but it doesn't solve the problem.

When you decided to switch to a more complex structure, you lost a lot in terms of universality. After all, the simpler the structure, the easier it is to use in different cases. Unfortunately, the recent changes mean that I can no longer use Flux for my research.
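For contrast, here is a minimal sketch (not from the original post) of the NamedTuple-based style the new explicit API expects; the toy Dense model and mean-squared-error loss are assumptions for illustration:

```julia
using Flux

model = Dense(3 => 2)                                  # toy model for illustration
x, y  = rand(Float32, 3, 8), rand(Float32, 2, 8)
loss(m, x, y) = Flux.mse(m(x), y)

opt_state = Flux.setup(Adam(), model)                  # optimiser state mirrors the model structure

grads = Flux.gradient(m -> loss(m, x, y), model)[1]    # nested NamedTuple matching the model
Flux.update!(opt_state, model, grads)                  # walks the NamedTuple to update each array
```

Here grads has fields such as (weight = ..., bias = ..., σ = nothing) mirroring the Dense layer, which is exactly the nesting the post finds inconvenient when gradients are accumulated outside of gradient(...).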
P.S.: Perhaps if I spent enough time, I might manage to make everything work with the new version. But that is not something I want to spend time on, nor is it what you expect from a framework you depend on heavily.
TL;DR: Please keep the old way of working with gradients. Do not deprecate Flux.params(model).