New Gradients ruin everything #2580

Open
mposysoev opened this issue Jan 7, 2025 · 2 comments

Comments

@mposysoev

Dear Flux developers,

I'm not someone who usually leaves feedback or writes comments. However, this is a case where I cannot remain silent.

I want to discuss the changes introduced after version 0.14.25. Users are now required to supply gradients ONLY in the form of a NamedTuple structure. Previously, it was possible to write a weight-update function roughly of this form:

function update_model!(model::Chain, optimizer, loss_gradients::Vector{<:AbstractArray{T}}) where {T <: AbstractFloat}
    # Pair each gradient array with the matching parameter array from the
    # old-style Flux.params collection, and update each in place.
    for (gradient, parameter) in zip(loss_gradients, Flux.params(model))
        Flux.Optimise.update!(optimizer, parameter, gradient)
    end
    return model
end

In this function, weight updates happen in a loop that iterates directly over the model's weights. The gradient is represented as a vector of arrays whose sizes correspond to the layer sizes. This is no longer possible, since Flux.Optimise.update! now expects a NamedTuple.

I believe this is a major strategic mistake. Gradients are now supposed to be computed like this: gradient(m -> loss(m, x, y), model). However, this isn't always possible. Sometimes the gradients used to update a network are obtained in completely different ways: in statistical modeling, for example, by accumulating gradients of various other functions at points of interest and then operating on them. Performing these operations on NamedTuples is unbearably inconvenient. It also requires rewriting large sections of code and storing gradients in inefficient structures (I'm willing to bet that a plain array is much faster than a NamedTuple). I'm aware of the Flux.destructure function, but it doesn't solve the problem.
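
For reference, here is roughly what destructure offers, as I understand it: it flattens all trainable parameters into one plain vector and returns a function that rebuilds the model from such a vector. A minimal sketch (the model and step size here are just illustrations):

using Flux

model = Chain(Dense(2 => 4, relu), Dense(4 => 1))   # illustrative model
flat, rebuild = Flux.destructure(model)             # flat is a plain Vector, rebuild recreates the model

accumulated = zero(flat)                            # gradients accumulated as one plain vector
# ... accumulate into `accumulated` however the statistical method requires ...
model = rebuild(flat .- 0.01f0 .* accumulated)      # plain SGD-style step, then rebuild the model

This works for flat-vector methods, but every update goes through rebuilding the whole model.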

When you decided to switch to a more complex structure, you lost a lot of generality. After all, the simpler the structure, the easier it is to use in different situations. Unfortunately, the recent changes mean that I can no longer use Flux for my research.

P.S.: Perhaps, if I spend enough time, I might even manage to make everything work with the new version. But that is not something I want to spend time on, and it is not what you expect from a framework you depend on heavily.


TL;DR: Please keep the old way of working with gradients. Do not deprecate Flux.params(model).

@mcabbott
Member

mcabbott commented Jan 7, 2025

To be clear, I think that parameter and gradient in the above are arrays. You have obtained a vector of gradient arrays from somewhere unspecified, which has the same shape as collect(Flux.params(model)).

I think you might be looking for this:

model_pars = Flux.trainables(model)  # Vector{AbstractArray} containing parameters
opt_state = Flux.setup(Adam(), model_pars)  # momenta for Adam in matching form, Vector{<:Optimisers.Leaf}

# then you find this loss_gradients somewhere, at each step 
@assert loss_gradients isa Vector{<:AbstractArray}
Flux.update!(opt_state, model_pars, loss_gradients)

# alternatively, with an explicit loop like yours
for (s, p, g) in zip(opt_state, model_pars, loss_gradients)
    Flux.update!(s, p, g)  # here update! acts on individual arrays
end

Here, there are no namedtuples. Flux.update! is happy to work on individual parameter arrays, or on arrays of arrays.

Is this what you want? Or can you clarify why it doesn't work?

This style has worked since about v0.13.6. The old-style optimisers (with IdDict: Params, Grads) which have always lived in Flux.Optimise still live there, but have been decoupled a bit... it used to be true that Flux.Optimise.update! === Flux.update!, but these are now two independent functions (edit: nope, they are still the same). So I don't know why the above should have stopped working; we can try to debug an MWE? But I do think that writing things in terms of Flux.trainables and an explicit opt_state is a better idea.

@mposysoev
Author

@mcabbott Thank you very much for taking the time to understand my situation. Your code helped me resolve my issue - now my code works exactly as intended. It's great that Flux still allows explicit use of gradients, which was very important for me.

However, the usage still isn't completely clear to me. It seems that opt_state needs to be connected somehow with the model's parameters, so that the correct parameters are updated for a given gradient. Previously, I simply passed the optimizer to the weight-update function and everything worked; now that stopped working. I fixed it by replacing Flux.Optimise.update! with Flux.update!, and, of course, opt_state is now used instead of opt.
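
For anyone who finds this thread later, the change boiled down to something like this (a simplified sketch; Adam and the loop body are just illustrations):

# before (old implicit style): the optimizer object itself is passed around
opt = Flux.Optimise.Adam()
for (gradient, parameter) in zip(loss_gradients, Flux.params(model))
    Flux.Optimise.update!(opt, parameter, gradient)
end

# after (new explicit style): optimizer state is set up once, then updated per array
model_pars = Flux.trainables(model)
opt_state = Flux.setup(Adam(), model_pars)
for (s, p, g) in zip(opt_state, model_pars, loss_gradients)
    Flux.update!(s, p, g)
end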

I think the issue now is not with the library itself but with its documentation. The differences between the various object types and their relationships aren't clear. Why does Flux.Optimise.update! behave differently from Flux.update!? Why are there multiple functions in the library that do almost the same thing, and how do I know which one I need?

Anyway, these questions don't concern me much anymore since I've solved my problem. Thank you very much!
