One mismatch is that L-BFGS wants to control when to call the function/model, while in Optimisers.jl you call it and the package just handles the update. That's true of all the L-BFGS implementations I know of. I'm not an expert, but I think this is largely to allow for a linesearch, which will typically call f(x) several times per accepted update of x. However, it seems OptimKit.jl's interface has no way to call f(x) alone, only the equivalent of withgradient(f, x):
The objective function is specified as a function fval, gval = fg(x) that returns both the function value and its gradient at a given point x. The function value fval is assumed to be a real number of some type T<:Real. Both x and the gradient gval can be of any type, including tuples and named tuples.
I guess that's not impossible within the current interface: sometimes update!(state, model, grad) would be a linesearch step. Would it be a problem to sometimes stop not when e.g. OptimKit.optimize thinks you should, but simply after 1000 calls? Would it be a problem that each call is typically on a different minibatch of data? That's not obligatory with this package, but it is typical.
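To make the linesearch point concrete, here is a minimal sketch of L-BFGS driven by an fg(x)-style objective, using only the LinearAlgebra standard library. It is not the implementation in OptimKit.jl or anything in Optimisers.jl. The names lbfgs_sketch and rosenbrock_fg are hypothetical, and the simple backtracking (Armijo) linesearch is an assumption chosen for brevity. The sketch counts how often fg is evaluated, so the number of calls can be compared with the number of accepted updates.

```julia
using LinearAlgebra  # stdlib only: dot, norm

# Hypothetical helper (not a library API): minimise an objective given as
# fg(x) -> (value, gradient), using L-BFGS with a backtracking linesearch.
function lbfgs_sketch(fg, x0; m=8, maxiter=100, gradtol=1e-8)
    x = copy(x0)
    ncalls = Ref(0)                          # number of fg evaluations
    counted_fg(z) = (ncalls[] += 1; fg(z))
    f, g = counted_fg(x)
    S = Vector{typeof(x)}()                  # recent steps s = x_new - x
    Y = Vector{typeof(x)}()                  # recent gradient changes y = g_new - g
    accepted = 0
    for _ in 1:maxiter
        norm(g) <= gradtol && break
        # Two-loop recursion: apply the implicit inverse-Hessian estimate to g.
        q = copy(g)
        α = zeros(length(S))
        for i in length(S):-1:1
            α[i] = dot(S[i], q) / dot(Y[i], S[i])
            q .-= α[i] .* Y[i]
        end
        if !isempty(S)
            q .*= dot(S[end], Y[end]) / dot(Y[end], Y[end])  # initial scaling
        end
        for i in 1:length(S)
            β = dot(Y[i], q) / dot(Y[i], S[i])
            q .+= (α[i] - β) .* S[i]
        end
        d = -q                               # search direction
        # Backtracking linesearch: every trial step costs one more fg call.
        t = 1.0
        slope = dot(g, d)
        xnew = x .+ t .* d
        fnew, gnew = counted_fg(xnew)
        while fnew > f + 1e-4 * t * slope && t > 1e-12
            t /= 2
            xnew = x .+ t .* d
            fnew, gnew = counted_fg(xnew)
        end
        s = xnew .- x
        y = gnew .- g
        # Store the pair only if the curvature condition holds.
        if dot(s, y) > 1e-10 * norm(s) * norm(y)
            push!(S, s); push!(Y, y)
            length(S) > m && (popfirst!(S); popfirst!(Y))
        end
        x, f, g = xnew, fnew, gnew
        accepted += 1
    end
    return x, f, accepted, ncalls[]
end

# Test objective in the fg(x) style: Rosenbrock value and hand-written gradient.
rosenbrock_fg(x) = ((1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
                    [-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
                     200 * (x[2] - x[1]^2)])

x, f, accepted, ncalls = lbfgs_sketch(rosenbrock_fg, [-1.2, 1.0]; maxiter=200)
println("accepted updates: ", accepted, ", fg calls: ", ncalls)
```

Because the optimiser decides where to evaluate fg during the linesearch, the caller cannot simply hand it one precomputed gradient per step, which is the mismatch with the update!(state, model, grad) pattern. The linesearch also compares values of the same function at different points, so this sketch assumes a deterministic objective rather than a fresh minibatch per call.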
Motivation and description
Can we implement L-BFGS? It's a quasi-second-order (quasi-Newton) method that can converge much faster than first-order methods, and it suits computationally intensive models with a moderate number of parameters. I work in inverse design and topology optimization with differentiable simulation, where L-BFGS is the go-to method.
https://github.com/baggepinnen/FluxOptTools.jl has a partial implementation but it'd be nice to have it natively within FluxML
Thanks!
Possible Implementation
No response