
L-BFGS algorithm request #195

Open
paulxshen opened this issue Dec 9, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@paulxshen

Motivation and description

Can we implement L-BFGS? It's a quasi-second-order method that can converge much faster than first-order methods, making it well suited to computationally intensive models with a moderate number of parameters. I work in inverse design and topology optimization with differentiable simulation, where L-BFGS is the go-to method.

https://github.com/baggepinnen/FluxOptTools.jl has a partial implementation, but it would be nice to have one natively within FluxML.

Thanks!

Possible Implementation

No response

@mcabbott
Member

One implementation that makes few assumptions about the data/gradient format is https://github.com/Jutho/OptimKit.jl

However, the mismatch is that it wants to control when the function/model is called, whereas with Optimisers.jl you call it yourself and the package just handles the update. That's true of all the L-BFGS implementations I know of. I'm not an expert, but I think this is largely to allow for linesearch, which will typically call f(x) several times per accepted update of x. And it seems OptimKit.jl's interface has no way to call f(x) rather than withgradient(f, x):

(objective function) is specified as a function fval, gval = fg(x) that returns both the function value and its gradient at a given point x. The function value fval is assumed to be a real number of some type T<:Real. Both x and the gradient gval can be of any type, including tuples and named tuples.

I guess that's not impossible within the current interface... sometimes update!(state, model, grad) would be a linesearch step. But will it be a problem to stop not when e.g. OptimKit.optimize thinks you should, but simply after 1000 calls? And will it be a problem that each call is typically on a different minibatch of data? That's not obligatory with this package, but it is typical.
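To make the mismatch concrete, here's a rough sketch of the two calling conventions side by side. The quadratic objective is just a toy, and Adam stands in for the missing rule; the OptimKit.jl calls follow its documented `optimize(fg, x₀, algorithm)` signature:

```julia
using OptimKit, Optimisers, Zygote

# OptimKit.jl convention: the optimizer owns the loop and calls fg as
# many times as its linesearch needs per accepted step.
fg(x) = (sum(abs2, x), 2x)            # toy objective: returns (value, gradient)
x, fx, gx, numfg, history = optimize(fg, randn(10), LBFGS(10))

# Optimisers.jl convention: the user owns the loop; the rule only maps
# a gradient to an update. Adam stands in, since no L-BFGS rule exists here.
let x = randn(10)
    state = Optimisers.setup(Optimisers.Adam(), x)
    for step in 1:1000                # the user decides when to stop
        val, grads = Zygote.withgradient(p -> sum(abs2, p), x)
        state, x = Optimisers.update!(state, x, grads[1])
    end
end
```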

@paulxshen
Author

paulxshen commented Dec 15, 2024

Thanks! I found an up-to-date implementation in Optim.jl: https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/multivariate/solvers/first_order/l_bfgs.jl
It doesn't look too hard to port to Optimisers.jl? We can omit the linesearch, since Flux assumes each function evaluation is expensive.
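Something like the following sketch, perhaps: a linesearch-free L-BFGS written against Optimisers.jl's documented `init`/`apply!` extension interface. The name `LBFGSRule`, the fixed `eta` stepsize standing in for the linesearch, and the untyped history buffers are all illustrative choices here, not anything taken from Optim.jl:

```julia
using Optimisers
using LinearAlgebra: dot

# Hypothetical rule, not part of Optimisers.jl. A fixed stepsize `eta`
# replaces the linesearch; `m` is the number of (s, y) pairs to keep.
struct LBFGSRule <: Optimisers.AbstractRule
    eta::Float64
    m::Int
end
LBFGSRule(; eta = 0.1, m = 10) = LBFGSRule(eta, m)

Optimisers.init(o::LBFGSRule, x::AbstractArray) =
    (s = Any[], y = Any[], xprev = copy(x), gprev = zero(x), started = Ref(false))

function Optimisers.apply!(o::LBFGSRule, st, x::AbstractArray, dx)
    if st.started[]
        sk, yk = x .- st.xprev, dx .- st.gprev
        if dot(sk, yk) > 0                  # keep only positive-curvature pairs
            push!(st.s, sk); push!(st.y, yk)
            length(st.s) > o.m && (popfirst!(st.s); popfirst!(st.y))
        end
    end
    st.started[] = true
    copyto!(st.xprev, x); copyto!(st.gprev, dx)

    k = length(st.s)
    k == 0 && return st, o.eta .* dx        # no history yet: plain gradient step

    # Two-loop recursion: q ≈ (inverse Hessian) * dx
    q = float.(dx)
    rho = [1 / dot(st.y[i], st.s[i]) for i in 1:k]
    alpha = zeros(k)
    for i in k:-1:1
        alpha[i] = rho[i] * dot(st.s[i], q)
        q .-= alpha[i] .* st.y[i]
    end
    q .*= dot(st.s[k], st.y[k]) / dot(st.y[k], st.y[k])   # H₀ = γI scaling
    for i in 1:k
        beta = rho[i] * dot(st.y[i], q)
        q .+= (alpha[i] - beta) .* st.s[i]
    end
    return st, o.eta .* q                   # update! subtracts this from x
end
```

With that, `opt_state = Optimisers.setup(LBFGSRule(), model)` and the usual `update!` loop should work unchanged, though the minibatch question above still applies: the curvature pairs (s, y) only make sense if successive gradients come from the same objective.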

@mcabbott added the enhancement (New feature or request) label on Dec 15, 2024