The implementation of `Flux.Train.train!` could also use the Caching Allocator interface from GPUArrays.jl: https://juliagpu.github.io/GPUArrays.jl/dev/interface/#Caching-Allocator. By reusing the temporary GPU buffers allocated on each training step instead of leaving them to the GC, this should let us fix https://github.com/FluxML/Flux.jl/issues/2523 and related issues.
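A minimal sketch of the idea, assuming the `AllocCache` / `@cached` API from the linked GPUArrays.jl docs (the `cached_train!` wrapper is hypothetical, not part of Flux):

```julia
using Flux, GPUArrays

# Hypothetical wrapper: run each epoch inside an allocation cache so the
# temporaries allocated by the training step are reused across iterations
# instead of pressuring the GC.
function cached_train!(loss, model, data, opt; epochs = 1)
    cache = GPUArrays.AllocCache()
    for _ in 1:epochs
        GPUArrays.@cached cache begin
            Flux.train!(loss, model, data, opt)
        end
    end
    # Release the cached buffers once training is done.
    GPUArrays.unsafe_free!(cache)
end
```

Inside `train!` itself the cache could instead wrap the per-batch body of the loop, which is where the repeated allocations come from.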