The implementation of `Flux.Train.train!` could also use the Caching Allocator interface from GPUArrays.jl: https://juliagpu.github.io/GPUArrays.jl/dev/interface/#Caching-Allocator. By reusing the temporary GPU buffers allocated on each training step instead of leaving them to the GC, this should let us fix https://github.com/FluxML/Flux.jl/issues/2523 and related issues.
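A minimal sketch of the idea, assuming the `AllocCache` / `@cached` API from the linked GPUArrays.jl docs (the `cached_train!` wrapper is hypothetical, not part of Flux):

```julia
using Flux, GPUArrays

# Hypothetical wrapper: run each epoch inside an allocation cache so the
# temporaries allocated by the training step are reused across iterations
# instead of pressuring the GC.
function cached_train!(loss, model, data, opt; epochs = 1)
    cache = GPUArrays.AllocCache()
    for _ in 1:epochs
        GPUArrays.@cached cache begin
            Flux.train!(loss, model, data, opt)
        end
    end
    # Release the cached buffers once training is done.
    GPUArrays.unsafe_free!(cache)
end
```

Inside `train!` itself the cache could instead wrap the per-batch body of the loop, which is where the repeated allocations come from.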