Gradient-based Learning to Optimize Framework for extending `tf.keras.optimizers.Optimizer`.
This framework was originally written as the backend for "Optimizer Amalgamation" (ICLR 2022; Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini, Zhangyang Wang).
- `l2o.deserialize`: utilities used to deserialize JSON and other arguments
- `l2o.distutils`: utilities for packaging optimizers for distribution
- `l2o.evaluate`: evaluation methods and optimizee prototypes for evaluation
- `l2o.optimizer`: `tf.keras.optimizers.Optimizer` extension backend (see the sketch after this list)
- `l2o.policies`: policy descriptions
- `l2o.problems`: optimizees used in training; dataset management
- `l2o.strategy`: training strategy (i.e. Curriculum Learning)
- `l2o.train`: truncated backpropagation implementation
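For orientation, the `l2o.optimizer` backend builds on the standard pattern for subclassing `tf.keras.optimizers.Optimizer`. The sketch below shows only that generic pattern (as of tensorflow 2.3/2.4); `ScaledSGD` and its update rule are hypothetical and are not part of this library's API.

```python
import tensorflow as tf


class ScaledSGD(tf.keras.optimizers.Optimizer):
    """Hypothetical optimizer illustrating the subclassing pattern (not part of l2o)."""

    def __init__(self, learning_rate=0.01, name="ScaledSGD", **kwargs):
        super().__init__(name, **kwargs)
        # Register hyperparameters so they are tracked and serialized.
        self._set_hyper("learning_rate", learning_rate)

    def _resource_apply_dense(self, grad, var, apply_state=None):
        # Plain gradient descent step; a learned policy would compute the
        # update here instead.
        lr = self._get_hyper("learning_rate", var.dtype)
        return var.assign_sub(lr * grad)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        # Sparse updates are not supported by this framework (see below).
        raise NotImplementedError("Sparse training is not supported.")

    def get_config(self):
        config = super().get_config()
        config.update(
            {"learning_rate": self._serialize_hyperparameter("learning_rate")})
        return config
```

A learned optimizer replaces the hand-written update rule with the output of a trained policy, but plugs into Keras the same way, e.g. `model.compile(optimizer=ScaledSGD(0.01), loss="mse")`.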
| Library | Known Working | Known Not Working |
|---|---|---|
| tensorflow | 2.3.0, 2.4.1 | <= 2.2 |
| tensorflow_datasets | 3.1.0, 4.2.0 | n/a |
| pandas | 0.24.1, 1.2.4 | n/a |
| numpy | 1.18.5, 1.19.2 | >= 1.20 |
| scipy | 1.4.1, 1.6.2 | n/a |
- Nested structure issue: tensorflow versions 2.2 and earlier have a bug in parsing nested structures in `get_concrete_function`. Solution: upgrade tensorflow to >= 2.3.0.
- ``OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed``: caused by a version mismatch in the tensorflow dependency `gast`. Solution: `pip install gast==0.3.3`.
- `NotImplementedError: Cannot convert a symbolic Tensor (Size:0) to a numpy array.`: caused by a numpy API version mismatch. Solution: downgrade numpy to < 1.20 (tested: 1.19.2, 1.18.5).
- GPUs not showing up: make sure the `tensorflow-gpu` conda package is installed, not just `tensorflow` (see the check after this list).
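A quick, generic way to confirm that TensorFlow can actually see the GPUs (standard TensorFlow, not specific to this library):

```python
import tensorflow as tf

# An empty list means TensorFlow was built without GPU support or cannot
# find the CUDA driver/toolkit, and training will silently run on CPU.
print(tf.config.list_physical_devices("GPU"))
```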
The following features are not supported or not tested:

- Training on multiple GPUs with `MirroredStrategy`: should work in theory, but is not tested (see the sketch after this list).
- Sparse training
- Training with the model split between different GPUs
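For reference, multi-GPU training would follow the generic `tf.distribute.MirroredStrategy` pattern sketched below; whether the training loops in `l2o.train` behave correctly inside this scope is exactly what has not been tested, and the plain Keras model is only a stand-in.

```python
import tensorflow as tf

# Mirrors variables across all visible GPUs (falls back to CPU if none are found).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope -- including a learned optimizer's
    # weights -- are replicated on every device. The model and optimizer here
    # are placeholders, not l2o training entry points.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    optimizer = tf.keras.optimizers.SGD(0.01)
```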
Known issues:

- Some systems may be up to 2x slower than others, even with identical GPUs, sufficient RAM, and roughly equivalent CPUs. I believe this is due to some kernel launch inefficiency or a CUDA/TF configuration problem.
- Sometimes, training will "NaN out" and turn all optimizer weights to NaN. There is supposed to be a guard preventing NaN gradient updates from being committed, but it does not seem to be fully working; a sketch of such a guard is shown below.
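For illustration, a guard of the kind described above could look like the sketch below. It is written against the public Keras optimizer API rather than this library's internal update path, and `apply_if_finite` is a hypothetical helper name.

```python
import tensorflow as tf


def apply_if_finite(optimizer, grads, variables):
    """Hypothetical guard: only commit an update if every gradient is finite."""
    # Reduce to a single scalar: True only if no gradient contains NaN or Inf.
    finite = tf.reduce_all(
        [tf.reduce_all(tf.math.is_finite(g)) for g in grads])
    if finite:  # eager-mode check; use tf.cond inside a tf.function
        optimizer.apply_gradients(zip(grads, variables))
    return finite
```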