Learn To Optimize

Gradient-based Learning to Optimize Framework for extending tf.keras.optimizers.Optimizer.

This framework was originally written as the backend for "Optimizer Amalgamation" (ICLR 2022; Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini, Zhangyang Wang).
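
For orientation, here is a minimal sketch of the extension point this refers to: a custom tf.keras.optimizers.Optimizer subclass whose update rule could be replaced by a learned policy. LearnedSGD and its plain-SGD update are illustrative placeholders, not the actual l2o API.

    import tensorflow as tf

    class LearnedSGD(tf.keras.optimizers.Optimizer):
        """Illustrative only: plain SGD standing in for a learned rule."""

        def __init__(self, learning_rate=0.01, name="LearnedSGD", **kwargs):
            super().__init__(name, **kwargs)
            self._set_hyper("learning_rate", learning_rate)

        def _resource_apply_dense(self, grad, var, apply_state=None):
            lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
            # A learned policy would compute the update from grad (and
            # per-variable state); vanilla SGD stands in here.
            return var.assign_sub(lr * grad)

        def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
            raise NotImplementedError("Sparse training is not supported.")

        def get_config(self):
            config = super().get_config()
            config["learning_rate"] = self._serialize_hyperparameter(
                "learning_rate")
            return config

A subclass like this drops into the usual Keras workflow, e.g. model.compile(optimizer=LearnedSGD(0.01), loss="mse").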

Description of Modules

  • l2o.deserialize: utilities for deserializing JSON and other arguments
  • l2o.distutils: utilities for packaging optimizers for distribution
  • l2o.evaluate: evaluation methods and optimizee prototypes for evaluation
  • l2o.optimizer: tf.keras.optimizers.Optimizer extension backend
  • l2o.policies: policy descriptions
  • l2o.problems: optimizees used in training; dataset management
  • l2o.strategy: training strategies (e.g. curriculum learning)
  • l2o.train: truncated backpropagation implementation (sketched after this list)
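
Since l2o.train implements truncated backpropagation through the optimizee, the following self-contained sketch may help fix ideas. Everything here is a hypothetical stand-in for the real modules: the toy quadratic plays the role of l2o.problems, the tiny Dense network plays the role of l2o.policies, and meta_step is not part of the l2o API.

    import tensorflow as tf

    # Hypothetical policy: a tiny network mapping gradients to updates.
    policy = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    policy_opt = tf.keras.optimizers.Adam(1e-3)

    def optimizee_loss(theta):
        # Toy quadratic optimizee standing in for l2o.problems.
        return tf.reduce_sum(tf.square(theta - 2.0))

    def meta_step(theta, unroll=20):
        """One truncation: unroll `unroll` optimizee steps, then
        backpropagate the summed loss into the policy weights."""
        with tf.GradientTape() as meta_tape:
            meta_loss = 0.0
            for _ in range(unroll):
                with tf.GradientTape() as inner_tape:
                    inner_tape.watch(theta)
                    loss = optimizee_loss(theta)
                grad = inner_tape.gradient(loss, theta)
                # The policy computes a per-parameter update from the gradient.
                update = policy(tf.reshape(grad, (-1, 1)))
                theta = theta - tf.reshape(update, tf.shape(theta))
                meta_loss = meta_loss + optimizee_loss(theta)
        meta_grads = meta_tape.gradient(meta_loss, policy.trainable_variables)
        policy_opt.apply_gradients(zip(meta_grads, policy.trainable_variables))
        return theta, meta_loss

    theta = tf.random.normal([8])
    for _ in range(5):
        # stop_gradient truncates backpropagation between unrolls.
        theta, meta_loss = meta_step(tf.stop_gradient(theta))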

Dependencies

Library               Known Working     Known Not Working
tensorflow            2.3.0, 2.4.1      <=2.2
tensorflow_datasets   3.1.0, 4.2.0      n/a
pandas                0.24.1, 1.2.4     n/a
numpy                 1.18.5, 1.19.2    >=1.20
scipy                 1.4.1, 1.6.2      n/a
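
For a fresh environment, a pin set consistent with the known-working column might look like the following; these versions are known to work individually per the table above, but this exact combination and your CUDA setup may need adjustment:

    pip install "tensorflow==2.4.1" "tensorflow_datasets==4.2.0" "pandas==1.2.4" "numpy==1.19.2" "scipy==1.6.2"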

Common Errors

  • Nested structure issue: tensorflow 2.2 and earlier have a bug in parsing nested structures in get_concrete_function. Solution: upgrade tensorflow to >=2.3.0.

  • OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: see issue here; caused by a version mismatch in tensorflow's gast dependency. Solution: pip install gast==0.3.3.

  • NotImplementedError: Cannot convert a symbolic Tensor (Size:0) to a numpy array: see question here; caused by a numpy API version mismatch. Solution: downgrade numpy to <1.20 (tested: 1.19.2, 1.18.5).

  • GPUs not showing up: make sure the tensorflow-gpu conda package is installed, not just tensorflow. The check below verifies which GPUs tensorflow can see.
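
A quick way to run that check (standard tensorflow API):

    import tensorflow as tf

    # Lists all GPUs visible to tensorflow. An empty list usually means a
    # CPU-only build is installed or the CUDA driver setup is broken.
    print(tf.config.list_physical_devices("GPU"))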

Training Support

Supported

  • Training on multiple GPUs with MirroredStrategy (a minimal setup is sketched after this list).
  • Training on multiple devices with MirroredStrategy should work in theory, but is untested.
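
For reference, the minimal multi-GPU setup looks like this; the Dense model is a stand-in, not an l2o policy:

    import tensorflow as tf

    # Variables created under strategy.scope() are mirrored across all
    # visible GPUs, and gradients are all-reduced automatically.
    strategy = tf.distribute.MirroredStrategy()
    print("Number of replicas:", strategy.num_replicas_in_sync)

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="adam", loss="mse")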

Not Supported

  • Sparse training
  • Training with model split between different GPUs

Known Problems

  • Some systems may run up to 2x slower than others, even with identical GPUs, sufficient RAM, and roughly equivalent CPUs. I believe this is due to kernel launch inefficiency or a CUDA/TF configuration problem.
  • Training will sometimes "NaN out," turning all optimizer weights to NaN. A guard is supposed to prevent NaN gradient updates from being committed, but it does not appear to be fully working; the intended pattern is sketched below.
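
The intended guard follows roughly this pattern (an eager-mode sketch, not the framework's actual implementation): check that every gradient entry is finite before committing the update.

    import tensorflow as tf

    def guarded_apply(optimizer, grads, variables):
        """Apply gradients only if every entry is finite; otherwise skip
        the step entirely so NaNs never reach the optimizer weights."""
        all_finite = tf.reduce_all(
            [tf.reduce_all(tf.math.is_finite(g)) for g in grads])
        if all_finite:  # eager-mode check; use tf.cond inside tf.function
            optimizer.apply_gradients(zip(grads, variables))
        return all_finite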
