Releases: google-research/batch-ppo
TensorFlow Agents 1.4.0
Features:
- Split episodes into chunks for training (a sketch follows this list). This reduces memory requirements when training from pixels and in some cases increases data efficiency.
- Use lambda variable initializers everywhere to support embedding the simulation into a larger graph (see the second sketch after this list).
- Upgrade to newest Gym version, including new environment names and dtypes for spaces.
- Support regularization losses returned by the network.
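
A minimal sketch of the chunking idea, assuming episodes are stored as a `[num_episodes, episode_length, ...]` tensor and that the chunk length divides the episode length; the function name is illustrative, not the repository's API.

```python
import tensorflow as tf

def split_into_chunks(episodes, chunk_length):
  """Split [num_episodes, episode_length, ...] into shorter training sequences.

  Training on [num_episodes * num_chunks, chunk_length, ...] keeps the
  per-sequence memory footprint small, which matters for pixel observations.
  """
  shape = episodes.shape.as_list()
  num_episodes, episode_length = shape[0], shape[1]
  num_chunks = episode_length // chunk_length
  new_shape = [num_episodes * num_chunks, chunk_length] + shape[2:]
  return tf.reshape(episodes, new_shape)

# Example: 8 episodes of 100 steps with 84x84x3 pixel observations.
observations = tf.zeros([8, 100, 84, 84, 3])
chunks = split_into_chunks(observations, chunk_length=25)  # -> [32, 25, 84, 84, 3]
```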
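And a short sketch of the lambda-initializer pattern: passing a no-argument callable as a variable's initial value defers creation of the initializer tensor until the variable is built, which helps when the simulation is embedded inside a larger graph. Variable names here are illustrative.

```python
import tensorflow as tf

# Eager initial value: the zeros tensor is created immediately, in whatever
# graph is active when this line runs.
score = tf.Variable(tf.zeros([4]), trainable=False, name='score')

# Lambda initial value: the initializer tensor is only created when the
# variable itself is constructed, so it lands in the right graph even when
# this code is embedded inside a larger one.
deferred_score = tf.Variable(
    lambda: tf.zeros([4]), dtype=tf.float32, trainable=False,
    name='deferred_score')
```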
Improvements:
- Remove MuJoCo dependency from tests.
- Speed up smoke tests for faster iteration times.
- Enable continuous integration.
Bugs:
- Fix off-by-one bug in the `FrameHistory` environment wrapper.
TensorFlow Agents 1.3.0
Features:
- Represent policies as tf.distribution objects, so that the algorithms are independent of the action distribution.
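
A minimal sketch of the idea, using `tensorflow_probability` distributions as a stand-in for the TF 1.x `tf.distributions` module; the network and objective names are illustrative, not the repository's exact API.

```python
import tensorflow as tf
import tensorflow_probability as tfp

def categorical_policy(observations, num_actions):
  """Return the policy as a distribution object instead of raw parameters."""
  hidden = tf.keras.layers.Dense(64, activation='relu')(observations)
  logits = tf.keras.layers.Dense(num_actions)(hidden)
  return tfp.distributions.Categorical(logits=logits)

def gaussian_policy(observations, action_size):
  hidden = tf.keras.layers.Dense(64, activation='relu')(observations)
  mean = tf.keras.layers.Dense(action_size)(hidden)
  logstd = tf.Variable(tf.zeros([action_size]), name='logstd')
  return tfp.distributions.MultivariateNormalDiag(mean, tf.exp(logstd))

def surrogate_objective(policy, old_log_prob, actions, advantage):
  """PPO-style objective written only against the distribution interface,
  so it works unchanged for categorical and Gaussian action spaces."""
  ratio = tf.exp(policy.log_prob(actions) - old_log_prob)
  return (tf.reduce_mean(ratio * advantage) +
          0.01 * tf.reduce_mean(policy.entropy()))
```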
Improvements:
- Move reusable components into the `agents.parts` package.
- Add nesting tools to handle nested tuples, lists, and dicts.
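
A small sketch of what such nesting tools look like; `map_nested` below is a hypothetical helper, not necessarily the package's exact interface.

```python
def map_nested(function, structure):
  """Apply a function to every leaf of a nested tuple/list/dict structure,
  preserving the structure itself."""
  if isinstance(structure, dict):
    return {key: map_nested(function, value)
            for key, value in structure.items()}
  if isinstance(structure, (tuple, list)):
    return type(structure)(map_nested(function, value) for value in structure)
  return function(structure)

# Example: scale every number in a nested observation.
observation = {'camera': [1.0, 2.0], 'sensors': ({'gyro': 3.0}, 4.0)}
scaled = map_nested(lambda x: x * 2, observation)
# {'camera': [2.0, 4.0], 'sensors': ({'gyro': 6.0}, 8.0)}
```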
Bugs:
- Fix PPO not learning on GPU by placing the optimizer on the GPU.
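
A TF 1.x-style sketch of the fix's idea: pinning the optimizer to the GPU keeps its update ops and slot variables on the same device as the network. The device string, learning rate, and toy loss are illustrative.

```python
import tensorflow as tf

# Toy loss standing in for the PPO objective.
weights = tf.Variable(tf.ones([4]))
loss = tf.reduce_sum(tf.square(weights))

# Creating the optimizer inside a GPU device scope places its update ops and
# slot variables (e.g. the Adam moments) on the GPU instead of silently
# falling back to the CPU.
with tf.device('/gpu:0'):
  optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
  train_op = optimizer.minimize(loss)
```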
TensorFlow Agents 1.2.0
Features:
- Use a single optimizer for PPO to better train shared feature layers, as sketched after this list.
- Allow calling methods of the environment running in an external process.
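
A hedged TF 1.x-style sketch of the single-optimizer idea: one optimizer minimizes the combined policy and value objective, so the shared feature layers receive gradients from both heads in a single update. Layer sizes, coefficients, and the stand-in losses are illustrative.

```python
import tensorflow as tf

observations = tf.placeholder(tf.float32, [None, 8])
returns = tf.placeholder(tf.float32, [None])
policy_targets = tf.placeholder(tf.float32, [None, 2])

# Shared feature layers feeding both the policy head and the value head.
features = tf.layers.dense(observations, 64, tf.nn.relu)
policy = tf.layers.dense(features, 2)
value = tf.layers.dense(features, 1)[:, 0]

policy_loss = tf.reduce_mean(tf.square(policy - policy_targets))  # stand-in
value_loss = tf.reduce_mean(tf.square(value - returns))           # stand-in

# One optimizer over the combined loss, rather than separate policy and value
# optimizers, so shared layers get gradients from both objectives at once.
total_loss = policy_loss + 0.5 * value_loss
train_op = tf.train.AdamOptimizer(1e-4).minimize(total_loss)
```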
Improvements:
- Improve default and MuJoCo configs.
- Report both training and evaluation scores.
Bugs:
- Fix likelihood calculation that halved the gradients for the action standard deviation.
TensorFlow Agents 1.1.0
Features:
- Policy networks are now defined as functions that map sequences of observations to sequences of actions. As a result, feed-forward policies are faster, and memory-based agents are easier to implement. Previously, networks were restricted to be defined as `RNNCell`s. A sketch of both interfaces follows this list.
- All functions of the agent interface now receive a tensor of agent indices. This adds the flexibility to process observations in smaller batches. Previously, `perform()` and `experience()` were defined on data from all the environments.
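
A hypothetical sketch of these two interfaces; the exact signatures in the repository differ, so the names and shapes below are illustrative only.

```python
import tensorflow as tf

def feed_forward_policy(observation_sequences, unused_length, state=None):
  """Network as a function from observation sequences to action sequences.

  `observation_sequences` has shape [num_agents, max_length, observation_size].
  A feed-forward network can process whole sequences at once instead of being
  stepped through as an RNNCell, which is faster; a recurrent network would
  simply thread `state` through and return the updated state.
  """
  hidden = tf.keras.layers.Dense(64, activation='relu')(observation_sequences)
  actions = tf.keras.layers.Dense(4, activation='tanh')(hidden)
  return actions, state

def perform(agent_indices, all_observations):
  """Agent interface functions receive agent indices, so they can act for an
  arbitrary sub-batch of environments rather than all of them at once."""
  observations = tf.gather(all_observations, agent_indices)
  actions, _ = feed_forward_policy(observations[:, None], None)
  return actions[:, 0]
```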
TensorFlow Agents 1.0.0
Initial release.