Releases: pytorch/rl

0.0.2a

17 Sep 19:49
b3a46c6
Pre-release

What's Changed

  • [BugFix] Fixed compose which ignored inv_transforms of child by @nicolas-dufour in #328
  • [BugFix] functorch installation in CircleCI by @vmoens in #336
  • [Refactor] VecNorm inference API by @vmoens in #337
  • TransformedEnv sets added Transforms into eval mode by @alexanderlobov in #331
  • [Refactor] make to_tensordict() create a copy of the content by @nicolas-dufour in #334
  • [CircleCI] Fix dm_control rendering by @vmoens in #339
  • [BugFix]: joining processes when they're done by @vmoens in #311
  • [Test] pass the OS error in case the file isn't closed by @tongbaojia in #344
  • [Feature] Make default rollout tensordict contiguous by @vmoens in #343
  • [BugFix] Clone memmap tensors on regular tensors and other replay buffer improvements by @vmoens in #340
  • [CI] Using latest gym by @vmoens in #346
  • [Doc] Coding your first DDPG tutorial by @vmoens in #345
  • [Doc] Minor: typos in DDPG by @vmoens in #354
  • [Feature] Register lambda and gamma in buffers by @vmoens in #353
  • [Feature] Implement eq for TensorSpec by @omikad in #358
  • [Doc] Multi-tasking tutorial by @vmoens in #352
  • [Feature] Env refactoring for model based RL by @nicolas-dufour in #315
  • [Feature]: Added support for TensorDictSequence module subsampling by @nicolas-dufour in #332
  • [BugFix] Add lock to vec norm transform by @jaschmid-fb in #356
  • [Perf]: Improve PPO training performance by @vmoens in #297
  • [BugFix] Functorch-Tensordict bug fixes by @vmoens in #361
  • Revert "[BugFix] Functorch-Tensordict bug fixes" by @vmoens in #362
  • [BugFix] Functorch-Tensordict bug fixes by @vmoens in #363
  • [Feature] CSVLogger (ABANDONED) by @vmoens in #371
  • [Feature] Support tensor-based decay in TD-lambda by @tcbegley in #360
  • [Feature] CSVLogger by @vmoens in #372
  • [BugFix] Fewer env instantiations for better mujoco rendering by @vmoens in #378
  • [Feature] change imports of environment libraries (gym and dm_control) at lower levels by @guabao in #379
  • [BugFix] Representation of indexed nested tensordict by @vmoens in #370
  • [BugFix] In-place __setitem__ for SubTensorDict by @vmoens in #369
  • [Feature] Add ProbabilisticTensorDictModule dist key mapping support by @nicolas-dufour in #376
  • [Feature]: R3M integration by @vmoens in #321
  • [Feature] static_seed flag for envs, vectorized envs and collectors by @vmoens in #385
  • [Feature] AdditiveGaussian exploration strategy by @vmoens in #388
  • [Feature] Multi-images R3M by @vmoens in #389
  • [Feature] Flatten multi-images in R3M by @vmoens in #391
  • [Quality] Code cleanup for fbsync by @vmoens in #392
  • [Feature] In-house functional modules for TorchRL using TensorDict by @vmoens in #387
  • [Quality] Code cleanup for fbsync by @vmoens in #397
  • [Doc] Add charts to examples by @nicolas-dufour in #374
  • [Feature] Vectorized GAE by @vmoens in #365
  • [BugFix] Temporarily fix gym to 0.25.1 to fix CI by @vmoens in #411
  • [Feature] Create a Squeeze transform and update Unsqueeze transform by @reachsumit in #408
  • [Naming] Recurse kwarg to match pytorch by @matt-fff in #410
  • [Feature] Add all implemented loggers to the init of loggers by @flinder in #402
  • [BugFix] Fix gym 0.26 compatibility by @vmoens in #403
  • [BugFix] Remove submodules by @vmoens in #414
  • [Feature] lock tensordict when calling share_memory_() by @fdabek1 in #412
  • [BugFix] Updated TensorDict.expand to work as Tensor.expand by @AnshulSehgal in #409
  • [BugFix] Looser check for test_recorder assertion by @vmoens in #415
  • [Feature] Allow spec to be passed directly to exploration wrappers by @vmoens in #418
  • [BugFix] Collector revert to default exploration mode if empty string is passed by @vmoens in #421
  • [Naming] Rename _TargetNetUpdate to TargetNetUpdater, making it public by @yushiyangk in #422
  • [Doc] Re-run tutorials by @vmoens in #381
  • Revert "[Doc] Re-run tutorials" (colab links broken) by @vmoens in #423
  • [Feature] Switch back to latest gym by @vmoens in #425
  • [Feature] TensorDict without device by @tcbegley in #413
  • Updated the README.md file by @bashnick in #427
  • [Feature] Adding support for initialising TensorDicts from nested dicts by @zeenolife in #404
  • [Features] Make image_size a cfg param by @nicolas-dufour in #430
  • Make TensorDict.expand accept Sequence arguments by @nicolasgriffiths in #424
  • [Doc] Readme revamp for efficiency/modularity display by @vmoens in #382
  • [Feature] New biased_softplus semantic to allow for minimum scale setting by @nicolas-dufour in #428
  • [Tutorial] Re-run tutos by @vmoens in #434
  • [BugFix] mixed device_safe vs device by @vmoens in #429
  • [BugFix] Explicit params and buffers by @agrotov in #436
  • [BugFix] Fixed Additive noise by @nicolas-dufour in #441
  • [Tests] Test loggers video saving by @bashnick in #439
  • Revert "[BugFix] Fixed Additive noise" by @vmoens in #442
  • [Refactor] Rename TensorDictSequence to TensorDictSequential by @ronert in #440
  • [Refactor] Refactoring set*() methods for TensorDictBase class by @zeenolife in #438
  • [Cleanup] Removing gym-retro interface by @vmoens in #444
  • [BugFix]: Fix additive noise by @nicolas-dufour in #447
  • [BugFix] CatTensors: Prepended next_ to the out_key by @ggimler3 in #449
  • [BugFix] Fix AdditiveGaussian exploration tests by @vmoens in #450
  • [BugFix] Wrong call to device_safe in replay buffer code by @vmoens in #454
  • [BugFix] Add transform_observation_spec _R3MNet by @ymwdalex in #443
  • [Doc] Add a knowledge base by @shagunsodhani in #375
  • [Feature] Allow for actions and rewards to be in the reset tensordict by @vmoens in #458
  • [Doc] Readme for knowledge base by @vmoens in #459
  • [Feature] Added batch_lock attribute in EnvBase by @nicolas-dufour in #399
  • [BugFix] deepcopy specs before transforming by @vmoens in #461
  • [BugFix]: Fixed dm_control action type casting by @nicolas-dufour in #463
  • [Versioning] Version 0.0.2a0 by @vmoens in #465


v0.0.1-gamma

25 Jul 21:20
00ddc39
Pre-release

What's Changed

  • Adding additional checks to TensorDict.view to remove unnecessary ViewedTensorDict object creation by @bamaxw in #319
  • [BugFix]: Safe state normalization when std=0 by @vmoens in #323
  • [BugFix]: gradient propagation in advantage estimates by @vmoens in #322
  • [BugFix]: make training example gracefully exit by @vmoens in #326
  • [Setup]: Exclude tutorials from wheels by @vmoens in #325
  • [BugFix]: Tensor map for subtensordict.set_ by @vmoens in #324
  • [Release]: Wheels v0.0.1c by @vmoens in #327

Full Changelog: v0.0.1b...v0.0.1c

v0.0.1-beta

25 Jul 08:45
23ca67c
Pre-release

Highlights

Supports nested tensordicts (a short sketch follows the list below):

  • [Feature] Nested tensordicts by @vmoens in #256
  • [Feature]: Index nested tensordicts using tuples by @vmoens in #262
  • [Feature]: flatten nested tensordicts by @vmoens in #264
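
Taken together, these features let entries be grouped hierarchically and addressed with key tuples. A minimal sketch, assuming TensorDict is importable from torchrl.data in this release and that the tuple-indexing and flatten_keys APIs match the PRs above:

import torch
from torchrl.data import TensorDict  # import location assumed for this release

inner = TensorDict({"obs": torch.randn(4, 3)}, batch_size=[4])
td = TensorDict({"agent": inner}, batch_size=[4])

obs = td["agent", "obs"]       # index nested entries with a tuple of keys
flat = td.flatten_keys(".")    # flatten nesting into keys such as "agent.obs"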

Padding for tensordicts:

Speed improvements:

  • [Feature]: faster meta-tensor API for TensorDict by @vmoens in #272
  • [Feature]: faster safetanh transform via C++ bindings by @vmoens in #289
  • [Feature]: Improving training efficiency by @vmoens in #293

Logging capabilities:

Doc

What's Changed

  • MacOs versioning and release bugfix by @vmoens in #247
  • Setup metadata by @vmoens in #248
  • Fix setup instructions by @vmoens in #250
  • Fix a bug when segment_tree size is exactly 2^N by @xiaomengy in #251
  • Added test for RewardRescale transform by @nicolas-dufour in #252
  • Empty TensorDict population in loops by @vmoens in #253
  • Memmap del bugfix by @vmoens in #254
  • [BugFix]: recursion error when calling permute(...).to_tensordict() by @vmoens in #260
  • Differentiable PPOLoss for IRL by @vmoens in #240
  • [BugFix]: avoid deleting true in_keys in TensorDictSequence by @vmoens in #261
  • [Feature] Add issue and pull request template by @Benjamin-eecs in #263
  • [Test]: test nested CompositeSpec by @vmoens in #265
  • [Test]: test squeezed TensorDict by @vmoens in #269
  • [Test]: TensorDict: test tensordict created on cuda and sub-tensordict indexed along 2nd dimension by @vmoens in #268
  • Refactor the torch.stack with destination by @khmigor in #245
  • Small tweaks to make the replay buffer code more consistent by @shagunsodhani in #275
  • [BugFix]: Minor bugs in docstrings by @vmoens in #276
  • [BugFix]: update wrong links in issue and pull request template by @Benjamin-eecs in #286
  • [BugFix]: quickfix: force gym 0.24 installation until issue with rendering is resolved by @vmoens in #283
  • [Doc]: remove pip install from CONTRIBUTING.md by @vmoens in #288
  • [BugFix]: fix GLFW3 error when installing dm_control by @vmoens in #291
  • [BugFix]: Fix examples by @vmoens in #290
  • [Doc] Simplify PR template by @vmoens in #292
  • [BugFix]: Replay buffer bugfixes by @vmoens in #294
  • [Doc] MacOs M1 troubleshooting by @ramonmedel in #296
  • [QuickFix]: update issue and pr template by @Benjamin-eecs in #303
  • [Test] tests for BinarizeReward by @srikanthmg85 in #302
  • [BugFix]: L2-priority for PRB by @vmoens in #305
  • [Feature] Transforms: Compose.insert and TransformedEnv.insert_transform by @rmartimov in #304
  • [BugFix] Fix flaky test by waiting for procs instead of sleep by @nairbv in #306
  • [BugFix] Fix a build warning, setuptools/distutils import order by @nairbv in #307
  • ufmt issue if imports in order requested by distutils by @nairbv in #308
  • [BugFix]: Conda to pip for circleci by @vmoens in #310
  • [BugFix] Support list-based boolean masks for TensorDict by @benoitdescamps in #299
  • [Feature] Truly invertible tensordict permutation of dimensions by @ramonmedel in #295
  • [Feature] Rename _TensorDict into TensorDictBase by @yoavnavon in #316

Full Changelog: v0.0.1...v0.0.1b

v0.0.1-alpha

06 Jul 09:52
ad92dd7
Pre-release

TorchRL Initial Alpha Release

TorchRL is the soon-to-be official RL domain library for PyTorch.
It contains primitives that are aimed at covering most of the modern RL research space.

Getting started with the library

Installation

The library can be installed with:

$ pip install torchrl

Currently, torchrl wheels are provided for Linux and macOS (not M1) machines. For other architectures, or for the latest features, refer to the README.md and CONTRIBUTING.md files for advanced installation instructions.

Environments

TorchRL currently supports gym and dm_control out of the box. To create a gym-wrapped environment, simply use:

import gym

from torchrl.envs import GymEnv, GymWrapper

env = GymEnv("Pendulum-v1")
# similarly, wrap an existing gym environment
env = GymWrapper(gym.make("Pendulum-v1"))

Environments can be transformed using the torchrl.envs.transforms module; see the environment tutorial for more information.
ParallelEnv allows running multiple environments in parallel.
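
A minimal sketch of both features (the transform classes and constructor arguments shown here are assumptions and may differ slightly in this early release):

from torchrl.envs import GymEnv, ParallelEnv, TransformedEnv
from torchrl.envs.transforms import Compose, ObservationNorm, RewardScaling

# wrap a single environment with a stack of transforms
env = TransformedEnv(
    GymEnv("Pendulum-v1"),
    Compose(
        ObservationNorm(loc=0.0, scale=1.0),   # normalize observations
        RewardScaling(loc=0.0, scale=0.1),     # rescale rewards
    ),
)

# run several copies of the environment in parallel worker processes
parallel_env = ParallelEnv(4, lambda: GymEnv("Pendulum-v1"))
tensordict = parallel_env.reset()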

Policy and modules

TorchRL modules interact using TensorDict, a new data carrier class. Although using it is not mandatory and workarounds exist, we advise using the TensorDictModule class to read and write tensordicts:

>>> from torch import nn
>>> from torchrl.modules import TensorDictModule
>>> # n_obs and n_act are the observation and action dimensions of the environment
>>> policy_module = nn.Linear(n_obs, n_act)
>>> policy = TensorDictModule(
...     policy_module,
...     in_keys=["observation"],  # keys to be read for the module input
...     out_keys=["action"],  # keys to be written with the module output
... )
>>> tensordict = env.reset()
>>> tensordict = policy(tensordict)
>>> action = tensordict["action"]

By using TensorDict and TensorDictModule, you can make sure that your algorithm is robust to changes in configuration (e.g. the use of an RNN for the policy, exploration strategies, etc.). TensorDict instances can be reshaped in several ways, cast to a device, updated, shared among processes, stacked, concatenated, etc.
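
For instance, a few of these manipulations look as follows (a sketch, assuming TensorDict is importable from torchrl.data in this release):

import torch
from torchrl.data import TensorDict  # import location assumed for this release

td = TensorDict(
    {"observation": torch.randn(4, 3), "action": torch.randn(4, 2)},
    batch_size=[4],
)
td_view = td.view(2, 2)                          # reshape the batch dimensions
td_cpu = td.to("cpu")                            # cast every entry to a device
stacked = torch.stack([td, td.clone()], dim=0)   # stack tensordicts like tensors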

Some specialized TensorDictModule subclasses are implemented for convenience: Actor, ProbabilisticActor, ValueOperator, ActorCriticOperator, ActorCriticWrapper and QValueActor can be found in actors.py.
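
For example, Actor wraps a module with sensible default keys (a sketch; the exact defaults are assumptions):

from torch import nn
from torchrl.modules import Actor

# n_obs and n_act are the environment's observation and action dimensions
actor = Actor(nn.Linear(n_obs, n_act))  # assumed defaults: reads "observation", writes "action"
tensordict = actor(env.reset())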

Collecting data

DataCollectors are the TorchRL data-loading class family. We provide single-process, as well as sync and async multiprocess, loaders. We also provide ReplayBuffers that can be stored in memory or on disk using the various storage options.
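
A sketch of a synchronous collector feeding a replay buffer (class names and argument names follow the collector and buffer APIs as understood for this release; treat the exact signatures as assumptions):

from torchrl.collectors import SyncDataCollector
from torchrl.data import TensorDictReplayBuffer  # class name assumed for this release
from torchrl.envs import GymEnv

collector = SyncDataCollector(
    create_env_fn=lambda: GymEnv("Pendulum-v1"),
    policy=policy,                     # the TensorDictModule policy defined above
    frames_per_batch=200,
    total_frames=10_000,
)
buffer = TensorDictReplayBuffer(size=100_000)  # "size" argument assumed for this early API

for batch in collector:                # each batch is a TensorDict of transitions
    buffer.extend(batch.view(-1))      # flatten batch dims before storing
collector.shutdown()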

Loss modules and advantage computation

Loss modules are provided independently for each algorithm class. They are accompanied by efficient implementations of value and advantage computation functions.
TorchRL aims to be fully compatible with functorch, PyTorch's functional programming library.
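
As an illustration, a DDPG-style loss sketch (import paths, argument names and loss keys are assumptions for this release):

from torch import nn
from torchrl.modules import Actor, ValueOperator
from torchrl.objectives import DDPGLoss  # import path assumed for this release

actor = Actor(nn.Linear(n_obs, n_act))
value_net = ValueOperator(
    nn.Linear(n_obs + n_act, 1),
    in_keys=["observation", "action"],   # Q(s, a) critic
)
loss_module = DDPGLoss(actor_network=actor, value_network=value_net)
loss_td = loss_module(sampled_tensordict)   # returns a TensorDict of loss terms
loss = loss_td["loss_actor"] + loss_td["loss_value"]
loss.backward()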

Examples

A number of examples are provided as well. Check the examples directory to learn more about exploration strategies, loss modules, etc.