v0.8.0: Async envs and better weight update API

@vmoens vmoens released this 30 Apr 14:58
· 86 commits to main since this release

TorchRL v0.8.0: Async envs and better weight update API

  • Async environments: #2864 introduces asynchronous environments, which can be built using different backends (currently
    "threading" or "multiprocessing"). Instantiating an async env is roughly the same as a parallel one:
    from functools import partial
    from torchrl.envs import AsyncEnvPool, GymEnv
    env = AsyncEnvPool([partial(GymEnv, "Pendulum-v1"), partial(GymEnv, "Pendulum-v1")], backend="threading")
    These environments support the regular environment methods (reset, step or rollout) but their main advantage lies
    in their new async methods:
    s0 = env.rand_action(env.reset())
    env.async_step_send(s0)
    # receive
    result = env.async_step_recv()
    In this example, result will contain the results of the call to step for one or two environments. The environment indices
    can be found in the result['env_index'] entry (the name of that key is stored in env._env_idx_key).
  • Support for environments with tensorclass attributes (#2788)
  • Distributed RayReplayBuffer (#2835)
  • Gymnasium 1.1 compatibility (#2898): TorchRL is now compatible with Gymnasium 1.1. This version lets users choose
    how partial resets are handled, which makes integration in the library easier.
  • VecNormV2: a new version of VecNorm that is more numerically stable and easier to handle. It can be created directly
    through the usual VecNorm by passing the new_api keyword argument.
  • Policy factory for collectors: you can now pass a factory for your policy instead of the policy object itself.
    Since the collector updates the weights of the policy when asked to, in most cases this will not cause any
    synchronization problem with the copy that is used by the training pipeline.
  • An update API for policy weights in collectors: we have isolated the weight update API in a torchrl.collectors.WeightUpdaterBase
    abstract class. This should be the entry point for any user wanting to implement their own weight update strategy, alleviating
    the need to subclass or patch the collector or the policy directly.
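
The send/receive pattern described in the async environments item above can be pictured with a toy model that uses only the Python standard library. This is not TorchRL code: ToyAsyncPool, its constructor and the make_step helper are hypothetical stand-ins, and only the method names async_step_send / async_step_recv and the env_index key are borrowed from the text above. Under those assumptions, the sketch shows why a single receive call may return results for only some of the environments:

```python
import queue
import threading
import time


class ToyAsyncPool:
    """Hypothetical stand-in for an async env pool (illustration only, not TorchRL)."""

    def __init__(self, step_fns):
        self._step_fns = step_fns      # one step function per toy "environment"
        self._ready = queue.Queue()    # finished results, in completion order

    def _work(self, idx, action):
        # Run one step and tag the result with the index of its environment.
        obs = self._step_fns[idx](action)
        self._ready.put({"env_index": idx, "observation": obs})

    def async_step_send(self, actions):
        # Start one worker thread per (index, action) pair and return immediately.
        for idx, action in actions.items():
            threading.Thread(target=self._work, args=(idx, action), daemon=True).start()

    def async_step_recv(self):
        # Block until one result is ready, then drain whatever else has finished.
        results = [self._ready.get()]
        while True:
            try:
                results.append(self._ready.get_nowait())
            except queue.Empty:
                return results


def make_step(delay):
    # Toy step function: sleeps for `delay` seconds, then doubles the action.
    def step(action):
        time.sleep(delay)
        return 2 * action
    return step


pool = ToyAsyncPool([make_step(0.01), make_step(0.2)])  # one fast env, one slow env
pool.async_step_send({0: 1, 1: 2})

collected = {}
while len(collected) < 2:
    # Each call returns results for one or more environments, whichever are done.
    for item in pool.async_step_recv():
        collected[item["env_index"]] = item["observation"]
print(collected)
```

Because the fast environment usually finishes first, the first receive call typically returns its result alone, and the caller can act on it without waiting for the slow one.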

Packaging

We relaxed TorchRL's PyTorch dependency to make it compatible with any PyTorch version. The current status is:

  • tensordict dependency will from now on be enforced (>=0.8.1,<0.9.0 for this release)
  • For PyTorch prior to 2.7.0, backward compatibility is guaranteed to some extent (most classes should work, unless new features are used) but C++ binaries (for prioritized replay buffers) will not work.
  • For PyTorch >= 2.7.0, C++ binaries should work across versions. In other words, torchrl binaries for 0.8.0 will work with PyTorch 2.7.0, 2.8.0 etc., and the same goes for the future TorchRL 0.9.0... A big thanks to @janeyx99 for enabling this!
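
The packaging rules above can be expressed as a small check. The sketch below is a hypothetical helper, not part of TorchRL: the names release_tuple and check_torchrl_080 are made up for illustration, and it only looks at the numeric release part of each version string, so pre-release and local tags such as "+cu128" are ignored. It takes the version strings as plain arguments; in practice they could come from torch.__version__ and tensordict.__version__.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the numeric release part, padded to three fields: '2.7' -> (2, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '0.9' compares like '0.9.0'
    return tuple(parts)


def check_torchrl_080(torch_version: str, tensordict_version: str) -> dict:
    """Apply the rules stated in the release notes for TorchRL 0.8.0."""
    td = release_tuple(tensordict_version)
    return {
        # tensordict must satisfy >=0.8.1,<0.9.0 for this release
        "tensordict_ok": (0, 8, 1) <= td < (0, 9, 0),
        # C++ binaries (prioritized replay buffers) need PyTorch >= 2.7.0
        "cpp_binaries_ok": release_tuple(torch_version) >= (2, 7, 0),
    }


print(check_torchrl_080("2.7.0+cu128", "0.8.1"))
print(check_torchrl_080("2.6.0", "0.8.3"))
```

With PyTorch 2.6.0 the second call reports that the C++ binaries are not expected to work, while the tensordict pin is still satisfied.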

New features

[Feature] Add EnvBase.all_actions (#2780) (67c3e9a) by @kurtamohler ghstack-source-id: 7abf9d469f740be5f14daffa2330811f7572dad9
[Feature] Add MCTSForest/Tree.to_string (#2794) (f862669) by @kurtamohler ghstack-source-id: 2127bf24d66e44fb310d12ff5f72e92aa0371cd7
[Feature] Add include_hash_inv arg to ChessEnv (#2766) (3be85c6) by @kurtamohler ghstack-source-id: f6920d781835902a6db02f74c5e5a3041243c5e3
[Feature] Add option for auto-resetting envs in GAE (#2851) (f5f3ae4) by @lin-erica Co-authored-by: Erica Lin
[Feature] Async environments (#2864) (4f00025) by @vmoens ghstack-source-id: 0a70ce0129d2ee6f85bb22adda3c332ff65e7501
[Feature] Capture wrong spec transforms (1/N) (#2805) (d3dca73) by @vmoens ghstack-source-id: f2d938b3dfe88af66622099f60cd7e3026289a02
[Feature] Collectors for async envs (#2893) (4ba5066) by @vmoens ghstack-source-id: 764c21d0f2c3b217440e1a6f12ee797b17820c1d
[Feature] DensifyReward postproc (#2823) (53065cf) by @vmoens ghstack-source-id: ef6a0f52601642c8944f63f9e3ac9e963425734e
[Feature] Dynamic specs for make_composite_from_td (#2829) (413571b) by @vmoens ghstack-source-id: 79e31e737c9f67ff20ce9fe32081e5b0a83de947
[Feature] Enable Hash.inv (#2757) (32c4623) by @kurtamohler ghstack-source-id: 956708121067855e519382a37764f06f53b16aa7
[Feature] Env with tensorclass attributes (#2788) (ab76027) by @vmoens ghstack-source-id: dc00ea3d23e015756974cd5c2ce638b55e5f6f92
[Feature] Gymnasium 1.1 compatibility (#2898) (78cd755) by @vmoens ghstack-source-id: e0891867f4318380f01c15449f9f26070b78536d
[Feature] History API (#2890) (fd10fe2) by @vmoens ghstack-source-id: 5b9723f6e1c327625e1a9be6f6eac68b91ed8492
[Feature] History.default_spec (#2894) (8ce11a8) by @vmoens ghstack-source-id: 40b8a492765a85adaccb591f1bc173754bacc313
[Feature] Local and Remote WeightUpdaters (#2848) (27d3680) by @vmoens ghstack-source-id: 2962530f87b596d038e3a13a934ea09064af2964
[Feature] Make PPO ready for text-based data (#2857) (595ddb4) by @vmoens ghstack-source-id: eeda5e2355e573e74cf7c080994cd47520ecd45b
[Feature] MultiAction transform (#2779) (621776a) by @vmoens ghstack-source-id: 0a6f7f916ee6f9c6d450c511385bdfdb1d911da0
[Feature] NonTensor batched arg (#2816) (b97bdb5) by @vmoens ghstack-source-id: c6de1bd1f1475b8d02df2ff3eb7438a50f2ae450
[Feature] Pass lists of policy_factory (#2888) (82f8ec2) by @vmoens ghstack-source-id: e42b100096c6e38365f8a80681473746f51d8a77
[Feature] RayReplayBuffer (#2835) (50af984) by @vmoens ghstack-source-id: 32eff06494037a1a30e532539794035c035f1e81
[Feature] Set padded token log-prob to 0.0 (#2856) (b9ddfa9) by @vmoens ghstack-source-id: 2b2993e0b15afae17326e6583390d57068712d4f
[Feature] Support lazy tensordict inputs in ppo loss (#2883) (c9caf3d) by @vmoens ghstack-source-id: 89098ba3ca61b1524aeddc68f54c377f29c8dc8b
[Feature] TensorDictPrimer with single default_value callable (#2732) (59e8545) by @vmoens ghstack-source-id: a9a677f24fc1e6a47312d0a96ab60daae543ff78
[Feature] Timer transform (#2806) (104b880) by @vmoens ghstack-source-id: e42f2aece15f90afc457e1fb3e41a1f7be1a6a85
[Feature] Transform for partial steps (#2777) (7c034e3) by @vmoens ghstack-source-id: 587f91e33dfe1d59b73c4b2f2f1c21760ee79d2e
[Feature] VecNormV2 (#2867) (40fcdb6) by @vmoens ghstack-source-id: 639d07ff54be200d54621c2c4619ebd0d3d7d79e
[Feature] VecNormV2: Usage with batched envs (#2901) (b08e7ac) by @vmoens ghstack-source-id: 5e14ed982b71b0e5192b0687c5259a3b49a81157
[Feature] pass policy-factory in mp data collectors (#2859) (31af2c5) by @vmoens ghstack-source-id: bce8abe9853d5ec187f91ffbcd8b940fa18ec8ab
[Feature] policy factory for collectors (#2841) (49a8a42) by @vmoens ghstack-source-id: 96b928e938b8b07fc7de23483358202737571f8e
[Feature] reset_time in Timer (#2807) (5a46379) by @vmoens ghstack-source-id: 36a74fd20b78e1cdde6bca19b4f95c3d9062d761
[Feature] transformers policy (#2825) (eea932c) by @vmoens ghstack-source-id: 870c221b4ebae132a44944f0be0ee78da540d115

Fixes

[BugFix] Apply inverse transform to input of TransformedEnv._reset (#2787) (1ed5d29) by @kurtamohler ghstack-source-id: 5f7c1fbd19b716f2b1602c34cf2ae1362f7bc7f6
[BugFix] Avoid calling reset during env init (#2770) (09e93c1) by @vmoens ghstack-source-id: 5ab8281c34aacfd7dbbfc0e285d88bcae0aededf
[BugFix] Ensure that Composite.set returns self as TensorDict does (#2784) (e084c02) by @vmoens ghstack-source-id: 23fe46b61dc2c9548fd9de7e4100431918fd0370
[BugFix] Fix .item() warning on tensors that require grad (#2885) (b66fcd4) by @vmoens ghstack-source-id: 502bdda3f5700dc900cf5c748839c965b1d67c1b
[BugFix] Fix KL penalty (#2908) (96c3003) by @vmoens ghstack-source-id: 475dccb0bcddbfe3bd2d826c5389834fb95e1ab8
[BugFix] Fix MultiAction reset (#2789) (76aa9bc) by @kurtamohler ghstack-source-id: a2f7bfdd7522a214430182dac65687a977b1a10d
[BugFix] Fix PEnv device copies (#2840) (6e40548) by @vmoens ghstack-source-id: df39fd2e4cd72f24c645b0ac32b46ab3e8d847fc
[BugFix] Fix batch_locked check in check_env_specs + error message callable (#2817) (9c98b82) by @vmoens ghstack-source-id: c722b164133c27c05dd21add3e7f3158189dd515
[BugFix] Fix calls to _reset_env_preprocess (#2798) (ea76ffb) by @vmoens ghstack-source-id: 59925635a87b196a5bcb0fb251afe4cc7b8b103e
[BugFix] Fix collector timeouts (#2774) (f6084b6) by @vmoens ghstack-source-id: cb71d95143beb22db1fe1752e72f70c19f43be79
[BugFix] Fix collector with no buffers and devices (#2809) (d4f8846) by @vmoens ghstack-source-id: 5367df9fcfdf549108be852476b049a0b978e348
[BugFix] Fix compile compatibility of PPO losses (#2889) (9bc85f4) by @vmoens ghstack-source-id: b346033641e5d27560fbfa011a006446e56a4e31
[BugFix] Fix composite setitem (#2778) (c2a149d) by @vmoens ghstack-source-id: f33b49beb4cf8c0c8b156559b1abbee8ac77db20
[BugFix] Fix env.full_done_specs (#2815) (f5c0666) by @vmoens ghstack-source-id: ba0d371d10b3f46ec1172fbec639ccc4d5559659
[BugFix] Fix forced batch-size in _skip_tensordict (#2808) (3acf491) by @vmoens ghstack-source-id: dac84e8b8835e870bce1772d7893c30b6f9af59c
[BugFix] Fix gc import (#2862) (a183f02) by @vmoens ghstack-source-id: b732d4f805d98ceaaa45326d619fce623c10482f
[BugFix] Fix lazy-stack in RBs (#2880) (e80732e) by @vmoens ghstack-source-id: 38399ee991bc065445f4eb1c84b71e7d844d794c
[BugFix] Fix property getter in RayReplayBuffer (#2869) (04d70c1) by @vmoens
[BugFix] Fix slow and flaky non-tensor parallel env test (#2926) by @vmoens ghstack-source-id: fcb5caa56e05176958b3468a7d6f69e363cfe558
[BugFix] Fix update shape mismatch in _skip_tensordict (#2792) (3e42e7a) by @vmoens ghstack-source-id: 27e7d444c126e48fdb70d951a0cc7beaee1db3a8
[BugFix] Fixed VideoRecorder crash when passing fps (#2827) (5ec9bc5) by Alexandre Brown
[BugFix] GAE warning when gamma/lmbda are tensors (#2838) (d561115) by @louisfaury Co-authored-by: Louis Faury
[BugFix] Keep original class in LazyStackStorage through lazy_stack (#2873) (70f5c06) by @vmoens ghstack-source-id: 661cd65c86648ffb2ee6ead40110ac3d57477514
[BugFix] NonTensor should not convert anything to numpy (#2771) (3da2750) by @vmoens ghstack-source-id: 7644f6c695490f34d6455703418c59cfa718a9f0
[BugFix] PPOs with composite distribution (#2791) (edfa25d) by @louisfaury Co-authored-by: Louis Faury
[BugFix] Refactor _skip_tensordict to avoid update calls (#2802)" (#2802) (e0d3eee) by @vmoens ghstack-source-id: 0f31b879f1e4643080530db8f7c7091e281b560f
[BugFix] Remove neg dim checks in expand for all specs (#2906) (c5afe3c) by @vmoens ghstack-source-id: f718328527275e0be591c5d12c334add8f65f7a4
[BugFix] Right log-prob size in transformer wrapper (#2854) (f81deac) by @vmoens ghstack-source-id: 98baa635ca07d5bf7e69a9e3bc43012ae2d91bf0
[BugFix] Test and fix life cycle of env with dynamic non-tensor spec (#2812) (b538c66) by @vmoens ghstack-source-id: 77da3a6baf0cb42525dd3a564b36ac03a531d17a
[BugFix] Tree make node fix (#2839) (ba8be9c) by Rolo
[BugFix] Use brackets to get non-tensor data in gym envs (#2769) (84f6b04) by @vmoens ghstack-source-id: 3101141eb5b7435c7a4047f5ee84b66c1d74af13
[BugFix] correct dim for resolving dtype in _split_and_pad_sequence (#2801) (21c4d87) by KubaMichalczyk Co-authored-by: Jakub Michalczyk

Refactors

[Refactor] Avoid padding in transformer wrapper (#2881) (9c4c086) by @vmoens ghstack-source-id: de28bab17fc3d59889ea9f2fd152de5001b92320
[Refactor] Fix repeats order (#2887) (93ba865) by @vmoens ghstack-source-id: 0bedd5c756f92d23083905ae8a6ddd992ba0b415
[Refactor] MaskedCategorical cross_entropy usage for faster loss (#2882) (3e1f4ff) by @vmoens ghstack-source-id: 84330cf08ad8798e2cd4f6a8f3ec146a9de8e1e4
[Refactor] Refactor the weight update logic (#2914) (0da9044) by @vmoens ghstack-source-id: 72b710ab1788090364c068c59b28a21e09221236
[Refactor] Rename weight updaters (#2892) (efe9389) by @vmoens ghstack-source-id: 8889046277b94db0076fa72787295fd9419ab183
[Refactor] TransformersWrapper class (#2871) (5d72561) by @vmoens ghstack-source-id: 8d5442611e9f1cf499cd59ed3e61a0602459c94d
[Refactor] VecNormV2: update before norm, bias_correction at the right time (#2900) (c3310b8) by @vmoens ghstack-source-id: a90aeb268a83dc2e45735f7b6b19b4e63e572ba7

Miscellaneous

[BE] Ensure abstractmethods are implemented for specs (#2790) (bd78913) by @vmoens ghstack-source-id: 7b943aa84bc497e7e8195f633cb15105de137f04
[BE] Fix some typos (#2811) (0ae1405) by @antoinebrl
[BE] Make better logits in cost tests (#2775) (42ed42c) by @vmoens ghstack-source-id: be9ea92b3f3d2592e426eaeaff7b81e50472cf16
[BE] Remove deprec specs from tests (#2767) (27a8ecc) by @vmoens ghstack-source-id: 717bb31b1773c5c8b180c456f1bbad8a022dc55a
[BE] _set_seed returns None + type annotations (#2903) (3a9f244) by @antoinebrl
[CI] Fix envnames in SOTA tests (#2921) by @vmoens ghstack-source-id: 3b518e2a81e9d988db2fbd12883eabbe486d32db
[CI] Fix libs workflows (#2800) (8dd1be7) by @vmoens
[CI] Fix nightly and benchmark CIs (#2930) by @vmoens ghstack-source-id: f39b2573ba58e7808389af3782aed8809759fa2b
[CI] Fix old deps (#2916) (4162db6) by @vmoens ghstack-source-id: 109de71c622760679d449d906b0f33b3f1866975
[CI] Fix wheels (#2876) (e1d3fd4) by @vmoens ghstack-source-id: 0f2602146c4371d2fc6ac33f139b1eebb0829559
[CI] Upgrade to cuda 12.8 (#2820) (8c9dc05) by @vmoens ghstack-source-id: e0ad7d6c00d53b74b23022836535c453a37df238
[CI] egl for all (#2915) (425952b) by @vmoens ghstack-source-id: 1b5e13c44f5dff1a55f9c78a174f7164f37d76a1
[Deprecation] Enact deprecations (#2917) (b247526) by @vmoens ghstack-source-id: 690a9f62e274e9f14a89532dd7d07176188560e9
[Deprecation] Softly change default behavior of auto_unwrap (#2793) (2046bc5) by @vmoens ghstack-source-id: c28c11ecf68fba0ffde652205ea8e46f8da07cf1
[Doc] Add docstring for MCTSForest.extend (#2795) (a3a1ebe) by @kurtamohler ghstack-source-id: 7fa8834376a1afd9187d7f1d43a97f70d713a160
[Doc] Better doc for Transform class (#2797) (dd59290) by @vmoens ghstack-source-id: 16e563bc810586d31772b58f9923439b632985c7
[Doc] Fix (and deactivate) tutorials (#2785) (f1c42e0) by @vmoens ghstack-source-id: 56c7757c36a2d609688ce0777a49d54763d3e691
[Doc] Fix Doc (#2919) by @vmoens ghstack-source-id: dacd1e7467994c73b22b5f111ac6c486d43d7b58
[Doc] Fix EnvCreator's doc (#2868) (586a541) by @louisfaury Co-authored-by: Louis Faury
[Doc] Fix doc CI (#2932) by @vmoens ghstack-source-id: 5a27f9153c77d659ee691117c6af60a2d4b022bf
[Doc] Fix formatting errors (#2786) (03d6586) by @vmoens ghstack-source-id: ac1f3da66c1374d3d19fed88e80f8ed5407b3459
[Doc] Fix tutorials (#2768) (75f113f) by @vmoens
[Doc] Fix tutos (#2772) (b27ee6d) by @vmoens
[Doc] Solve ref issues in docstrings (#2776) (f5445a4) by @vmoens ghstack-source-id: 09823fa85a94115291e7434478776fb0834f9b39
[Doc] Update discrete.py (#2850) (619fec6) by oswald
[Docs] Fix doc setup (#2922) by @vmoens ghstack-source-id: 91d359ad591a7f8062191825d04ebda112f2cf7d
[Environment] Fix lib CI failures (#2923) by @vmoens ghstack-source-id: c046f5ded86a3a07d66eaddcaef24b69c2d77c01
[Environment] Fix lib CI failures (#2929) by @vmoens ghstack-source-id: febe20b0915025cc253f6aa0404258f3a020e1e6
[Lint] pyupgrade (#2819) (40b147e) by @vmoens ghstack-source-id: dcdf51db31b8f6bcfad7fd4dc53f5b5ad8098c5d
[Minor] Quick edits to .md files (#2931) by @vmoens ghstack-source-id: d0fbd3da72d41e6cdbfc4761990cb939993eb816
[Performance] Memoize calls to encode and related methods within step (#2907) (0475cbf) by @vmoens ghstack-source-id: 8acd4839d4ba5f45373d0a0fcb52b15c149d37f1
[Performance] Use TensorDict._new_unsafe in step (#2905) (e5cba04) by @vmoens ghstack-source-id: 8a117fb7f9c5b24173408d217f59ec23da7db33c
[Quality] Better device checks (#2909) (382430d) by @vmoens ghstack-source-id: 7174415de2b4221c6c5fca4a31525ed26bc8d6f9
[Quality] Limit warning filter to torchrl (#2762) (85d1e70) by @antoinebrl
[Quality] Remove redundant return (#2925) (21ef725) by @b10902118
[Setup] Fix no_python_abi_suffix error (#2863) (7df8317) by @vmoens ghstack-source-id: 55c845efd936558116f8fdc356f22aca88943f99
[Setup] Remove distutils imports (#2836) (4c55b65) by @antoinebrl
[Test] Capture deprec warnings (#2799) (fb641de) by @vmoens ghstack-source-id: bcbf41c245c979d0f21524889ad2be8ef4c10c40
[Test] Fix warnings in tests (#2886) (6f634c6) by @vmoens ghstack-source-id: d4ed75d4dae2f0d62adff567d5dcc5fd2f98ce3a
[Versioning] Bump 0.8.0 (#2920) by @vmoens ghstack-source-id: 74b3ef75b2b911c097c1d985068958b697a00134

A big thanks to the community supporting this project!
There would be no TorchRL if it wasn't for its users.