
Releases: opendilab/DI-engine

v0.5.3

23 Dec 08:06

API Change

  1. Expand Python version support for DI-engine to Python 3.7-3.10

Env

  1. add pistonball MARL env and its unittest/example (#833)
  2. update trading env (#831)
  3. update PPO config for better discrete action space performance (#809)
  4. remove unused config fields in MuJoCo PPO

Algorithm

  1. add AWR algorithm (#828)
  2. add encoder in MAVAC (#823)
  3. add HPT model architecture (#841)
  4. fix multiple model wrappers reset bug (#846)
  5. add hybrid action space support to ActionNoiseWrapper (#829)
  6. fix MAPPO advantage computation bug (#812)

Enhancement

  1. add resume_training option to allow envstep and train_iter to resume seamlessly (#835) (see the sketch after this list)
  2. polish old/new pipeline DistributedDataParallel (DDP) implementation (#842)
  3. adapt DingEnvWrapper to gymnasium (#817)
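
A hypothetical sketch of resuming a run with this option; the config location shown here (policy.learn.resume_training) is an assumption for illustration only, see #835 for the actual interface:

```python
# Hypothetical sketch only: the exact location of the resume_training flag
# is an assumption; see #835 for the actual interface.
from easydict import EasyDict

cfg = EasyDict(dict(
    exp_name='cartpole_dqn_seed0',  # reuse the original experiment directory
    policy=dict(
        learn=dict(
            # When enabled, envstep and train_iter resume from the checkpoint
            # instead of restarting at zero (per this release note).
            resume_training=True,
        ),
    ),
))
```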

Fix

  1. fix priority buffer delete bug (#844)
  2. fix middleware collector env reset bug (#845)
  3. fix many unittest bugs

Style

  1. downgrade pyecharts log level to warning and polish installation doc (#838)
  2. polish necessary requirements
  3. polish API doc details
  4. polish DI-engine citation authors
  5. upgrade CI macOS version from 12 to 13

News

  1. CleanS2S: High-quality and streaming Speech-to-Speech interactive agent in a single file.
  2. GenerativeRL: Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective
  3. PRG: Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Full Changelog: v0.5.2...v0.5.3

Contributors: @PaParaZz1 @puyuan1996 @kxzxvbk @YinminZhang @zjowowen @luodi-7 @MarkHolmstrom @TairanMK

v0.5.2

27 Jun 08:56

Env

  1. add taxi env (#799) (#807)
  2. add ising model env (#782)
  3. add new Frozen Lake env (#781)
  4. optimize PPO continuous config in MuJoCo (#801)
  5. fix MASAC SMAC config multi_agent=True bug (#791)
  6. update and speed up pendulum PPO

Algorithm

  1. fix GTrXL compatibility bug (#796)
  2. fix complex obs demo for PPO pipeline (#786)
  3. add naive PWIL demo
  4. fix MARL n-step TD compatibility bug

Enhancement

  1. add GPU utils (#788)
  2. add deprecated function decorator (#778)

Style

  1. relax flask requirement (#811)
  2. add new badge (hellogithub) in readme (#805)
  3. update discord link and badge in readme (#795)
  4. fix typo in config.py (#776)
  5. polish rl_utils API docs
  6. add numpy<2 constraint
  7. update macOS platform test version to 12
  8. polish CI Python version

News

  1. PsyDI: Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments
  2. ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
  3. UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Full Changelog: v0.5.1...v0.5.2

Contributors: @PaParaZz1 @zjowowen @YinminZhang @TuTuHuss @nighood @ruiheng123 @rongkunxue @ooooo-create @eltociear

v0.5.1

04 Feb 15:55

Env

  1. add MADDPG pettingzoo example (#774)
  2. polish NGU Atari configs (#767)
  3. fix bug in cliffwalking env (#759)
  4. add PettingZoo replay video demo
  5. change default max retry in env manager from 5 to 1

Algorithm

  1. add QGPO diffusion-model related algorithm (#757)
  2. add HAPPO multi-agent algorithm (#717)
  3. add DreamerV3 + MiniGrid adaptation (#725)
  4. fix HPPO entropy_weight to avoid NaN error in log_prob (#761)
  5. fix structured action bug (#760)
  6. polish Decision Transformer entry (#754)
  7. fix EDAC policy/model bug

Fix

  1. fix env typos
  2. fix pynng requirements bug
  3. fix communication module unittest bug

Style

  1. polish policy API doc (#762) (#764) (#768)
  2. add agent API doc (#758)
  3. polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)

News

  1. AAAI 2024: SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
  2. LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Full Changelog: v0.5.0...v0.5.1

Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy

v0.5.0

05 Dec 05:04

Env

  1. add tabmwp env (#667)
  2. polish anytrading env issues (#731)

Algorithm

  1. add PromptPG algorithm (#667)
  2. add Plan Diffuser algorithm (#700) (#749)
  3. add new pipeline implementation of IMPALA algorithm (#713)
  4. add dropout layers to DQN-style algorithms (#712)

Enhancement

  1. add new pipeline agent for SAC/DDPG/A2C/PPO and Hugging Face support (#637) (#730) (#737)
  2. add more unittest cases for model (#728)
  3. add collector logging in new pipeline (#735)

Fix

  1. fix logger middleware problems (#715)
  2. fix PPO parallel bug (#709)
  3. fix typo in optimizer_helper.py (#726)
  4. fix MLP dropout if-condition bug
  5. fix drex data collection unittest bugs

Style

  1. polish env manager/wrapper comments and API doc (#742)
  2. polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
  3. polish policy comments and API doc (#732)
  4. polish rl_utils comments and API doc (#724)
  5. polish torch_utils comments and API doc (#738)
  6. update README.md and Colab demo (#733)
  7. update metaworld docker image

News

  1. NeurIPS 2023 Spotlight: LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
  2. OpenDILab + Hugging Face DRL Model Zoo link

Full Changelog: v0.4.9...v0.5.0

Contributors: @PaParaZz1 @zjowowen @AltmanD @puyuan1996 @kxzxvbk @Super1ce @nighood @Cloud-Pku @zhangpaipai @ruoyuGao @eltociear

v0.4.9

23 Aug 09:49

API Change

  1. refactor the implementation of Decision Transformer; DI-engine now supports both discrete and continuous DT outputs with multi-modal observations (example: ding/example/dt.py)
  2. update the multi-GPU Distributed Data Parallel (DDP) example (link)
  3. change the return value of InteractionSerialEvaluator, simplifying redundant results (see the sketch after this list)
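
A hedged sketch of the simplified call; the tuple fields and argument order below are assumptions based on this note, not a verified signature:

```python
def evaluate_once(evaluator, learner, collector):
    # Assumption: after this change, eval() returns a plain
    # (stop_flag, episode_info) pair without the formerly redundant fields.
    stop_flag, episode_info = evaluator.eval(
        learner.save_checkpoint,  # save_ckpt_fn
        learner.train_iter,
        collector.envstep,
    )
    return stop_flag, episode_info
```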

Env

  1. add cliffwalking env (#677)
  2. add lunarlander PPO config and example

Algorithm

  1. add BCQ offline RL algorithm (#640)
  2. add DreamerV3 model-based RL algorithm (#652)
  3. add tensor stream merge network tools (#673)
  4. add scatter connection model (#680)
  5. refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
  6. add three variants of Bilinear classes and a FiLM class (#703)

Enhancement

  1. polish off-policy RL multi-GPU DDP training (#679)
  2. add middleware for Ape-X distributed pipeline (#696)
  3. add example for evaluating trained DQN (#706)

Fix

  1. fix to_ndarray failing to assign dtype for scalars (#708)
  2. fix evaluator return episode_info compatibility bug
  3. fix wrong config in CQL example entry
  4. fix enable_save_figure env interface
  5. fix redundant env info bug in evaluator
  6. fix to_item unittest bug

Style

  1. polish and simplify requirements (#672)
  2. add Hugging Face Model Zoo badge (#674)
  3. add openxlab Model Zoo badge (#675)
  4. fix py37 macOS CI bug and update default PyTorch from 1.7.1 to 1.12.1 (#678)
  5. fix mujoco-py compatibility issue for cython<3 (#711)
  6. fix type spelling error (#704)
  7. fix PyPI release actions Ubuntu 18.04 bug
  8. update contact information (e.g. WeChat)
  9. polish algorithm doc tables

New Repo

  1. DOS: [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Full Changelog: v0.4.8...v0.4.9

Contributors: @PaParaZz1 @zjowowen @zhangpaipai @AltmanD @puyuan1996 @Cloud-Pku @Super1ce @kxzxvbk @jayyoung0802 @Mossforest @lxl2gf @Privilger

v0.4.8

25 May 05:27

API Change

  1. stop_value is no longer a required field in the config; it defaults to math.inf. Users can specify max_env_step or max_train_iter in the training entry to run the program with a fixed termination condition (see the sketch below).
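
For example, a minimal sketch of running a training entry with a fixed budget instead of a reward threshold, assuming the standard cartpole DQN config shipped with DI-engine:

```python
from ding.entry import serial_pipeline
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    cartpole_dqn_config, cartpole_dqn_create_config
)

# stop_value defaults to math.inf, so training terminates on the explicit
# budget below rather than on an evaluation reward threshold.
serial_pipeline(
    (cartpole_dqn_config, cartpole_dqn_create_config),
    seed=0,
    max_env_step=int(1e5),  # or max_train_iter=... for an iteration budget
)
```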

Env

  1. fix gym hybrid reward dtype bug (#664)
  2. fix atari env id noframeskip bug (#655)
  3. fix typo in gym any_trading env (#654)
  4. update td3bc d4rl config (#659)
  5. polish bipedalwalker config

Algorithm

  1. add EDAC offline RL algorithm (#639)
  2. add LN and GN norm_type support in ResBlock (#660)
  3. add normal value norm baseline for PPOF (#658)
  4. polish last layer init/norm in MLP (#650)
  5. polish TD3 monitor variable

Enhancement

  1. add MAPPO/MASAC task example (#661)
  2. add PPO example for complex env observation (#644)
  3. add barrier middleware (#570)

Fix

  1. fix abnormal collector log and add record_random_collect option (#662)
  2. fix to_item compatibility bug (#646)
  3. fix trainer dtype transform compatibility bug
  4. fix pettingzoo 1.23.0 compatibility bug
  5. fix ensemble head unittest bug

Style

  1. fix incompatible gym version bug in Dockerfile.env (#653)
  2. add more algorithm docs

New Repo

  1. LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.

Full Changelog: v0.4.7...v0.4.8

Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear

v0.4.7

11 Apr 16:55

API Change

  1. remove the requirement for sub-fields (learn/collect/eval) in the policy config (users can define their own config formats; see the sketch after this list)
  2. use wandb as the default logger in the task pipeline
  3. remove the value_network config field and its implementations in SAC and related algorithms
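
As an illustration, a policy config no longer needs the nested learn/collect/eval sub-dicts; the flat field names below are illustrative, not an exhaustive or mandated schema:

```python
from easydict import EasyDict

# Illustrative only: a flat policy config without the learn/collect/eval
# sub-dicts, which are no longer required fields.
policy_cfg = EasyDict(dict(
    cuda=True,
    batch_size=64,       # previously nested under policy.learn
    learning_rate=3e-4,  # previously nested under policy.learn
    n_sample=128,        # previously nested under policy.collect
))
```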

Env

  1. add dmc2gym env support and baseline (#451)
  2. update pettingzoo to the latest version (#597)
  3. polish ICM/RND + on-policy PPO config bugs and add app_door_to_key env (#564)
  4. add lunarlander continuous TD3/SAC config
  5. polish lunarlander discrete C51 config

Algorithm

  1. add Procedure Cloning (PC) imitation learning algorithm (#514)
  2. add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
  3. add reward/value norm methods: PopArt, value rescale, and symlog (#605) (see the sketch after this list)
  4. polish reward model config and training pipeline (#624)
  5. add PPOF reward space demo support (#608)
  6. add PPOF Atari demo support (#589)
  7. polish DQN default config and env examples (#611)
  8. polish comments and clean up code for SAC
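
For reference, the standard definitions of two of these transforms (value rescale from Pohlen et al. 2018, symlog/symexp from DreamerV3); a sketch of the math, not DI-engine's exact implementation:

```python
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # symlog(x) = sign(x) * ln(|x| + 1), compresses large-magnitude targets.
    return torch.sign(x) * torch.log(torch.abs(x) + 1)

def symexp(x: torch.Tensor) -> torch.Tensor:
    # Inverse of symlog: sign(x) * (exp(|x|) - 1).
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)

def value_rescale(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x  (Pohlen et al., 2018)
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x

def inverse_value_rescale(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # Closed-form inverse of h.
    return torch.sign(x) * (
        (((1 + 4 * eps * (torch.abs(x) + 1 + eps)).sqrt() - 1) / (2 * eps)) ** 2 - 1
    )
```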

Enhancement

  1. add language model (e.g. GPT) training utils (#625)
  2. remove policy cfg sub fields requirements (#620)
  3. add full wandb support (#579)

Fix

  1. fix confusing shallow copy operation about next_obs (#641)
  2. fix unsqueeze action_args in PDQN when shape is 1 (#599)
  3. fix evaluator return_info tensor type bug (#592)
  4. fix deque buffer wrapper PER bug (#586)
  5. fix reward model save method compatibility bug
  6. fix logger assertion and unittest bug
  7. fix bfs test py3.9 compatibility bug
  8. fix zergling collector unittest bug

Style

  1. add DI-engine torch-rpc p2p communication docker (#628)
  2. add D4RL docker (#591)
  3. correct typo in task (#617)
  4. correct typo in time_helper (#602)
  5. polish readme and add treetensor example
  6. update contributing doc

New Plan

  • Call for contributors about DI-engine (#621)

Full Changelog: v0.4.6...v0.4.7

Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear

v0.4.6

18 Feb 13:49

API Change

  1. middleware: CkptSaver(cfg, policy, train_freq=100) -> CkptSaver(policy, cfg.exp_name, train_freq=100)
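
A migration sketch in the middleware pipeline, assuming a policy and cfg are already constructed and a task context is active:

```python
from ding.framework import task
from ding.framework.middleware import CkptSaver

# Old (before v0.4.6): task.use(CkptSaver(cfg, policy, train_freq=100))
# New: pass the policy and the experiment name instead of the whole config.
task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))
```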

Env

  1. add metadrive env and related ppo config (#574)
  2. add acrobot env and related dqn config (#577)
  3. add carracing in box2d (#575)
  4. add new gym-hybrid visualization (#563)
  5. update cartpole IL config (#578)

Algorithm

  1. add BDQ algorithm (#558)
  2. add procedure cloning model (#573)

Enhancement

  1. add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582)
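
A hedged sketch of the simplified interface; the constructor argument names and env id below are assumptions based on the PPO × Family examples, not a verified signature:

```python
from ding.bonus import PPOF

# Assumption: PPOF takes an env identifier and an experiment name, then
# trains end-to-end with a single call; exact argument names may differ.
agent = PPOF(env='lunarlander_discrete', exp_name='lunarlander_ppof')
agent.train(step=int(1e6))
```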

Fix

  1. fix to_device and prev_state bug when using ttorch (#571)
  2. fix py38 and numpy unittest bugs (#565)
  3. fix typo in contrastive_loss.py (#572)
  4. fix dizoo envs pkg installation bugs
  5. fix multi_trainer middleware unittest bug

Style

  1. add evogym docker (#580)
  2. fix metaworld docker bug
  3. fix incompatibility with high setuptools versions
  4. relax treetensor minimum version requirement

New Paper

  1. GoBigger: [ICLR 2023] A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation

Contributors: @PaParaZz1 @puyuan1996 @timothijoe @Cloud-Pku @ruoyuGao @Super1ce @karroyan @kxzxvbk @eltociear

v0.4.5

13 Dec 17:40

API Change

  1. move the default examples for adding a new env from extending BaseEnv to using DingEnvWrapper (see the sketch after this list)
  2. rename final_eval_reward to eval_episode_return in all related code (including envs and evaluators)
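
For example, a minimal sketch of the recommended pattern, assuming a classic gym env (random_action follows the BaseEnv interface):

```python
import gym
from ding.envs import DingEnvWrapper

# Wrap a standard gym env instead of subclassing BaseEnv.
env = DingEnvWrapper(gym.make('CartPole-v0'))
obs = env.reset()
timestep = env.step(env.random_action())
# After the rename in this release, the episode return reported at episode
# end is eval_episode_return (formerly final_eval_reward).
```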

Env

  1. add beergame supply chain optimization env (#512)
  2. add env gym_pybullet_drones (#526)
  3. rename eval reward to episode return (#536)

Algorithm

  1. add policy gradient algo implementation (#544)
  2. add MADDPG algo implementation (#550)
  3. add IMPALA continuous algo implementation (#551)
  4. add MADQN algo implementation (#540)

Enhancement

  1. add IMPALA-type distributed training scheme for the new task pipeline (#321)
  2. add load and save methods for the replay buffer (#542)
  3. add more DingEnvWrapper examples (#525)
  4. add richer info visualization support for the evaluator (#538)
  5. add traceback log for subprocess env manager (#534)

Fix

  1. fix halfcheetah TD3 config file (#537)
  2. fix mujoco action_clip args compatibility bug (#535)
  3. fix atari a2c config entry bug
  4. fix drex unittest compatibility bug

Style

  1. add Roadmap issue of DI-engine (#548)
  2. update related project link and new env doc

New Project

  1. PPOxFamily: PPO x Family DRL Tutorial Course
  2. ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".

Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang

v0.4.4

31 Oct 08:52

API Change

  1. context in the new task pipeline is now implemented as a dataclass rather than a dict (see the sketch after this list)
  2. the recommended visualization tool is now wandb, rather than tensorboard
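
Concretely, middleware now reads context attributes instead of dict keys; a sketch using the new-pipeline class and field names, which are assumed here from the current code:

```python
from ding.framework import OnlineRLContext

def print_progress(ctx: OnlineRLContext) -> None:
    # Context is a dataclass now: attribute access replaces dict lookups,
    # e.g. ctx.train_iter instead of ctx['train_iter'].
    print(f'train_iter={ctx.train_iter}, env_step={ctx.env_step}')
```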

Env

  1. add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
  2. add evogym support (#495) (#527)
  3. add save_replay_gif option (#506)
  4. adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)

Algorithm

  1. add PCGrad optimizer (#489)
  2. add some features in MLP and ResBlock (#511)
  3. delete MCTS-related modules (#518) (we will release an MCTS repo in the future)

Enhancement

  1. add wandb middleware and demo (#488) (#523) (#528)
  2. add new properties in Context (#499)
  3. add single env policy wrapper for policy deployment (demo)
  4. add custom model demo and documentation

Fix

  1. fix build logger args and unittests (#522)
  2. fix total_loss calculation in PDQN (#504)
  3. fix save gif function bug
  4. fix level sample unittest bug

Style

  1. update contact email address (#503)
  2. polish env log and ResBlock name
  3. add details button in readme

New Repo

  • DI-1024: Deep Reinforcement Learning + 1024 Game

Contributors: @PaParaZz1 @puyuan1996 @karroyan @hiha3456 @davide97l @Weiyuhong-1998 @zjowowen @norman26625