Releases: opendilab/DI-engine
v0.5.3
API Change
- Expand the Python version support for DI-engine to Python 3.7-3.10
Env
- add pistonball MARL env and its unittest/example (#833)
- update trading env (#831)
- update ppo config for better discrete action space performance (#809)
- remove unused config fields in MuJoCo PPO
Algorithm
- add AWR algorithm (#828)
- add encoder in MAVAC (#823)
- add HPT model architecture (#841)
- fix multiple model wrappers reset bug (#846)
- add hybrid action space support to ActionNoiseWrapper (#829)
- fix mappo adv compute bug (#812) (a generic GAE reference sketch follows this list)
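Since the MAPPO fix above touches advantage computation, here is a minimal generic GAE reference sketch in PyTorch; it is illustrative only and independent of DI-engine's internal rl_utils helpers:

```python
import torch

def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation over a single trajectory.
    # rewards, values, dones: float tensors of shape (T,), dones as 0/1 flags;
    # next_value: V of the state after the last step. done flags cut the
    # bootstrap across episode boundaries.
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        v_next = next_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    return advantages
```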
Enhancement
- add resume_training option to allow envstep and train_iter to resume seamlessly (#835) (see the sketch after this list)
- polish old/new pipeline DistributedDataParallel (DDP) implementation (#842)
- adapt DingEnvWrapper to gymnasium (#817)
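The `resume_training` option above restores the `envstep` and `train_iter` counters together with the model weights, so training curves continue instead of restarting from zero. A minimal plain-PyTorch sketch of the idea (illustrative; DI-engine's actual checkpoint format and option wiring may differ):

```python
import torch

def save_ckpt(path, model, optimizer, envstep, train_iter):
    # persist the counters next to the weights so a resumed run continues
    # from the same point on both x-axes (env steps and train iterations)
    torch.save({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'envstep': envstep,
        'train_iter': train_iter,
    }, path)

def load_ckpt(path, model, optimizer):
    state = torch.load(path)
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    return state['envstep'], state['train_iter']
```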
Fix
- fix priority buffer delete bug (#844)
- fix middleware collector env reset bug (#845)
- fix many unittest bugs
Style
- downgrade pyecharts log level to warning and polish installation doc (#838)
- polish necessary requirements
- polish api doc details
- polish DI-engine citation authors
- upgrade CI macos version from 12 to 13
News
- CleanS2S: High-quality and streaming Speech-to-Speech interactive agent in a single file.
- GenerativeRL: Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective
- PRG: Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Full Changelog: v0.5.2...v0.5.3
Contributors: @PaParaZz1 @puyuan1996 @kxzxvbk @YinminZhang @zjowowen @luodi-7 @MarkHolmstrom @TairanMK
v0.5.2
Env
- add taxi env (#799) (#807)
- add ising model env (#782)
- add new Frozen Lake env (#781)
- optimize ppo continuous config in MuJoCo (#801)
- fix masac smac config multi_agent=True bug (#791)
- update/speed up pendulum ppo
Algorithm
- fix gtrxl compatibility bug (#796)
- fix complex obs demo for ppo pipeline (#786)
- add naive PWIL demo
- fix marl nstep td compatibility bug (see the n-step target sketch after this list)
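For reference on the n-step TD fix above, a minimal single-trajectory sketch of the n-step target (illustrative, not the `ding.rl_utils` implementation):

```python
import torch

def nstep_td_target(rewards, next_value, dones, gamma=0.99):
    # n-step return: G = sum_k gamma^k * r_{t+k} + gamma^n * V(s_{t+n}).
    # rewards, dones: float tensors of shape (n,), dones as 0/1 flags;
    # next_value: bootstrap value V(s_{t+n}). Iterating backward applies
    # G_t = r_t + gamma * (1 - done_t) * G_{t+1}, cutting at episode ends.
    target = next_value
    for k in reversed(range(rewards.shape[0])):
        target = rewards[k] + gamma * target * (1.0 - dones[k])
    return target
```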
Style
- relax flask requirement (#811)
- add new badge (hellogithub) in readme (#805)
- update discord link and badge in readme (#795)
- fix typo in config.py (#776)
- polish rl_utils api docs
- add constraint about numpy<2
- polish macos platform test version to 12
- polish ci python version
News
- PsyDI: Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
- UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Full Changelog: v0.5.1...v0.5.2
Contributors: @PaParaZz1 @zjowowen @YinminZhang @TuTuHuss @nighood @ruiheng123 @rongkunxue @ooooo-create @eltociear
v0.5.1
Env
- add MADDPG pettingzoo example (#774)
- polish NGU Atari configs (#767)
- fix bug in cliffwalking env (#759)
- add PettingZoo replay video demo
- change default max retry in env manager from 5 to 1
Algorithm
- add QGPO diffusion-model related algorithm (#757)
- add HAPPO multi-agent algorithm (#717)
- add DreamerV3 + MiniGrid adaption (#725)
- fix hppo entropy_weight to avoid nan error in log_prob (#761)
- fix structured action bug (#760)
- polish Decision Transformer entry (#754)
- fix EDAC policy/model bug
Fix
- fix env typos
- fix pynng requirements bug
- fix communication module unittest bug
Style
- polish policy API doc (#762) (#764) (#768)
- add agent API doc (#758)
- polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)
News
- AAAI 2024: SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Full Changelog: v0.5.0...v0.5.1
Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy
v0.5.0
Algorithm
- add PromptPG algorithm (#667)
- add Plan Diffuser algorithm (#700) (#749)
- add new pipeline implementation of IMPALA algorithm (#713)
- add dropout layers to DQN-style algorithms (#712) (see the sketch after this list)
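For the dropout change above, a minimal sketch of what a DQN-style Q-head with dropout looks like (illustrative; the exact argument plumbing in DI-engine's model helpers may differ):

```python
import torch.nn as nn

def dqn_head(obs_dim: int, hidden: int, num_actions: int, p: float = 0.1) -> nn.Module:
    # Dropout between hidden layers regularizes the Q-network; as usual for
    # nn.Dropout, it is disabled at evaluation time via .eval().
    return nn.Sequential(
        nn.Linear(obs_dim, hidden),
        nn.ReLU(),
        nn.Dropout(p),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Dropout(p),
        nn.Linear(hidden, num_actions),
    )
```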
Enhancement
- add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
- add more unittest cases for model (#728)
- add collector logging in new pipeline (#735)
Fix
- fix logger middleware problems (#715)
- fix ppo parallel bug (#709)
- fix typo in optimizer_helper.py (#726)
- fix mlp dropout if condition bug
- fix drex collecting data unittest bugs
Style
- polish env manager/wrapper comments and API doc (#742)
- polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
- polish policy comments and API doc (#732)
- polish rl_utils comments and API doc (#724)
- polish torch_utils comments and API doc (#738)
- update README.md and Colab demo (#733)
- update metaworld docker image
News
- NeurIPS 2023 Spotlight: LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
- OpenDILab + Hugging Face DRL Model Zoo link
Full Changelog: v0.4.9...v0.5.0
Contributors: @PaParaZz1 @zjowowen @AltmanD @puyuan1996 @kxzxvbk @Super1ce @nighood @Cloud-Pku @zhangpaipai @ruoyuGao @eltociear
v0.4.9
API Change
- refactor the implementation of Decision Transformer; DI-engine now supports both discrete and continuous DT outputs with multi-modal observation (example: `ding/example/dt.py`)
- update the multi-GPU Distributed Data Parallel (DDP) example (link); a generic setup sketch follows this list
- change the return value of `InteractionSerialEvaluator`, simplifying redundant results
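For the updated DDP example, the standard PyTorch multi-GPU setup that such entries build on is sketched below (generic; DI-engine wires this into its own pipeline, and `torchrun` is assumed as the launcher):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE in the environment
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)
    # gradients are all-reduced across processes on backward()
    return DDP(model.cuda(), device_ids=[local_rank])
```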
Env
- add cliffwalking env (#677)
- add lunarlander ppo config and example
Algorithm
- add BCQ offline RL algorithm (#640)
- add DreamerV3 model-based RL algorithm (#652)
- add tensor stream merge network tools (#673)
- add scatter connection model (#680)
- refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
- add three variants of Bilinear classes and a FiLM class (#703) (see the FiLM sketch after this list)
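FiLM applies a feature-wise affine transform whose scale and shift are predicted from a conditioning vector; a minimal PyTorch sketch (illustrative, not necessarily the exact class added in #703):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    # out = gamma(cond) * x + beta(cond), applied feature-wise

    def __init__(self, feature_dim: int, cond_dim: int):
        super().__init__()
        self.proj = nn.Linear(cond_dim, 2 * feature_dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        return gamma * x + beta
```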
Enhancement
- polish offpolicy RL multi-gpu DDP training (#679)
- add middleware for Ape-X distributed pipeline (#696)
- add example for evaluating trained DQN (#706)
Fix
- fix to_ndarray failing to assign dtype for scalars (#708)
- fix evaluator return episode_info compatibility bug
- fix cql example entry wrong config bug
- fix enable_save_figure env interface
- fix redundant env info bug in evaluator
- fix to_item unittest bug
Style
- polish and simplify requirements (#672)
- add Hugging Face Model Zoo badge (#674)
- add openxlab Model Zoo badge (#675)
- fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
- fix mujoco-py compatibility issue for cython<3 (#711)
- fix type spelling errors (#704)
- fix pypi release actions ubuntu 18.04 bug
- update contact information (e.g. wechat)
- polish algorithm doc tables
New Repo
- DOS: [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Full Changelog: v0.4.8...v0.4.9
Contributors: @PaParaZz1 @zjowowen @zhangpaipai @AltmanD @puyuan1996 @Cloud-Pku @Super1ce @kxzxvbk @jayyoung0802 @Mossforest @lxl2gf @Privilger
v0.4.8
API Change
- `stop_value` is no longer a required field in the config; it defaults to `math.inf`, and users can indicate `max_env_step` or `max_train_iter` in the training entry to run the program with a fixed termination condition (see the sketch after this list)
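A sketch of bounding a run by steps or iterations instead of `stop_value`, assuming a standard serial training entry (the `serial_pipeline` usage here follows common DI-engine conventions and may differ across versions):

```python
from ding.entry import serial_pipeline

def train(main_config, create_config):
    # stop_value may be omitted from the config (it defaults to math.inf);
    # the run is bounded by env steps or train iterations instead
    serial_pipeline(
        (main_config, create_config),
        seed=0,
        max_env_step=int(1e6),
        # max_train_iter=10000,  # alternative fixed termination condition
    )
```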
Env
- fix gym hybrid reward dtype bug (#664)
- fix atari env id noframeskip bug (#655)
- fix typo in gym any_trading env (#654)
- update td3bc d4rl config (#659)
- polish bipedalwalker config
Algorithm
- add EDAC offline RL algorithm (#639)
- add LN and GN norm_type support in ResBlock (#660)
- add normal value norm baseline for PPOF (#658)
- polish last layer init/norm in MLP (#650)
- polish TD3 monitor variable
Enhancement
- add MAPPO/MASAC task example (#661)
- add PPO example for complex env observation (#644)
- add barrier middleware (#570)
Fix
- fix abnormal collector log and add record_random_collect option (#662)
- fix to_item compatibility bug (#646)
- fix trainer dtype transform compatibility bug
- fix pettingzoo 1.23.0 compatibility bug
- fix ensemble head unittest bug
New Repo
- LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.
Full Changelog: v0.4.7...v0.4.8
Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear
v0.4.7
API Change
- remove the requirements of sub fields (learn/collect/eval) in the policy config (users can define their own config formats)
- use `wandb` as the default logger in the task pipeline (a generic logging sketch follows this list)
- remove the `value_network` config field and its implementations in SAC and related algorithms
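The default logger records scalar metrics through the standard wandb API; a generic sketch of the kind of calls involved (plain wandb, not DI-engine's wandb middleware itself; the project name is a placeholder):

```python
import wandb

def log_metrics(step: int, episode_return: float, loss: float) -> None:
    # wandb.init should run once per process; guard against re-initialization
    if wandb.run is None:
        wandb.init(project='di-engine-demo')  # placeholder project name
    wandb.log({'eval/episode_return': episode_return, 'train/loss': loss}, step=step)
```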
Env
- add dmc2gym env support and baseline (#451)
- update pettingzoo to the latest version (#597)
- polish icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- add lunarlander continuous TD3/SAC config
- polish lunarlander discrete C51 config
Algorithm
- add Procedure Cloning (PC) imitation learning algorithm (#514)
- add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- add reward/value norm methods: popart & value rescale & symlog (#605) (see the transform sketch after this list)
- polish reward model config and training pipeline (#624)
- add PPOF reward space demo support (#608)
- add PPOF Atari demo support (#589)
- polish dqn default config and env examples (#611)
- polish comment and clean code about SAC
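Of the norm methods in #605, value rescale and symlog are fixed closed-form transforms (Pohlen et al. 2018 and the symlog used by DreamerV3, respectively); a PyTorch sketch:

```python
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # symmetric log squashing: sign(x) * ln(|x| + 1)
    return torch.sign(x) * torch.log(torch.abs(x) + 1)

def symexp(x: torch.Tensor) -> torch.Tensor:
    # inverse of symlog
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)

def value_rescale(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x

def inv_value_rescale(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # closed-form inverse of h
    return torch.sign(x) * (
        ((torch.sqrt(1 + 4 * eps * (torch.abs(x) + 1 + eps)) - 1) / (2 * eps)) ** 2 - 1
    )
```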
Enhancement
- add language model (e.g. GPT) training utils (#625)
- remove policy cfg sub fields requirements (#620)
- add full wandb support (#579)
Fix
- fix confusing shallow copy operation about next_obs (#641)
- fix unsqueeze action_args in PDQN when shape is 1 (#599)
- fix evaluator return_info tensor type bug (#592)
- fix deque buffer wrapper PER bug (#586)
- fix reward model save method compatibility bug
- fix logger assertion and unittest bug
- fix bfs test py3.9 compatibility bug
- fix zergling collector unittest bug
Style
- add DI-engine torch-rpc p2p communication docker (#628)
- add D4RL docker (#591)
- correct typo in task (#617)
- correct typo in time_helper (#602)
- polish readme and add treetensor example
- update contributing doc
New Plan
- Call for contributors about DI-engine (#621)
Full Changelog: v0.4.6...v0.4.7
Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear
v0.4.6
API Change
- middleware: `CkptSaver(cfg, policy, train_freq=100)` -> `CkptSaver(policy, cfg.exp_name, train_freq=100)`
Env
- add metadrive env and related ppo config (#574)
- add acrobot env and related dqn config (#577)
- add carracing in box2d (#575)
- add new gym hybrid viz (#563)
- update cartpole IL config (#578)
Fix
- fix to_device and prev_state bug when using ttorch (#571)
- fix py38 and numpy unittest bugs (#565)
- fix typo in contrastive_loss.py (#572)
- fix dizoo envs pkg installation bugs
- fix multi_trainer middleware unittest bug
Style
- add evogym docker (#580)
- fix metaworld docker bug
- fix setuptools high version incompatibility bug
- extend treetensor lowest version
New Paper
- GoBigger: [ICLR 2023] A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation
Contributors: @PaParaZz1 @puyuan1996 @timothijoe @Cloud-Pku @ruoyuGao @Super1ce @karroyan @kxzxvbk @eltociear
v0.4.5
API Change
- move the default examples about adding a new env from extending `BaseEnv` to utilizing `DingEnvWrapper` (a minimal usage sketch follows this list)
- rename `final_eval_reward` to `eval_episode_return` in all related code (including envs and evaluators)
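A minimal sketch of the `DingEnvWrapper` route (method names follow DI-engine's `BaseEnv` interface; exact options vary by version):

```python
import gym
from ding.envs import DingEnvWrapper

# wrap a standard gym env into DI-engine's BaseEnv interface instead of
# subclassing BaseEnv directly
env = DingEnvWrapper(gym.make('CartPole-v1'))
obs = env.reset()
timestep = env.step(env.random_action())
# timestep carries obs/reward/done/info; at episode end the info dict holds
# eval_episode_return (the renamed final_eval_reward)
print(timestep.reward, timestep.done)
```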
Env
- add beergame supply chain optimization env (#512)
- add env gym_pybullet_drones (#526)
- rename `eval reward` to `episode return` (#536)
Algorithm
- add policy gradient algo implementation (#544)
- add MADDPG algo implementation (#550)
- add IMPALA continuous algo implementation (#551)
- add MADQN algo implementation (#540)
Enhancement
- add IMPALA-type distributed training scheme in the new task pipeline (#321)
- add load and save methods for replay buffer (#542)
- add more DingEnvWrapper examples (#525)
- add more evaluator info visualization support (#538)
- add traceback log for subprocess env manager (#534)
Fix
- fix halfcheetah td3 config file (#537)
- fix mujoco action_clip args compatibility bug (#535)
- fix atari a2c config entry bug
- fix drex unittest compatibility bug
Style
- add Roadmap issue of DI-engine (#548)
- update related project link and new env doc
New Project
- PPOxFamily: PPO x Family DRL Tutorial Course
- ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang
v0.4.4
API Change
- context in the new task pipeline is now implemented by `dataclass` rather than `dict` (an illustrative sketch follows this list)
- the recommended visualization tool is now `wandb` rather than `tensorboard`
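An illustrative dataclass sketch of the shape such a context takes (field names here are examples, not DI-engine's actual `Context` definition):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Context:
    # attribute access (ctx.train_iter) replaces the old dict-style
    # ctx['train_iter'], with defaults declared up front
    env_step: int = 0
    train_iter: int = 0
    trajectories: List = field(default_factory=list)

ctx = Context()
ctx.train_iter += 1
```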
Env
- add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
- add evogym support (#495) (#527)
- add save_replay_gif option (#506)
- adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)
Algorithm
- add pcgrad optimizer (#489)
- add some features in MLP and ResBlock (#511)
- delete mcts related modules (#518) (we will release an MCTS repo in the future)
Enhancement
- add wandb middleware and demo (#488) (#523) (#528)
- add new properties in Context (#499)
- add single env policy wrapper for policy deployment (demo)
- add custom model demo and doc (documentation)
Fix
- fix build logger args and unittests (#522)
- fix total_loss calculation in PDQN (#504)
- fix save gif function bug
- fix level sample unittest bug
Style
- update contact email address (#503)
- polish env log and resblock name
- add details button in readme
New Repo
- DI-1024: Deep Reinforcement Learning + 1024 Game
Contributors: @PaParaZz1 @puyuan1996 @karroyan @hiha3456 @davide97l @Weiyuhong-1998 @zjowowen @norman26625