Commit df02585

sailxjx, HansBug, puyuan1996, LuciusMos, zjowowen authored

polish(xjx): 1.0 (#143)
* polish(xjx): change doc structure, add intro
* Replace head image
* polish(xjx): system, basic, middleware (#117)
* Add middleware
* Add system design
* Add middleware spec
* Change search background
* Add quick start
* Use pytorch theme
* Adjusting grammar and errors
* New logo
* doc(hansbug): add guides for unittest, visualization and code style. (#127)
* doc(hansbug): add 3 new pages
* doc(hansbug): add code style page
* dev(hansbug): add plantuml's documentation
* dev(hansbug): add note
* dev(hansbug): align the image to center
* dev(hansbug): add graphviz's documentation
* dev(hansbug): add documentation for draw.io
* dev(hansbug): fix problem on draw.io
* dev(hansbug): add introduction for snakeviz
* dev(hansbug): add introduction for snakeviz
* dev(hansbug): add former parts of unittest
* fix(hansbug): do some fix
* dev(hansbug): add writing guide for unittest
* fix(hansbug): fix bug of
* dev(hansbug): add running guide for unittest
* fix(hansbug): fix the last code block
* dev(hansbug): append features to visualization
* dev(hansbug): add code style guide
* dev(hansbug): move the docs to new path
* fix(hansbug): fix the problems in chinese pages
* fix(hansbug): use english tutorials
* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh (#126)
* feature(pu): add config_spec_zh
* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh
* polish(pu): polish index
* polish(pu): polish style
* polish(zlx): 24-cooperation (#122)
* polish(zlx): Init 24-cooperation (git + issue/pr)
* polish(zlx): Add git_guide and issue_pr
* fix(zlx): fix comments by xjx
* feature(zlx): Add en version of 24-cooperation
* polish(zlx): fix comments by xjx
* polish(zjow): polish and revise quickstart and installation. (#121)
* Polish Quickstart.
* Minor change.
* Minor change.
* polish(zlx): 13. envs (#118)
* polish(zlx): Move env to new place. Polish index and images
* polish(zlx): Modify image scale
* polish(zlx): Add space in zh version
* polish(zlx): Move env to new place. Polish index and images
* polish(zlx): Modify image scale
* fixbug(zlx): en index indent
* polish(zlx): polish format via make live
* polish(zlx): fix comments by xjx

Co-authored-by: zhaoliangxuan <[email protected]>

* doc(nyz): add distributed rl overview (#133)
* doc(nyz): add distributed rl overview
* polish(nyz): polish footnote and note
* doc(davide): transfer 12 policies (#120)
* filled index en and ch
* Update index.rst
* added dqn_zh in index
* doc(zms): 11_dizoo: add zh + en version of index (#130)
* 1st zh doc
* change
* change links
* add note
* draft version of en dizoo
* change a bit
* final version
* Update index.rst
* polish(nyz): add missing images and polish doc
* doc(lxl): 02_algo: add offline rl zh (#125)
* polish(lxl): fix grammar and typo
* resolve conflicts when changing branches
* doc(lxl): add 02_algo/offline_rl_zh draft
* add offline rl doc
* polish offline rl doc
* polish offline rl
* polish offline rl: reformat reference
* polish offline rl: fix typo
* doc(jrn): add 02_algo model_based_rl_zh (#128)
* doc(jrn): add 02_algo mbrl
* doc(jrn): add 02_algo mbrl
* doc(jrn): modify 02_algo mbrl
* doc(jrn): modify 02_algo mbrl
* doc(jrn): polish 02_algo mbrl zh
* modify(jrn): polish source/02_algo/model_based_rl_zh.rst
* polish model_based_rl_zh.rst again
* polish model_based_rl_zh.rst again
* doc(jrn): add 02_algo mbrl
* doc(jrn): add 02_algo mbrl
* doc(jrn): modify 02_algo mbrl
* doc(jrn): modify 02_algo mbrl
* doc(jrn): polish 02_algo mbrl zh
* modify(jrn): polish source/02_algo/model_based_rl_zh.rst
* polish model_based_rl_zh.rst again
* polish(zlx): 24-cooperation (#122)
* polish(zlx): Init 24-cooperation (git + issue/pr)
* polish(zlx): Add git_guide and issue_pr
* fix(zlx): fix comments by xjx
* feature(zlx): Add en version of 24-cooperation
* polish(zlx): fix comments by xjx
* polish model_based_rl_zh.rst again
* polish(zjow): polish and revise quickstart and installation. (#121)
* Polish Quickstart.
* Minor change.
* Minor change.
* polish(nyz): add offline rl and gtrxl images
* doc(pu): add exploration overview and footnote for exploration_rl_zh (#134)
* feature(pu): add config_spec_zh
* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh
* polish(pu): polish index
* polish(pu): polish style
* polish(pu): polish style
* polish(pu): add exploration overview and footnote
* fix(pu): fix wrongly changed file
* polish(pu): add information-theory-based exploration part
* translate(zlx): Integrate past translation prs (#135)
* translate(nyp): diayn zh
* doc(py): ngu zh
* translate(gh): smac zh
* translate(gh): icm zh
* translate(cy): cartpole & gym-hybrid en
* translate(xzy): minigrid & pendulum en
* translate(zyc&yf): bipedalwalker & lunarlander en
* translate(hs): mujoco & procgen en
* translate(hs): r2d3 en
* polish(zlx): remove .. _ in 13_envs
* polish(zlx): polish format
* doc(wyh): algo02 MARL docs (#129)
* doc(wyh): marl
* doc(wyh): marl polish
* translation(lxl): add offline_rl_en & polish offline_rl_zh (#136)
* fix(nyz): fix offline rl author typo
* polish(zlx): polish mujoco & r2d3 by hs, which were ignored before (#138)
* doc(zms): add comments to "framework/middleware" (#137)
* add the refs to comments of "framework/middleware"
* change maxdepth of framework/index.rst from 4 to 2
* polish(lxl): polish offline_rl_zh, fix typo and grammar (#139)
* add offline_rl_en
* reorganize the description of Future & Outlooks
* polish
* doc(nyp): add best practice zh for our doc 1.0 (#132)
* doc(wzl): add pettingzoo.zh doc (#124)
* add pettingzoo_zh.rst
* update pettingzoo_zh.rst
* fix(hs): fix install atari_env error (#116)
* fix install atari_env error
* Update atari.rst
* Update atari_zh.rst
* add best practice zh for doc 1.0
* add rnn translation
* finish rnn; fix some translations (wrappers)
* modify regarding the comments
* change wrt comment
* modify unroll_len/sequence_len key
* fix multi-discrete action space

Co-authored-by: zerlinwang <[email protected]>
Co-authored-by: norman <[email protected]>
Co-authored-by: nieyunpeng <[email protected]>

* Cleanup old resources
* Space
* Fix offline rl

Co-authored-by: Hankson Bradley <[email protected]>
Co-authored-by: 蒲源 <[email protected]>
Co-authored-by: LuciusMos <[email protected]>
Co-authored-by: zjowowen <[email protected]>
Co-authored-by: zhaoliangxuan <[email protected]>
Co-authored-by: Swain <[email protected]>
Co-authored-by: Davide Liu <[email protected]>
Co-authored-by: zms <[email protected]>
Co-authored-by: lixl-st <[email protected]>
Co-authored-by: Jia Ruonan <[email protected]>
Co-authored-by: Weiyuhong-1998 <[email protected]>
Co-authored-by: Will-Nie <[email protected]>
Co-authored-by: zerlinwang <[email protected]>
Co-authored-by: norman <[email protected]>
Co-authored-by: nieyunpeng <[email protected]>
1 parent 6a7becd commit df02585

File tree

775 files changed: +9970 −13178 lines


.gitignore

+5 −4

@@ -1,10 +1,11 @@
-*.eps
-*.jpg
-*.svg
+*.puml.eps
+*.puml.jpg
+*.puml.svg
 .DS_Store
 build/
 source/_build
 _build/
 .vscode/
 venv/
-.idea/
+.idea/
+src/pytorch-sphinx-theme/

requirements.txt

+2 −1

@@ -1,8 +1,9 @@
 Pillow==8.2.0
 sphinx>=2.2.1,<=4.2
-sphinx_rtd_theme~=0.4.3
+sphinx_rtd_theme
 enum_tools
 sphinx-toolbox
 plantumlcli>=0.0.2
 sphinx-autobuild
 git+http://github.com/opendilab/DI-engine@main
+-e git+https://github.com/opendilab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme

source/00_intro/index.rst

+37 (new file)

Introduction
===============================

What is DI-engine?
-------------------------------

DI-engine is a decision intelligence platform built by a group of enthusiastic researchers and engineers. It provides professional and convenient support for your reinforcement learning algorithm research and development, mainly including:

1. Comprehensive algorithm support, such as DQN, PPO, and SAC, plus many algorithms from research subfields: QMIX for multi-agent reinforcement learning, GAIL for inverse reinforcement learning, RND for exploration problems, and more.

2. A user-friendly interface: we abstract the most common objects in reinforcement learning tasks, such as environments and policies, and encapsulate complex reinforcement learning processes into middleware, allowing you to build your own learning process as you wish.

3. Flexible scalability: using the messaging components and event-driven programming interfaces integrated in the framework, you can scale your basic research work to industrial-grade large-scale training clusters, such as the StarCraft II agent `DI-star <https://github.com/opendilab/DI-star>`_.

.. image::
    ../images/system_layer.png

Key Concepts
-------------------------------

If you are not yet familiar with reinforcement learning, you can go to our `reinforcement learning tutorial <../10_concepts/index.html>`_ for a glimpse into the wonderful world of reinforcement learning.

If you have already been exposed to reinforcement learning, you will be familiar with its basic interacting objects: **environments** and **agents (or the policies that make them up)**.

Instead of creating more concepts, DI-engine abstracts the complex interaction logic between the two into declarative middleware, such as **collect**, **train**, **evaluate**, and **save_ckpt**. You can adapt each part of the process in the most natural way.

Using DI-engine is very easy: in the `quickstart <../01_quickstart/index.html>`_, we show how to quickly build a classic reinforcement learning pipeline with a simple example.
source/00_intro/index_zh.rst

+28 (new file)

Introduction to DI-engine
===============================

About DI-engine
-------------------------------

DI-engine is a decision intelligence platform built by a group of energetic researchers and engineers. It provides the most professional and convenient support for your reinforcement learning algorithm research and development, mainly including:

1. Complete algorithm support, such as DQN, PPO, and SAC, plus algorithms from many research subfields: QMIX in multi-agent reinforcement learning, GAIL in inverse reinforcement learning, RND for exploration problems, and so on.

2. A friendly user interface: we abstract most of the common objects in reinforcement learning tasks, such as environments and policies, and encapsulate complex reinforcement learning processes into rich middleware, letting you build your own learning process as you wish.

3. Elastic scalability: with the messaging components and event-driven programming interfaces integrated in the framework, you can flexibly scale basic research work to industrial-grade large-scale training clusters, such as the StarCraft II agent `DI-star <https://github.com/opendilab/DI-star>`_.

.. image::
    ../images/system_layer.png

Key Concepts
-------------------------------

If you are not yet familiar with reinforcement learning, you can go to our `reinforcement learning tutorial <../10_concepts/index_zh.html>`_ for a glimpse into its wonderful world.

If you have already been exposed to reinforcement learning, you will be familiar with its basic interacting objects: **environments** and **agents (or the policies that make them up)**.

Instead of creating more concepts, DI-engine abstracts the complex interaction logic between the two into declarative middleware, such as **collect**, **train**, **evaluate**, and **save_ckpt**. You can adjust each step of the process in the most natural way.

Using DI-engine is very simple: in the `quickstart <../01_quickstart/index_zh.html>`_ section, we use a simple example to show how to quickly build a classic reinforcement learning pipeline with DI-engine.
+109 (new file)

First Reinforcement Learning Program
======================================

.. toctree::
    :maxdepth: 2

CartPole is an ideal introductory environment for reinforcement learning, and the DQN algorithm lets CartPole converge (maintain equilibrium) in a very short time. We will introduce the use of DI-engine based on CartPole + DQN.

.. image::
    images/cartpole_cmp.gif
    :width: 1000
    :align: center

Using the Configuration File
------------------------------

DI-engine uses a global configuration file to control all variables of the environment and policy. Each has a corresponding default configuration, which can be found in `cartpole_dqn_config <https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py>`_; in this tutorial we use the default configuration directly:

.. code-block:: python

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config

    cfg = compile_config(main_config, create_cfg=create_config, auto=True)

Initialize the Environments
------------------------------

In reinforcement learning, the way environment data is collected may differ between the training and evaluation processes. For example, training tends to run one training epoch for every n collected steps, while evaluation must complete a whole episode to obtain a score. We recommend initializing the collection and evaluation environments separately:

.. code-block:: python

    import gym
    from ding.envs import DingEnvWrapper, BaseEnvManagerV2

    collector_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
        cfg=cfg.env.manager
    )
    evaluator_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
        cfg=cfg.env.manager
    )

.. note::

    DingEnvWrapper is DI-engine's unified wrapper for different environment libraries. BaseEnvManagerV2 is a unified external interface for managing multiple environments, so you can use it to collect from multiple environments in parallel.

Select Policy
--------------

DI-engine covers most reinforcement learning policies; using them only requires selecting the right policy and model. Since DQN is off-policy, we also need to instantiate a buffer module.

.. code-block:: python

    from ding.model import DQN
    from ding.policy import DQNPolicy
    from ding.data import DequeBuffer

    model = DQN(**cfg.policy.model)
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
    policy = DQNPolicy(cfg.policy, model=model)

Build the Pipeline
---------------------

With the various middleware provided by DI-engine, we can easily build the entire pipeline:

.. code-block:: python

    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver

    with task.start(async_mode=False, ctx=OnlineRLContext()):
        # Evaluation; placed first so the random model's score serves as a baseline
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
        task.use(eps_greedy_handler(cfg))  # Decay the explore-exploit probability
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))  # Collect environment data
        task.use(data_pusher(cfg, buffer_))  # Push data to the buffer
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))  # Train the model
        task.use(CkptSaver(cfg, policy, train_freq=100))  # Save the model
        # During evaluation, if the model is found to exceed the convergence score, training ends early here
        task.run()

Run the Code
--------------

The full example can be found in `DQN example <https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py>`_ and can be run via ``python dqn.py``.

.. image::
    images/train_dqn.gif
    :width: 1000
    :align: center

You have now completed your first reinforcement learning task with DI-engine. You can try more algorithms in the `examples directory <https://github.com/opendilab/DI-engine/blob/main/ding/example>`_, or continue reading the documentation for a deeper understanding of DI-engine's `Algorithms <../02_algo/index.html>`_, `System Design <../03_system/index.html>`_ and `Best Practices <../04_best_practice/index.html>`_.
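The pipeline above decays the exploration probability with `eps_greedy_handler`. The sketch below shows the kind of schedule such a handler manages; the linear schedule and its parameters are assumptions for demonstration, not DI-engine's actual defaults:

```python
# Illustrative epsilon-greedy decay schedule: explore with probability
# epsilon, annealed linearly from `start` to `end` over `decay_steps`.
# Parameter values here are hypothetical, chosen only for demonstration.

def linear_epsilon(step, start=0.95, end=0.1, decay_steps=10000):
    """Linearly anneal the exploration probability from `start` to `end`."""
    if step >= decay_steps:
        return end
    frac = step / decay_steps
    return start + frac * (end - start)

eps_begin = linear_epsilon(0)      # act mostly at random early in training
eps_half = linear_epsilon(5000)    # halfway between start and end
eps_late = linear_epsilon(20000)   # clamped at `end`: mostly exploit
```

A collector would then pick a random action with probability `linear_epsilon(ctx.step)` and the greedy (argmax-Q) action otherwise.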
+100 (new file)

First Reinforcement Learning Program
======================================

.. toctree::
    :maxdepth: 2

CartPole is an ideal introductory environment for reinforcement learning, and the DQN algorithm lets CartPole converge (maintain equilibrium) in a very short time. We will introduce the usage of DI-engine based on CartPole + DQN.

.. image::
    images/cartpole_cmp.gif
    :width: 1000
    :align: center

Using the Configuration File
------------------------------

DI-engine uses a global configuration file to control all variables of the environment and policy. Each environment and policy has a corresponding default configuration; the full configuration can be found in `cartpole_dqn_config <https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py>`_. In this tutorial we use the default configuration directly:

.. code-block:: python

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config

    cfg = compile_config(main_config, create_cfg=create_config, auto=True)

Initialize the Collection and Evaluation Environments
------------------------------------------------------

In reinforcement learning, the strategy for collecting environment data may differ between training and evaluation: training usually trains once for every n collected steps, while evaluation needs to finish a whole episode to obtain a score. We recommend initializing the collection and evaluation environments separately:

.. code-block:: python

    import gym
    from ding.envs import DingEnvWrapper, BaseEnvManagerV2

    collector_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
        cfg=cfg.env.manager
    )
    evaluator_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
        cfg=cfg.env.manager
    )

.. note::

    DingEnvWrapper is DI-engine's unified wrapper for different environment libraries. BaseEnvManagerV2 is a unified external interface for managing multiple environments; with it, multiple environments can be collected from in parallel.

Select Policy
--------------

DI-engine covers most reinforcement learning policies; using them only requires choosing the right policy and model. Since DQN is an off-policy algorithm, we also need to instantiate a buffer module.

.. code-block:: python

    from ding.model import DQN
    from ding.policy import DQNPolicy
    from ding.data import DequeBuffer

    model = DQN(**cfg.policy.model)
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
    policy = DQNPolicy(cfg.policy, model=model)

Build the Training Pipeline
----------------------------

With the various middleware provided by DI-engine, we can easily build the whole training pipeline:

.. code-block:: python

    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver

    with task.start(async_mode=False, ctx=OnlineRLContext()):
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))  # Evaluation; placed first to get the random model's score as a baseline
        task.use(eps_greedy_handler(cfg))  # Decay the explore-exploit probability
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))  # Collect environment data
        task.use(data_pusher(cfg, buffer_))  # Save data to the buffer
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))  # Train the model
        task.use(CkptSaver(cfg, policy, train_freq=100))  # Save the model
        task.run()  # During evaluation, if the model is found to exceed the convergence score, it ends early here

Run the Code
--------------

The complete example code can be found in `DQN example <https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py>`_ and can be run via ``python dqn.py``.

.. image::
    images/train_dqn.gif
    :width: 1000
    :align: center

You have now completed your first reinforcement learning task with DI-engine. You can try more algorithms in the `examples directory <https://github.com/opendilab/DI-engine/blob/main/ding/example>`_, or continue reading the documentation for a deeper understanding of DI-engine's `Algorithms <../02_algo/index_zh.html>`_, `System Design <../03_system/index_zh.html>`_ and `Best Practices <../04_best_practice/index_zh.html>`_.
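Since DQN is off-policy, the quickstart instantiates a `DequeBuffer`. A minimal sketch of what such a FIFO replay buffer does, using only the standard library (the class and its methods below are illustrative, not `ding.data`'s actual API):

```python
import random
from collections import deque

# Minimal sketch of a FIFO replay buffer in the spirit of `DequeBuffer`.

class SimpleDequeBuffer:
    def __init__(self, size):
        # `maxlen` makes the deque evict the oldest transition on overflow
        self._data = deque(maxlen=size)

    def push(self, transition):
        self._data.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions
        return random.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)

buf = SimpleDequeBuffer(size=3)
for i in range(5):
    buf.push({"obs": i, "reward": float(i)})
# With capacity 3, the transitions for obs 0 and 1 have been evicted.
batch = buf.sample(2)
```

In the pipeline, `data_pusher` fills such a buffer with collected transitions, and `OffPolicyLearner` draws training batches from it.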

source/01_quickstart/index.rst

+8 (new file)

Quickstart
============================

.. toctree::
    :maxdepth: 2

    installation
    first_rl_program

source/01_quickstart/index_zh.rst

+8 (new file)

Quickstart
============================

.. toctree::
    :maxdepth: 2

    installation_zh
    first_rl_program_zh

0 commit comments