README.md: 6 additions & 25 deletions
@@ -10,8 +10,7 @@ Huawei, Noah's Ark Lab.
 - [Bayesian Optimisation with Compositional Optimisers](./CompBO)
 - [AntBO: Antibody Design with Combinatorial Bayesian Optimisation](./AntBO)
 - Reinforcement Learning Research
-- [Sauté RL: Almost Surely Safe RL Using State Augmentation](./SAUTE)
-- [SIMMER - Enhancing Safe Exploration Using Safety State Augmentation](./SIMMER)
+- [Sauté RL and Simmer RL: Safe Reinforcement Learning Using Safety State Augmentation](./SIMMER)
 - [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)
 
 Further instructions are provided in the README files associated to each project.
@@ -119,27 +118,11 @@ in vitro experimentation.
 
 # Reinforcement Learning Research
 
-## [Sauté RL: Almost Surely Safe RL Using State Augmentation](./SAUTE/)
+## [Sauté RL and Simmer RL: Safe Reinforcement Learning Using Safety State Augmentation](./SIMMER)
 
-### Sautéing a safe environment
+Codebase associated to: [Sauté RL: Almost Surely Safe RL Using State Augmentation](https://arxiv.org/pdf/2202.06558.pdf) and [Enhancing Safe Exploration Using Safety State Augmentation](https://arxiv.org/pdf/2206.02675.pdf).
 
-Safety state augmentation (sautéing) is done in a straightforward manner. Assume a safe environment is defined in
-a class `MySafeEnv`. The sautéed environment is defined using a decorator `saute_env`, which contains all the
-required definitions. Custom and overloaded functions can be defined in the class body.
-
-```python
-from envs.common.saute_env import saute_env
-
-
-@saute_env
-class MySautedEnv(MySafeEnv):
-    """New sauteed class."""
-```
-
-Codebase associated to: [Sauté RL: Almost Surely Safe RL Using State Augmentation](https://arxiv.org/pdf/2202.06558.pdf).
-.
-
-##### Abstract
+##### Abstract for Sauté RL: Almost Surely Safe RL Using State Augmentation (ICML 2022)
 
 Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement
 Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability
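The sautéing snippet removed in the hunk above relies on the repository's own `saute_env` decorator, which is not shown here. As a self-contained illustration of what safety state augmentation does, here is a minimal sketch assuming a toy environment; the class names (`SafeEnv`, `SautedEnv`), the cost convention, and the zero-reward truncation once the budget is exhausted are illustrative choices, not the repo's actual API:

```python
# Hypothetical sketch of safety state augmentation ("sauteing"): the
# observation is extended with the remaining (normalised) safety budget,
# which is depleted by the per-step safety cost.

class SafeEnv:
    """Toy safe environment: scalar state, each step incurs a safety cost."""

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = -abs(self.state - 1.0)  # reward for staying near 1.0
        cost = abs(action)               # safety cost of the action
        done = False
        return self.state, reward, cost, done


class SautedEnv(SafeEnv):
    """Augments the observation with the remaining safety budget."""

    def __init__(self, safety_budget=1.0):
        self.safety_budget = safety_budget

    def reset(self):
        state = super().reset()
        self.remaining = 1.0  # normalised remaining budget
        return (state, self.remaining)

    def step(self, action):
        state, reward, cost, done = super().step(action)
        self.remaining -= cost / self.safety_budget
        if self.remaining < 0.0:  # budget exhausted: truncate the reward
            reward = 0.0
        return (state, self.remaining), reward, cost, done


env = SautedEnv(safety_budget=0.5)
obs = env.reset()
obs, reward, cost, done = env.step(0.3)
print(obs)  # → (0.3, 0.4)
```

The only essential point is that the agent observes the remaining budget as part of the state, so any standard RL algorithm can condition its policy on it — the plug-and-play property described in the abstract below.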
@@ -151,12 +134,9 @@ approach has a plug-and-play nature, i.e., any RL algorithm can be "sauteed". Ad
 for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their
 state-of-the-art counterparts when constraint satisfaction is of high importance.
 
-## [SIMMER](./SIMMER)
-
 
-Codebase associated to: [Enhancing Safe Exploration Using Safety State Augmentation](https://arxiv.org/pdf/2206.02675.pdf).
 
-##### Abstract
+##### Abstract for Effects of Safety State Augmentation on Safe Exploration (NeurIPS 2022)
 
 Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost
 is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in
 safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is
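Simmering, per the linked paper, amounts to controlling the safety budget during training rather than fixing it. As a loose, hypothetical sketch of that idea, a scheduler might ramp the budget from a conservative value toward the target; the linear ramp and every name below are invented for illustration and are not the schedulers studied in the paper:

```python
# Toy sketch of budget scheduling ("simmering"): train with a small safety
# budget early on and anneal it toward the target, keeping early
# exploration conservative. Linear ramp chosen purely for illustration.

def simmer_schedule(step, total_steps, target_budget, start_frac=0.1):
    """Linearly anneal the safety budget from start_frac * target to target."""
    frac = min(step / total_steps, 1.0)
    return target_budget * (start_frac + (1.0 - start_frac) * frac)


# Budget grows from 10% of the target to the full target, then stays there.
print([round(simmer_schedule(s, 100, 1.0), 2) for s in (0, 50, 100, 200)])
# → [0.1, 0.55, 1.0, 1.0]
```

The scheduled budget would feed straight into a sautéed environment's `safety_budget`, so the two mechanisms compose.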
@@ -168,6 +148,7 @@ Safe exploration is a challenging and important problem in model-free reinforcement
 that simmering a safe algorithm can improve safety during training for both settings. We further show that Simmer can
 stabilize training and improve the performance of safe RL with average constraints.
 
+
 ## [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)
 
 Code associated to: [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://nips.cc/Conferences/2022/Schedule?showEvent=54842) accepted