Commit cce61a5

update README.md
1 parent 9e1f95e commit cce61a5

File tree

1 file changed: +2 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -171,7 +171,7 @@ Safe exploration is a challenging and important problem in model-free reinforcem
 ## [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)
 
 Code associdated to: [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://nips.cc/Conferences/2022/Schedule?showEvent=54842) accepted
-at **NeurIPS22** conference..
+at **NeurIPS22** conference.
 
 #### Abstract
 Model-based offline reinforcement learning (RL) aims to find highly rewarding policy, by leveraging a previously
@@ -183,7 +183,7 @@ through reward penalty may incur unexpected tradeoff between model utilization a
 instead maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the
 belief. The sampling procedure, biased towards pessimism, is derived based on an alternating Markov game formulation
 of offline RL. We formally show that the biased sampling naturally induces an updated dynamics belief with
-policy-dependent reweighting factor, termed \emph{Pessimism-Modulated Dynamics Belief}. To improve policy, we devise an
+policy-dependent reweighting factor, termed *Pessimism-Modulated Dynamics Belief*. To improve policy, we devise an
 iterative regularized policy optimization algorithm for the game, with guarantee of monotonous improvement under certain
 condition. To make practical, we further devise an offline RL algorithm to approximately find the solution. Empirical
 results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks.
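
The abstract in the hunk above describes evaluating and optimizing a policy through sampling from a dynamics belief, with the sampling biased towards pessimism. As a rough illustration of that idea only, here is a minimal sketch, assuming the belief is represented by an ensemble of dynamics models and pessimism is induced by keeping the k-th lowest-valued of several candidate transitions; the function names, the ensemble representation, and the k-of-N selection rule are assumptions for exposition, not the API of the PMDB code in this repository.

```python
import numpy as np

def pessimistic_next_state(state, action, dynamics_ensemble, value_fn,
                           num_candidates=5, k=1, rng=None):
    """Hypothetical sketch of pessimism-biased sampling from a dynamics belief:
    draw candidate next states from models sampled from the belief, then keep
    the k-th lowest-valued candidate (k=1 is the most pessimistic choice)."""
    rng = rng or np.random.default_rng()
    # Sample candidate models from the belief (here: uniform over an ensemble).
    idx = rng.integers(len(dynamics_ensemble), size=num_candidates)
    candidates = [dynamics_ensemble[i](state, action) for i in idx]
    # Rank candidates by the estimated value of the state they lead to.
    values = np.array([value_fn(s) for s in candidates])
    order = np.argsort(values)        # ascending: most pessimistic first
    return candidates[order[k - 1]]

# Toy usage: four random linear "dynamics models" and a quadratic value function.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = [
        (lambda A: (lambda s, a: A @ s + a))(rng.normal(size=(2, 2)))
        for _ in range(4)
    ]
    value_fn = lambda s: -float(np.sum(s ** 2))
    s_next = pessimistic_next_state(np.ones(2), np.zeros(2), ensemble, value_fn, rng=rng)
    print("pessimistic next state:", s_next)
```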
