1 file changed: +2 −2 lines changed

@@ -171,7 +171,7 @@ Safe exploration is a challenging and important problem in model-free reinforcement
## [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)

Code associated to: [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://nips.cc/Conferences/2022/Schedule?showEvent=54842) accepted
- at **NeurIPS22** conference..
+ at **NeurIPS22** conference.

#### Abstract
Model-based offline reinforcement learning (RL) aims to find highly rewarding policy, by leveraging a previously
@@ -183,7 +183,7 @@ through reward penalty may incur unexpected tradeoff between model utilization a
instead maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the
belief. The sampling procedure, biased towards pessimism, is derived based on an alternating Markov game formulation
of offline RL. We formally show that the biased sampling naturally induces an updated dynamics belief with
- policy-dependent reweighting factor, termed \emph{Pessimism-Modulated Dynamics Belief}. To improve policy, we devise an
+ policy-dependent reweighting factor, termed *Pessimism-Modulated Dynamics Belief*. To improve policy, we devise an
iterative regularized policy optimization algorithm for the game, with guarantee of monotonous improvement under certain
condition. To make practical, we further devise an offline RL algorithm to approximately find the solution. Empirical
results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks.
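
As a reading aid for the abstract above, here is a minimal sketch of the reweighting idea it describes: given return estimates of the current policy under a few dynamics models sampled from the belief, the belief is shifted towards the more pessimistic models, yielding a policy-dependent reweighting. The function name, the exponential weighting form, and the toy numbers are assumptions for illustration only; this is not the PMDB code in this repository.

```python
import numpy as np

def pessimism_modulated_weights(returns: np.ndarray, kappa: float = 1.0) -> np.ndarray:
    """Illustrative only: down-weight optimistic models.

    returns: shape (K,), estimated return of the policy under each sampled dynamics model.
    kappa:   pessimism temperature; larger values concentrate mass on the worst models.
    """
    logits = -kappa * returns          # negative sign biases the belief towards pessimism
    logits -= logits.max()             # numerical stability before exponentiation
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage: policy evaluation under the reweighted (pessimism-modulated) belief.
returns = np.array([10.0, 7.5, 3.0, 8.2])      # per-model return estimates (made-up numbers)
w = pessimism_modulated_weights(returns, kappa=0.5)
pessimistic_value = float(w @ returns)          # lies below the uniform-belief average
print(w, pessimistic_value)
```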