[Question] Has Offline RL success on antmaze been reproduced with Minari's antmaze datasets? #296

@perrin-isir

Question

I tried to apply Offline RL (in particular IQL) to Minari's antmaze datasets (e.g. medium-play and large-play), without success so far. IQL succeeds on the D4RL antmazes, but with the same hyperparameters (and the same trick of subtracting 1 from the reward, see https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/train_offline.py#L68) it fails on the corresponding Minari datasets. Is this a known issue? Has anyone been able to solve the Minari antmazes from the Minari datasets with standard offline RL algorithms?
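
For readers unfamiliar with the reward trick mentioned above, here is a minimal sketch of what it amounts to. It assumes sparse rewards of 0 (goal not reached) and 1 (goal reached); `shift_rewards` is a hypothetical helper name used only for illustration, not a function from IQL or Minari.

```python
def shift_rewards(rewards, offset=1.0):
    """Return a copy of `rewards` with `offset` subtracted from every entry.

    Assumption: rewards are sparse 0/1, so the shifted signal is -1 on every
    step that has not reached the goal and 0 on steps at the goal.
    """
    return [r - offset for r in rewards]


# Toy reward sequence (illustrative only, not taken from any dataset).
shifted = shift_rewards([0.0, 0.0, 1.0])
print(shifted)
```

With this shift, every step spent away from the goal carries a cost, which is the form of the signal I used for both the D4RL and the Minari runs.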

What I tried so far:

  • Following advice from @younik (see [Proposal] Porting Antmazes to Minari. #152 (comment)), I downgraded mujoco (v3.1.6), but this did not improve the results of offline RL. The results are not zero, so the ant is able to walk towards very simple targets in some cases, but on D4RL the results would typically be much better. I also noticed a potential issue with the effective dataset size: among the 1000 episodes, many reach the target after 200 to 300 steps, and the remaining steps only accumulate reward while the ant stays still. That leaves roughly 250k useful transitions out of the million in the dataset.
  • I used minari-dataset-generation-scripts (https://github.com/Farama-Foundation/minari-dataset-generation-scripts) to regenerate the medium-play dataset with one slight modification: I truncated each episode at its first success, so that the dataset does not contain long runs of transitions that only accumulate reward. This did not improve the performance of IQL either.
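
To make the two points above concrete, here is a rough sketch of how the useful-transition count and the truncation at first success can be computed from per-episode reward sequences. It is not the code from the generation scripts: `steps_until_success` and `useful_fraction` are hypothetical helpers, the episodes are toy data, and I assume a sparse reward that is positive only once the goal is reached.

```python
def steps_until_success(rewards):
    """Number of transitions up to and including the first positive reward.

    If the episode never succeeds, every transition is kept.
    """
    for t, r in enumerate(rewards):
        if r > 0:
            return t + 1
    return len(rewards)


def useful_fraction(episodes):
    """Fraction of all transitions that remain after truncating each
    episode at its first success."""
    kept = sum(steps_until_success(ep) for ep in episodes)
    total = sum(len(ep) for ep in episodes)
    return kept / total


# Toy data (assumption, not real dataset contents): two 1000-step episodes.
episodes = [
    [0.0] * 249 + [1.0] * 751,  # reaches the goal at step 250, then stays there
    [0.0] * 1000,               # never reaches the goal
]

truncated = [ep[:steps_until_success(ep)] for ep in episodes]
print([len(ep) for ep in truncated], useful_fraction(episodes))
```

On a real dataset, the reward lists would be replaced by the reward array of each stored episode.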

Has anyone been able to perform Offline RL successfully on, say, Minari's medium and large antmaze datasets?

Note: with Minari's simpler D4RL/antmaze/umaze-v1 environment, Offline RL is successful.
