
[Question] Significant Performance Disparity Between Maskable PPO and PPO #283

@gemelom

❓ Question

I tried to run PPO and Maskable PPO on my custom environment with the same configuration, but I found that Maskable PPO (~5 fps) is much slower than PPO (~140 fps).

Here is my configuration:

  • environment setup (a sketch of the env's mask hook follows the reproduction code)
    env.action_space = MultiBinary(339)
  • reproduction code
    import wandb
    from sb3_contrib import MaskablePPO
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from wandb.integration.sb3 import WandbCallback

    config = {
        "env_name": "my_env_name",
        "n_envs": 16,
        "policy_type": "MlpPolicy",
        "total_timesteps": 100000,
    }
    experiment_name = "maskable_ppo_test"  # placeholder run name
    run = wandb.init(project="my_project", config=config)  # placeholder project

    # make_vec_env wraps the environments in a DummyVecEnv by default
    vec_env = make_vec_env(config["env_name"], n_envs=config["n_envs"])

    model = MaskablePPO(config["policy_type"], vec_env, n_steps=128, verbose=1)
    # model = PPO(config["policy_type"], vec_env, n_steps=128, verbose=1)

    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(
            gradient_save_freq=100,
            model_save_path=f"models/{experiment_name}",
            verbose=2,
        ),
        progress_bar=True,
    )
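
For context, here is roughly how the environment exposes masks. This is heavily simplified: the class name, observation space, and reward are placeholders, and I am assuming the flat 2 * n mask layout for MultiBinary spaces (one allow/deny flag for the 0 and 1 choice of each bit):

    import numpy as np
    import gymnasium as gym
    from gymnasium.spaces import Box, MultiBinary

    class MyMaskedEnv(gym.Env):
        # Placeholder stand-in for my_env_name; observations and rewards are dummies.
        def __init__(self):
            super().__init__()
            self.action_space = MultiBinary(339)
            self.observation_space = Box(-1.0, 1.0, shape=(64,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            return self.observation_space.sample(), {}

        def step(self, action):
            obs = self.observation_space.sample()
            return obs, 0.0, False, False, {}

        def action_masks(self):
            # MaskablePPO queries this every step during rollout collection.
            # For MultiBinary(n) I believe the expected mask is flat with
            # 2 * n entries; in my real env some entries are False.
            return np.ones(2 * 339, dtype=bool)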

I also profiled my code with py-spy and found that MaskablePPO spends a lot of extra time in these lines:

[Image: py-spy profile of the MaskablePPO run]

while PPO spends much less time in train() and most of its time in collect_rollouts(), as expected.
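
For anyone reproducing this without py-spy, a short cProfile run over learn() gives a function-level breakdown as well; a minimal sketch, reusing the model defined above:

    import cProfile
    import pstats

    # Profile a short training run and print the 20 most expensive
    # functions by cumulative time.
    with cProfile.Profile() as profiler:
        model.learn(total_timesteps=2048)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)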

I wonder whether this extreme drop in training efficiency is expected behavior given the large action space, or whether it points to a bug in the implementation.
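
In case it helps narrow things down, here is a micro-benchmark sketch to isolate the mask-application cost from the rest of training. It assumes MultiBinary(n) is backed by MaskableBernoulliDistribution with a flat 2 * n mask, so treat it as a rough check rather than a definitive measurement:

    import time
    import torch as th
    from sb3_contrib.common.maskable.distributions import MaskableBernoulliDistribution

    batch_size = 2048  # roughly n_envs * n_steps from the config above
    for n in (10, 100, 339):
        dist = MaskableBernoulliDistribution(n)
        dist.proba_distribution(action_logits=th.zeros(batch_size, 2 * n))
        masks = th.ones(batch_size, 2 * n, dtype=th.bool)
        start = time.perf_counter()
        dist.apply_masking(masks)
        print(f"n={n}: apply_masking took {time.perf_counter() - start:.4f} s")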
