
[Question] Significant Performance Disparity Between Maskable PPO and PPO #283

@gemelom

❓ Question

I tried to run PPO and Maskable PPO on my custom environment with the same configuration, but I found that Maskable PPO (~5 fps) is much slower than PPO (~140 fps).

Here is my configuration:

  • environment setup (a sketch of the env's mask hook follows the reproduction code)
    env.action_space = MultiBinary(339)
  • reproduction code
    import wandb
    from sb3_contrib import MaskablePPO
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from wandb.integration.sb3 import WandbCallback

    config = {
        "env_name": "my_env_name",
        "n_envs": 16,
        "policy_type": "MlpPolicy",
        "total_timesteps": 100000,
    }
    experiment_name = "maskable_ppo_test"  # placeholder run name
    run = wandb.init(project="my_project", config=config)  # placeholder project

    # make_vec_env wraps the environments in a DummyVecEnv by default
    vec_env = make_vec_env(config["env_name"], n_envs=config["n_envs"])

    model = MaskablePPO(config["policy_type"], vec_env, n_steps=128, verbose=1)
    # model = PPO(config["policy_type"], vec_env, n_steps=128, verbose=1)

    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(
            gradient_save_freq=100,
            model_save_path=f"models/{experiment_name}",
            verbose=2,
        ),
        progress_bar=True,
    )
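
For context, here is roughly how the environment exposes masks. This is heavily simplified: the class name, observation space, and reward are placeholders, and I am assuming the flat 2 * n mask layout for MultiBinary spaces (one allow/deny flag for the 0 and 1 choice of each bit):

    import numpy as np
    import gymnasium as gym
    from gymnasium.spaces import Box, MultiBinary

    class MyMaskedEnv(gym.Env):
        # Placeholder stand-in for my_env_name; observations and rewards are dummies.
        def __init__(self):
            super().__init__()
            self.action_space = MultiBinary(339)
            self.observation_space = Box(-1.0, 1.0, shape=(64,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            return self.observation_space.sample(), {}

        def step(self, action):
            obs = self.observation_space.sample()
            return obs, 0.0, False, False, {}

        def action_masks(self):
            # MaskablePPO queries this every step during rollout collection.
            # For MultiBinary(n) I believe the expected mask is flat with
            # 2 * n entries; in my real env some entries are False.
            return np.ones(2 * 339, dtype=bool)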

I also profiled my code with py-spy and found that MaskablePPO spends a lot of extra time in these lines:

[Image: py-spy profile of the MaskablePPO run]

while PPO spends much less time in train() and most of its time in collect_rollouts(), as expected.
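
For anyone reproducing this without py-spy, a short cProfile run over learn() gives a function-level breakdown as well; a minimal sketch, reusing the model defined above:

    import cProfile
    import pstats

    # Profile a short training run and print the 20 most expensive
    # functions by cumulative time.
    with cProfile.Profile() as profiler:
        model.learn(total_timesteps=2048)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)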

I wonder whether this extreme drop in training efficiency is expected behavior given the large action space, or whether it points to a bug in the implementation.
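
In case it helps narrow things down, here is a micro-benchmark sketch to isolate the mask-application cost from the rest of training. It assumes MultiBinary(n) is backed by MaskableBernoulliDistribution with a flat 2 * n mask, so treat it as a rough check rather than a definitive measurement:

    import time
    import torch as th
    from sb3_contrib.common.maskable.distributions import MaskableBernoulliDistribution

    batch_size = 2048  # roughly n_envs * n_steps from the config above
    for n in (10, 100, 339):
        dist = MaskableBernoulliDistribution(n)
        dist.proba_distribution(action_logits=th.zeros(batch_size, 2 * n))
        masks = th.ones(batch_size, 2 * n, dtype=th.bool)
        start = time.perf_counter()
        dist.apply_masking(masks)
        print(f"n={n}: apply_masking took {time.perf_counter() - start:.4f} s")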
