Labels: question (Further information is requested)
❓ Question
I tried to run PPO and MaskablePPO on my custom environment with the same configuration, but I found that MaskablePPO (~5 fps) is much slower than PPO (~140 fps).
Here is my configuration:
- environment setup:

```python
env.action_space = MultiBinary(339)
```
- reproduction code:

```python
from sb3_contrib import MaskablePPO
from stable_baselines3 import PPO  # used for the baseline comparison
from stable_baselines3.common.env_util import make_vec_env
from wandb.integration.sb3 import WandbCallback

config = {
    "env_name": "my_env_name",
    "n_envs": 16,
    "policy_type": "MlpPolicy",
    "total_timesteps": 100000,
}

# DummyVecEnv
vec_env = make_vec_env(config["env_name"], n_envs=config["n_envs"])

model = MaskablePPO("MlpPolicy", vec_env, n_steps=128, verbose=1)
# model = PPO("MlpPolicy", vec_env, n_steps=128, verbose=1)

model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(
        gradient_save_freq=100,
        model_save_path=f"models/{experiment_name}",
        verbose=2,
    ),
    progress_bar=True,
)
```
I also tried to profile my code with py-spy, and I found that MaskablePPO spends a lot of extra time in these lines, while PPO spends much less time in `train` and most of its time in `collect_rollouts`, just as expected.
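(For anyone reproducing this, a typical py-spy invocation would be something like `py-spy record -o profile.svg -- python train.py`, with the script name as a placeholder for the actual training script.)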
I wonder whether this drastic drop in training throughput is expected given the large action space, or whether it points to a bug in the implementation.
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.
