[Question] Does hyperparameter tuning support custom vectorized environments? #439

Closed
5 tasks done
antoinedang opened this issue Mar 15, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation question Further information is requested

Comments

@antoinedang

❓ Question

Hello,
I have implemented a custom vectorized environment using MuJoCo that adheres to Stable Baselines3's VecEnv interface, but I haven't found any evidence that RL Zoo 3 supports (or does not support) vectorized environments. When I pass in my environment name after registering it with OpenAI Gym, RL Zoo 3 always tries to put a VecEnv wrapper on it (such as DummyVecEnv or SubprocVecEnv), and it crashes because the interface of a normal Env is not the same as that of a VecEnv.

Is there a way (an argument I missed, or some source code I could modify) to pass in the name of a vectorized environment directly, so that RL Zoo 3 skips wrapping it in a DummyVecEnv and/or SubprocVecEnv? I've tried the vec_env_wrapper argument in my hyperparameter config, setting env_wrapper to None, and many Google and source-code searches, but haven't found anything.

It doesn't sound like RL Zoo 3 supports this out of the box, but I'm wondering whether this is by choice, whether I missed a section of the documentation or a previously raised issue, or whether I can update the source code so it works for me. (I don't know much about the inner workings of RL Zoo 3, but it seems like an additional argument such as "is_env_vectorized" and an if statement would do the trick.)

For context, my hyperparameter config is:

from torch import nn  # needed for activation_fn below

default_hyperparams = dict(
    policy="MlpPolicy",
    n_timesteps=1e7,
    batch_size=256,
    n_steps=512,
    gamma=0.95,
    learning_rate=3.56987e-05,
    ent_coef=0.00238306,
    clip_range=0.3,
    n_epochs=5,
    gae_lambda=0.9,
    max_grad_norm=2,
    vf_coef=0.431892,
    policy_kwargs=dict(
        log_std_init=-2,
        ortho_init=False,
        activation_fn=nn.ReLU,
        net_arch=dict(pi=[256, 256], vf=[256, 256]),
    ),
)


hyperparams = {
    "GPUHumanoid": default_hyperparams
}

And I am calling the train.py script with arguments relevant to hyperparameter tuning with this script:

import sys

from rl_zoo3.train import train  # assumed import; the original post did not show it

sys.argv = [
    "python", "-optimize",
    "--algo", "ppo",
    "--env", "GPUHumanoid",
    "--log-folder", "data/tuning_logs",
    "-n", "50000",
    "--n-trials", "1000",
    "--n-jobs", "2",
    "--sampler", "tpe",
    "--pruner", "median",
    "--env-kwargs", "num_envs:256",
    "--conf-file", "simulation.hyperparam_config",
]
train()

The error code I get when I use the above arguments + hyperparameter config is:

/usr/local/lib/python3.10/dist-packages/gymnasium/utils/passive_env_checker.py:189: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
  logger.warn(
too many values to unpack (expected 2)

It seems like it's expecting the Env interface and not VecEnv, but I can see from the source code that Envs are wrapped in a DummyVecEnv after being created with gym.make().
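
The mismatch described above can be reproduced without any RL library. The sketch below uses two hypothetical stand-in classes (SingleEnvStyle and VecEnvStyle, neither of which is part of gymnasium or Stable Baselines3) and assumes that some caller unpacks the result of reset() into two values, as the Gymnasium Env API expects. Which wrapper actually does this in RL Zoo 3 is not shown in the error output, so treat this as an illustration rather than a diagnosis.

```python
class SingleEnvStyle:
    """Hypothetical stand-in: reset() returns (obs, info), Gymnasium Env style."""

    def reset(self):
        return [0.0, 0.0, 0.0], {}


class VecEnvStyle:
    """Hypothetical stand-in: reset() returns only a batch of observations."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def reset(self):
        return [[0.0, 0.0, 0.0] for _ in range(self.num_envs)]


def reset_like_a_wrapper(env):
    # A caller written against the single-env API unpacks exactly two values.
    obs, info = env.reset()
    return obs, info


def try_reset(env):
    """Return "ok" if the two-value unpacking works, else the error message."""
    try:
        reset_like_a_wrapper(env)
        return "ok"
    except ValueError as exc:
        return str(exc)


single_result = try_reset(SingleEnvStyle())
vec_result = try_reset(VecEnvStyle(num_envs=256))
print(single_result)
print(vec_result)
```

With 256 sub-environments the batch has 256 entries, so the two-value unpacking fails with a "too many values to unpack" error, which matches the message in the log above.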

Is there something I am missing? Any help would be greatly appreciated!

Checklist

@antoinedang antoinedang added the question Further information is requested label Mar 15, 2024
@antoinedang
Author

antoinedang commented Mar 15, 2024

Looking at the source code, it seems like it could be done by adding an if/else around the call env = make_vec_env(...), such that make_vec_env is not called if the environment is already a VecEnv, and instead we just set env = make_env(**env_kwargs). More specifically, replace lines 622-632 with:

if self._hyperparams.get("env_is_vectorized", False):
    env = make_env(num_envs=n_envs, **env_kwargs)
else:
    env = make_vec_env(
        make_env,
        n_envs=n_envs,
        seed=self.seed,
        env_kwargs=env_kwargs,
        monitor_dir=log_dir,
        wrapper_class=self.env_wrapper,
        vec_env_cls=self.vec_env_class,  # type: ignore[arg-type]
        vec_env_kwargs=self.vec_env_kwargs,
        monitor_kwargs=self.monitor_kwargs,
    )

Curious if this might break things down the line, and/or if there is an already built solution I'm missing? (I'd rather not have to integrate the entire rl_zoo3 repo in my project for cleanliness' sake)
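
One thing worth checking in the snippet above: if env_kwargs already contains num_envs (as it would if it is populated from --env-kwargs num_envs:256, which is an assumption about how the two are connected), then also passing num_envs=n_envs explicitly raises a TypeError. The sketch below uses a hypothetical make_env stand-in that simply echoes its keyword arguments. It shows the failure and one possible way to merge the two sources.

```python
def make_env(**kwargs):
    """Hypothetical stand-in for the env factory; it just echoes its kwargs."""
    return kwargs


env_kwargs = {"num_envs": 256}  # assumed to come from --env-kwargs num_envs:256
n_envs = 1  # assumed value of the zoo's own n_envs setting


def call_with_duplicate():
    """Pass num_envs both explicitly and via env_kwargs; return the error text."""
    try:
        make_env(num_envs=n_envs, **env_kwargs)
        return None
    except TypeError as exc:
        return str(exc)


def call_merged():
    """Merge first, letting the user-supplied env_kwargs override n_envs."""
    merged = {"num_envs": n_envs, **env_kwargs}
    return make_env(**merged)


duplicate_error = call_with_duplicate()
merged_kwargs = call_merged()
print(duplicate_error)
print(merged_kwargs)
```

Whether the command-line value or n_envs should take precedence is a design choice. The merge order here favors env_kwargs only for illustration.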

@araffin
Member

araffin commented Mar 15, 2024

> Looking at the source code, it seems like it could be done by adding an if/else in

In your case, the best option is probably to fork the RL Zoo and adapt it to your needs (you can still install it as an editable package, so you don't have to integrate it into your codebase).
Note that gym.make() is supposed to return a gym.Env, not a VecEnv.

> Curious if this might break things down the line, and/or if there is an already built solution I'm missing? (I'd rather not have to integrate the entire rl_zoo3 repo in my project for cleanliness' sake)

We have something similar in a tentative PR for envpool integration: #355

@antoinedang
Author

Sounds good, thanks for letting me know. I'll fork and make the changes.

In case anyone else has the same question, I'll be updating the code here:
https://github.com/mcgill-robotics/Humanoid-rl-baselines3-zoo

@araffin
Member

araffin commented Mar 15, 2024

Maybe you could open a PR to update the docs?

@araffin araffin added the documentation Improvements or additions to documentation label Mar 15, 2024
@antoinedang
Author

antoinedang commented Mar 15, 2024

> Maybe you could open a PR to update the docs?

Sure. However, I'm new to the repo, so I'm not sure about the standards or where to do this. What exactly should I update, and with what information? Should I add something along the lines of "If your custom environment implements the Stable Baselines3 VecEnv class, you will have to update the source code (see issue [....])." to https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/docs/guide/custom_env.rst?

@araffin
Member

araffin commented Mar 18, 2024

> What exactly should I update and with what information?
> https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/docs/guide/custom_env.rst

Yes, this file, with an explanation and a link to this issue on what to do when you have vectorized environments (VecEnv) that are not gym.Env.

Something like https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#sb3-and-procgenenv
