Description
❓ Question
Hello, and thanks for the awesome automatic hyperparameter search feature. Recently I recorded the hyperparameters reported by the optimization and trained the same algorithm (PPO) on my custom environment (non-deterministic, but seeded with the same np.random.seed) with the same seed passed to train.py. However, the evaluation performance after actual training for the same number of timesteps (57600) is quite different: a mean reward of -2.2e6, versus the -1.7e6 reported during the hyperparameter optimization phase. I would be very grateful if you could point out any steps I might be missing to reproduce the performance reported in the hyperparameter optimization phase.
FYI, the custom environment can be found at: https://github.com/whxru/rl-baselines3-zoo/blob/master/rl_zoo3/aoi_cbu/env_hybrid.py
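
For reference, below is a minimal standalone sketch of how I seed everything and train, outside the zoo's train.py pipeline. The environment class name (`EnvHybrid`), seed value, and hyperparameter values are placeholders, not the exact ones from the optimization report:

```python
import numpy as np

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.utils import set_random_seed

# Hypothetical class name; the module path matches the linked file.
from rl_zoo3.aoi_cbu.env_hybrid import EnvHybrid

SEED = 0  # placeholder: the same seed that was passed to train.py

# Seed the global RNGs (random, numpy, torch); the custom env draws from np.random.
set_random_seed(SEED)
np.random.seed(SEED)

env = EnvHybrid()  # placeholder: constructor arguments omitted

# Placeholder hyperparameters: in the real run these are the values
# reported by the hyperparameter optimization, not the ones shown here.
model = PPO(
    "MlpPolicy",
    env,
    n_steps=512,
    learning_rate=3e-4,
    seed=SEED,
    verbose=1,
)
model.learn(total_timesteps=57600)  # same budget as during optimization

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.3e} +/- {std_reward:.3e}")
```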
Checklist
- I have checked that there is no similar issue in the repo
- I have read the SB3 documentation
- I have read the RL Zoo README
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.