[Question] Things to do to reproduce the evaluation results in hyperparameters optimization #323
Open
Labels: question (further information is requested)
❓ Question
Hello, thanks for the awesome auto-search feature for finding the best hyperparameters. Recently I recorded the reported hyperparameters and tried to train the same algorithm (PPO) on my custom environment (non-deterministic, but using the same `np.random.seed`) with the same seed passed to `train.py`. However, the evaluation performance in the actual training run after the same number of timesteps (57600) is quite different (mean reward: -2.2e6) from the performance reported during the hyperparameter optimization phase (mean reward: -1.7e6). I would be very grateful if you could point out any steps I might be missing to reproduce the performance reported during hyperparameter optimization.

FYI, the custom env can be found at: https://github.com/whxru/rl-baselines3-zoo/blob/master/rl_zoo3/aoi_cbu/env_hybrid.py