
[Question] Things to do to reproduce the evaluation results in hyperparameter optimization #323

Open
whxru opened this issue Dec 5, 2022 · 4 comments
Labels
question Further information is requested

Comments

@whxru

whxru commented Dec 5, 2022

❓ Question

Hello, and thanks for the awesome automatic hyperparameter search. Recently I recorded the hyperparameters reported by the optimization and tried to train the same algorithm (PPO) on my custom environment (non-deterministic, but seeded with the same np.random.seed) using the same seed passed to train.py. However, the evaluation performance of the actual training run after the same number of timesteps (57,600) (mean reward: -2.2e6) is quite different from the performance reported during hyperparameter optimization (mean reward: -1.7e6). I would be very grateful if you could point out any steps I might be missing to reproduce the performance reported in the hyperparameter optimization phase.

FYI, the custom env can be found at: https://github.com/whxru/rl-baselines3-zoo/blob/master/rl_zoo3/aoi_cbu/env_hybrid.py
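For context, the reproduction attempt boils down to something like the sketch below, written with plain stable-baselines3 rather than the zoo's train.py; the environment ID, hyperparameter values, and seed are placeholders, not the ones from my actual run:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Hyperparameters copied from the optimization report (placeholder values here).
recorded_hyperparams = dict(learning_rate=3e-4, n_steps=512, gamma=0.99)

# Train with the recorded hyperparameters, a fixed seed, and the same budget
# (57,600 timesteps) as the reported trial.
model = PPO("MlpPolicy", "CartPole-v1", seed=1, verbose=0, **recorded_hyperparams)
model.learn(total_timesteps=57_600)

# Evaluate the trained policy and compare with the reward reported by the optimization.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.2e} +/- {std_reward:.2e}")
```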


@whxru whxru added the question Further information is requested label Dec 5, 2022
@araffin
Member

araffin commented Dec 5, 2022

> is quite different from the performance (mean reward: -1.7e6) reported in the phase of hyperparameter optimization

Probably a duplicate of #314 (comment) and #204.

@whxru
Author

whxru commented Dec 6, 2022

Hi, I found that one possible reason might be that in the hyperparameter optimization phase the model in each trial is not seeded:

`# We do not seed the trial`

but in the actual training phase the model is seeded.

May I ask the reason behind this design?
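For reference, here is a minimal standalone sketch of what the two settings amount to, using plain stable-baselines3 with a placeholder environment rather than the zoo's exp_manager.py:

```python
from stable_baselines3 import PPO

# Actual training phase: the model is seeded, so weight initialization and
# action sampling are reproducible across identical runs.
seeded_model = PPO("MlpPolicy", "CartPole-v1", seed=42, verbose=0)
seeded_model.learn(total_timesteps=10_000)

# Optimization trial: no seed is passed to the model, so each trial starts
# from a different random state even with identical hyperparameters.
unseeded_model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
unseeded_model.learn(total_timesteps=10_000)
```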

@qgallouedec
Collaborator

The optimization evaluates a set of hyperparameters with a single training run. This method is therefore sensitive to run-to-run variability, so obtaining somewhat better results during the optimization than during the later evaluation is expected.
By the way, it is neither possible nor desirable to search for "the right seed": the seed is not a hyperparameter and is therefore not intended to be optimized.
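To make the variance point concrete, here is a minimal sketch (plain stable-baselines3 with placeholder environment, seeds, and budget; this is not the zoo's actual objective function) of scoring a hyperparameter candidate over several seeds instead of a single run:

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def score_candidate(hyperparams: dict, env_id: str = "CartPole-v1",
                    seeds=(0, 1, 2), n_eval_episodes: int = 10) -> float:
    """Train one model per seed and average the evaluation reward,
    so a single lucky or unlucky run does not decide the score."""
    rewards = []
    for seed in seeds:
        model = PPO("MlpPolicy", env_id, seed=seed, verbose=0, **hyperparams)
        model.learn(total_timesteps=20_000)
        mean_reward, _ = evaluate_policy(model, model.get_env(),
                                         n_eval_episodes=n_eval_episodes)
        rewards.append(mean_reward)
    return float(np.mean(rewards))
```

Averaging over seeds trades extra compute for a less noisy objective, so a score obtained during optimization transfers better to a fresh training run.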

@whxru
Author

whxru commented Dec 6, 2022

@qgallouedec Thanks for your reply; I totally agree. But sometimes reproducing the performance seen during optimization is meaningful when it is much better than anything you have achieved otherwise, e.g., when you are dealing with specific problem instances such as large-scale combinatorial optimization problems that existing heuristic algorithms cannot handle or handle poorly. In that situation, trying more seeds and keeping the best one also helps solve the particular problem instance at hand, as in the sketch below.
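A rough sketch of such a seed search, again with plain stable-baselines3 and placeholder environment, budget, and seed range:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

best_reward, best_seed = float("-inf"), None
for seed in range(10):  # candidate seeds for this specific problem instance
    model = PPO("MlpPolicy", "CartPole-v1", seed=seed, verbose=0)
    model.learn(total_timesteps=20_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    if mean_reward > best_reward:
        best_reward, best_seed = mean_reward, seed
        model.save("best_for_this_instance")  # keep the best-performing policy

print(f"best seed: {best_seed}, mean reward: {best_reward:.2e}")
```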
