❓ Question

I tried to train the agent with the tuned hyperparameters from the best trial, but strangely, the training comes nowhere near the score reported during tuning (it is actually much worse). Has anyone run into this problem?

My guess is that a one-time evaluation is very stochastic (question: is sb3-rl-zoo currently using a one-time evaluation score as the tuning target?). I don't think it is a good measure of how good the agent is.

Perhaps we should use the mean over multiple evaluation episodes as the tuning target, or the mean reward from the rollouts; see the sketch below for what I have in mind. What do you think?

(This is for MountainCar.)

Thanks for any ideas!
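To make the proposal concrete, here is a minimal sketch of an Optuna objective that scores a trial by the mean return over many evaluation episodes. It is only an illustration: the algorithm, environment id, search space, and timestep budget are all made up, and this is not the zoo's actual tuning code.

```python
import gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space: tune only the learning rate.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    env = gym.make("MountainCar-v0")
    model = PPO("MlpPolicy", env, learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50_000)
    # Average over many episodes so a single lucky or unlucky rollout
    # does not decide the trial's score.
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
    return mean_reward
```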
Checklist
I have checked that there is no similar issue in the repo
In the meantime, you can also increase the number of evaluation episodes to reduce the noise (`--eval-episodes`).
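For reference, when optimizing with the zoo's `train.py`, that would look something like `python train.py --algo ppo --env MountainCar-v0 -optimize --n-trials 100 --eval-episodes 20` (the environment id and trial budget here are just placeholders; check `python train.py --help` for the exact flags).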
> My guess is that a one-time evaluation is very stochastic (question: is sb3-rl-zoo currently using a one-time evaluation score as the tuning target?).
Not really: if you are using a pruner, it evaluates the agent periodically on several test episodes. But yes, we currently test with only one seed.
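For anyone wondering what periodic evaluation with a pruner looks like, the pattern is roughly the following. This is a simplified sketch, not the zoo's actual callback; the evaluation schedule and episode counts are arbitrary.

```python
import gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    env = gym.make("MountainCar-v0")
    model = PPO("MlpPolicy", env, learning_rate=lr, verbose=0)
    mean_reward = float("-inf")
    for step in range(5):  # evaluate the agent periodically during training
        model.learn(total_timesteps=10_000, reset_num_timesteps=False)
        # Several test episodes per intermediate evaluation.
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=5)
        trial.report(mean_reward, step)
        if trial.should_prune():  # let the pruner stop unpromising trials early
            raise optuna.TrialPruned()
    return mean_reward

study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)
```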
To obtain hyperparameters that work across many random seeds, you can add a post-processing step (see #151) or run multiple trainings per trial (#204); a sketch of the latter follows.
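As a rough illustration of the multiple-trainings idea (a sketch only, not what #151 or #204 implement; the seed count and budget are arbitrary):

```python
import gym
import numpy as np
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    scores = []
    for seed in (0, 1, 2):  # retrain the same candidate on several seeds
        env = gym.make("MountainCar-v0")
        model = PPO("MlpPolicy", env, learning_rate=lr, seed=seed, verbose=0)
        model.learn(total_timesteps=50_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
        scores.append(mean_reward)
    # Averaging (or taking the worst case) across seeds keeps the tuner
    # from picking hyperparameters that only worked for one lucky seed.
    return float(np.mean(scores))
```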