[Enhancement] Multiple model iterations per Optuna trial and mean performance objective #204

seawee1 · 2022-01-25T13:23:48Z

A recurring problem I have is that the hyperparameters Optuna returns are often far from optimal, because RL training is stochastic. For example, training 3 agents with the same set of hyperparameters can result in 3 completely different learning curves (at least for the environment I'm training on).
Might it make sense to implement the optimization code in such a way that multiple agents are trained for each trial, and the mean or median performance is reported to Optuna instead?
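
A minimal sketch of the idea, not the zoo's implementation: one trial's objective value is the median (or mean) of several runs that differ only in their seed. The names `aggregated_score`, `train_and_eval` and `noisy_toy_training` are hypothetical; in a real setup `train_and_eval` would train one agent and evaluate it, and the returned value would be what the Optuna objective returns.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

# Hypothetical signature: (hyperparameters, seed) -> final evaluation score.
TrainAndEval = Callable[[Dict[str, float], int], float]


def aggregated_score(
    train_and_eval: TrainAndEval,
    hyperparams: Dict[str, float],
    seeds: Sequence[int],
    reducer: Callable[[Sequence[float]], float] = statistics.median,
) -> float:
    """Run one training per seed and reduce the scores to a single value."""
    scores = [train_and_eval(hyperparams, seed) for seed in seeds]
    return reducer(scores)


def noisy_toy_training(hyperparams: Dict[str, float], seed: int) -> float:
    """Stand-in for RL training: best score at lr=0.1, plus seed-dependent noise."""
    rng = random.Random(seed)
    true_score = -abs(hyperparams["lr"] - 0.1)
    return true_score + rng.gauss(0.0, 0.05)


# Three seeds per set of hyperparameters, reduced with the median.
score = aggregated_score(noisy_toy_training, {"lr": 0.1}, seeds=[0, 1, 2])
```

Passing `reducer=statistics.mean` switches to the mean, which is the kind of choice a training script argument could expose.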

Inside utils/exp_manager.py hyperparameter_optimization, line 713, I saw your comment "# TODO: eval each hyperparams several times to account for noisy evaluation". Is that maybe exactly what you mention there?

I already had a look at the code and thought a little bit about how one might be able to do that. If anybody is interested, I could implement it and open a pull request!


seawee1 · 2022-01-25T14:37:53Z

Regarding the duplicate tag (you are probably referring to issue #151 ?) I can definitely see your point, but why not implement it and let the user decide via a configurable training script argument.

If implemented correctly, I also don't see why this would hinder the use of pruners. They could work based on mean/median objective performance of current and past trials.

araffin · 2022-03-30T09:31:24Z

Hello,

Sorry for the late reply, I was on holiday...

> Is that maybe exactly what you mention there?

Yes

> Regarding the duplicate tag (you are probably referring to issue #151 ?)

yes and that comment:
#151 (comment)

> can definitely see your point, but why not implement it and let the user decide via a configurable training script argument.

I would be happy to have a draft PR ;)

You should also know that this exists: #114

> If implemented correctly, I also don't see why this would hinder the use of pruners

How do you prune a trial before the end of a run if your objective is the mean/median of several runs?

qgallouedec · 2022-03-30T10:14:36Z

> I can definitely see your point, but why not implement it and let the user decide via a configurable training script argument.

I agree with this.
Faced with the same problem, I've already implemented a script that roughly does this. If you open a PR, I would be happy to contribute.

> How do you prune a trial before the end of a run if your objective is the mean/median of several runs?

By training multiple models simultaneously. Something like

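import numpy as np
import optuna
from stable_baselines3.common.evaluation import evaluate_policy

# ... (models, eval_env, n, split_size and trial are assumed to be defined above)
for split in range(n):
    mean_rewards = []
    for model in models:
        # Train each model for one more chunk without resetting its timestep counter
        model.learn(split_size, reset_num_timesteps=False)
        mean_reward, _ = evaluate_policy(model, eval_env)
        mean_rewards.append(mean_reward)
    # Report the median across models as the intermediate value of the trial
    median_score = np.median(mean_rewards)
    trial.report(median_score, (split + 1) * split_size)
    if trial.should_prune():
        raise optuna.TrialPruned()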

I wonder if you can run, say, 50 or so models simultaneously, without having memory problems or anything.
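
The loop above can also be sketched without any dependencies, which makes the pruning logic easy to exercise with a stub trial. This is only an illustration under stated assumptions: `TrialLike` lists the two Optuna `Trial` methods the loop relies on (`report` and `should_prune`), `Pruned` is a local stand-in for `optuna.TrialPruned`, and each entry of `train_chunk_fns` is a hypothetical callable that trains one model for `split_size` more timesteps and returns its evaluation score.

```python
import statistics
from typing import Callable, Protocol, Sequence


class TrialLike(Protocol):
    """The two Optuna Trial methods this sketch relies on."""

    def report(self, value: float, step: int) -> None: ...

    def should_prune(self) -> bool: ...


class Pruned(Exception):
    """Local stand-in for optuna.TrialPruned (keeps the sketch dependency-free)."""


def run_interleaved(
    trial: TrialLike,
    train_chunk_fns: Sequence[Callable[[int], float]],
    n_splits: int,
    split_size: int,
) -> float:
    """Train all models chunk by chunk and report the median after each chunk."""
    median_score = float("-inf")
    for split in range(n_splits):
        # Advance every model by one chunk and collect its evaluation score.
        scores = [train_chunk(split_size) for train_chunk in train_chunk_fns]
        median_score = statistics.median(scores)
        timesteps_so_far = (split + 1) * split_size
        trial.report(median_score, timesteps_so_far)
        # The pruner sees the median of all models, not a single noisy run.
        if trial.should_prune():
            raise Pruned(f"pruned after {timesteps_so_far} timesteps per model")
    return median_score
```

Because all models advance in lockstep, the pruner can stop a trial after any chunk; the price is that every model of the trial has to stay in memory at the same time.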

araffin · 2022-03-30T12:00:31Z

> If you open a PR, I would be happy to contribute.

Please do =)

> By training multiple models simultaneously. Something like

I was afraid of that answer... Yes, it does work, but not for image-based environments, and it requires a beefy machine anyway (for instance, for DQN on Atari, a single model may require 40GB of RAM).
We also need to check whether model.learn(reset_num_timesteps=False) works well with schedules.
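
A rough sketch of why the schedule question matters. It rests on an assumption that should be verified against the installed Stable-Baselines3 version: with `reset_num_timesteps=False`, each `learn()` call targets the timesteps already done plus the new chunk, and `progress_remaining` is computed as `1 - timesteps_done / target`. The helper names are hypothetical and the code is plain arithmetic, not a call into the library.

```python
from typing import Callable, List


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Schedule that decays linearly with progress_remaining (1.0 at start, 0.0 at end)."""

    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return schedule


def progress_at_chunk_starts(n_splits: int, split_size: int) -> List[float]:
    """progress_remaining at the start of each chunked learn() call.

    ASSUMPTION (to be checked): target = timesteps_done + split_size and
    progress_remaining = 1 - timesteps_done / target.
    """
    values = []
    timesteps_done = 0
    for _ in range(n_splits):
        target = timesteps_done + split_size
        values.append(1.0 - timesteps_done / target)
        timesteps_done = target  # every chunk runs until progress reaches 0
    return values


lr_schedule = linear_schedule(3e-4)
chunk_start_lrs = [lr_schedule(p) for p in progress_at_chunk_starts(4, 100)]
```

Under that assumption, four chunks of 100 timesteps start at a progress of 1, 1/2, 1/3 and 1/4, and each chunk decays to 0, so a linear learning rate jumps back up at every chunk boundary. A single uninterrupted run of 400 timesteps would instead pass through 1, 3/4, 1/2 and 1/4 at the same points.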

> 50 or so models simultaneously, without having memory problems or anything.

I would run at most 3-5 models simultaneously, unless the env is very simple and the network small.

qgallouedec · 2022-03-30T13:30:46Z

> Please do =)

Let's open a draft PR and continue the discussion there.

seawee1 changed the title ~~[feature request] Multiple model iterations per Optuna trial and mean performance objective~~ [Enhancement] Multiple model iterations per Optuna trial and mean performance objective Jan 25, 2022

araffin added the "duplicate" label Jan 25, 2022

araffin removed the "Maintainers on vacation" label Mar 30, 2022

qgallouedec linked a pull request Mar 30, 2022 that will close this issue

Multiple models for optimization #225

Draft

13 tasks

qgallouedec mentioned this issue Nov 16, 2022

[Question] One-time evaluation score is not good indicator, and therefore probably should not be the tuning target? #314

Closed

5 tasks

araffin mentioned this issue Dec 5, 2022

[Question] Things to do to reproduce the evaluation results in hyperparameters optimization #323

Open

5 tasks

araffin mentioned this issue Jun 7, 2024

[Question] Results vastly different for an agent created with Stable Baselines3 using hyperparameters optimized in RL Baselines3 Zoo. #458

Open

5 tasks

araffin mentioned this issue Jul 31, 2024

[Feature Request] same random seed for every env in AsyncEval Stable-Baselines-Team/stable-baselines3-contrib#253

Open

2 tasks
