
[Question] Steps to reproduce the evaluation results from hyperparameter optimization #323

Open
@whxru

Description

❓ Question

Hello, and thanks for the awesome automatic hyperparameter search feature. Recently I recorded the hyperparameters it reported and tried to train the same algorithm (PPO) on my custom environment (which is non-deterministic, but seeded with the same np.random.seed) using the same seed passed to train.py. However, the evaluation performance of the actual training run after the same number of timesteps (57600) is quite different: mean reward -2.2e6, versus the -1.7e6 reported during the hyperparameter optimization phase. I would be very grateful if you could point out any steps I might be missing to reproduce the performance reported during hyperparameter optimization.
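For concreteness, here is a minimal sketch of what my re-training run boils down to, written against stable-baselines3 directly rather than the zoo's train.py. Pendulum-v1 stands in for my custom env, and the seed and hyperparameter values are placeholders, not my exact setup:

```python
import gym
import numpy as np

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.utils import set_random_seed

SEED = 42  # illustrative; in practice I pass the same value to train.py via --seed

# Seed the global RNGs (Python, NumPy, PyTorch) up front, since my env
# draws its randomness from the global NumPy RNG.
set_random_seed(SEED)
np.random.seed(SEED)

# Pendulum-v1 is only a stand-in here for my custom env from env_hybrid.py.
env = gym.make("Pendulum-v1")

# The hyperparameter values below are placeholders, not the actual
# values reported by the optimization run.
model = PPO("MlpPolicy", env, seed=SEED, n_steps=1024, learning_rate=3e-4, verbose=0)
model.learn(total_timesteps=57600)

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.3g} +/- {std_reward:.3g}")
```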

FYI, the custom env can be found at: https://github.com/whxru/rl-baselines3-zoo/blob/master/rl_zoo3/aoi_cbu/env_hybrid.py
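One detail that may matter: env_hybrid.py currently takes its randomness from the global NumPy RNG (hence the np.random.seed above). Below is a toy stand-in (not the actual env) showing the alternative I have been considering, an env-local RNG seeded through the pre-Gymnasium gym seed() hook, in case the divergence comes from the two runs consuming the global RNG in different orders:

```python
import gym
import numpy as np
from gym import spaces


class HybridEnvSketch(gym.Env):
    """Toy stand-in (NOT env_hybrid.py): non-deterministic dynamics
    driven by an env-local RNG instead of the global NumPy RNG."""

    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.np_random = np.random.RandomState()
        self.state = np.zeros(4, dtype=np.float32)

    def seed(self, seed=None):
        # Pre-Gymnasium gym API: an env-local RNG keeps rollouts
        # reproducible even if other code consumes the global NumPy RNG
        # in a different order between runs.
        self.np_random = np.random.RandomState(seed)
        return [seed]

    def reset(self):
        self.state = self.np_random.uniform(-0.05, 0.05, size=4).astype(np.float32)
        return self.state

    def step(self, action):
        noise = self.np_random.normal(0.0, 0.01, size=4).astype(np.float32)
        self.state = np.clip(self.state + noise, -1.0, 1.0)
        reward = -float(np.square(self.state).sum())
        return self.state, reward, False, {}
```

If the zoo seeds environments individually during optimization but my env ignores that and reads only the global RNG, could that alone explain why the same --seed produces different rollouts? Please correct me if the seeding path works differently.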
