Open
Description
I'm running the code verbatim but not getting the results I would expect. For example, running ping_pong_a2c shows barely any improvement after more than 8,000 runs, whereas I would expect a reasonable level of performance (a score above 0, at least) by around 5,000 iterations, based on results other people have reported for RL on Atari/Pong.
Is there something I'm missing? Do the hyperparameters need to be tuned rather than used as is?
Thank you for creating the code base.