# Early Stopping, Learning rate and noise decay #398
## Quick Review of Code Bases and Literature

### Early Stopping

#### 1. Stable-Baselines3
Three possible callbacks: `StopTrainingOnRewardThreshold`, `StopTrainingOnMaxEpisodes`, and `StopTrainingOnNoModelImprovement`.

#### 2. Standard Protocol for Cooperative MARL
Meta-analysis of evaluation methodologies for cooperative MARL, with proposed recommendations for a standardised performance evaluation protocol. Published at NeurIPS 2022. Summary of the protocol here. --> Fixed number of training timesteps and episodes.

#### 3. BenchMARL (Meta Research)
A proposed framework to deal with the fragmented community standards and reproducibility issues highlighted by the analysis above. Also includes some competitive environments. Published in the Journal of Machine Learning Research 2024. --> Not implemented; callbacks may be customized.

#### 4. PettingZoo
--> Not implemented.

#### Summary
For comparability, and in line with community standards, no early stopping (as default). Our current implementation of early stopping is closest to the "no model improvement" callback from SB3 (see the sketch below). However, suitable default values for the number of steps and the threshold are unclear. They should be chosen rather conservatively due to the instability of environment and training, but early stopping can be useful for experimentation or when dealing with time/computational restrictions, so we would like to keep the options. Future development: could be restructured and generalized as callbacks.
### Action Noise (Decay)

#### 1. Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance
Published by Jakob Hollenstein et al. (2022). The type of scheduler, however, did not seem to be relevant: the performance of linear and logistic schedulers was similar. A sketch of both schedule shapes follows below.
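As a rough illustration of the two schedule shapes compared there, a minimal sketch (function names and parameter values are our own, not taken from the paper):

```python
import math

# Illustrative sketch of linear vs. logistic action-noise decay schedules.
# Names and default values are our own, not taken from the paper.


def linear_noise(step: int, total_steps: int,
                 sigma_start: float = 0.2, sigma_end: float = 0.0) -> float:
    """Linearly anneal the noise scale from sigma_start to sigma_end."""
    frac = min(step / total_steps, 1.0)
    return sigma_start + frac * (sigma_end - sigma_start)


def logistic_noise(step: int, total_steps: int,
                   sigma_start: float = 0.2, sigma_end: float = 0.0,
                   steepness: float = 10.0) -> float:
    """Sigmoid-shaped decay centred at the middle of training."""
    frac = min(step / total_steps, 1.0)
    gate = 1.0 / (1.0 + math.exp(steepness * (frac - 0.5)))  # ~1 early, ~0 late
    return sigma_end + (sigma_start - sigma_end) * gate
```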
#### 2.1 Parameter Space Noise for Exploration
Published by Matthias Plappert et al. (2018).

#### 2.2 OpenAI Baselines
Implementation of Parameter Space Noise for DDPG. Image from the OpenAI blog article.

#### 3. NoisyNet: Noisy Networks for Exploration
Published by Fortunato et al. (2019), also followed by a US patent (2019, 2024) from DeepMind. The core idea is sketched below.
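For reference, a simplified sketch of the NoisyNet idea: a linear layer whose weights carry learnable Gaussian noise, so exploration is learned instead of being injected via an external action-noise schedule. This follows the factorised variant from the paper, but the code itself is our own illustrative sketch, not the authors' implementation:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simplified factorised NoisyLinear layer in the spirit of Fortunato et al.
# Illustrative sketch only, not production code.


class NoisyLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma_init / math.sqrt(in_features))

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Factorised noise: one noise vector per input and per output dimension.
        eps_in = self._scaled_noise(self.in_features)
        eps_out = self._scaled_noise(self.out_features)
        weight = self.weight_mu + self.weight_sigma * torch.outer(eps_out, eps_in)
        bias = self.bias_mu + self.bias_sigma * eps_out
        return F.linear(x, weight, bias)
```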
#### 4. PettingZoo
Action noise decay appears in tutorials with MARL libraries:

##### 4.1 AgileRL

##### 4.2.1 Ray (PettingZoo x DQN)

##### 4.2.2 Ray (in total 17 possible choices of exploration with 5 different general schedulers)
#### Summary
NoisyNet or Parameter Space Noise would be interesting, but it is unclear whether the implementation effort is justified. Generally, a schedule for decaying action noise should be implemented to improve performance. If early stopping is enabled, the decay schedule (laid out over a fixed number of timesteps/episodes) may be interrupted; a warning message then needs to be generated that results may improve when the noise decay is performed in full (sketch below).
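A minimal sketch of the proposed warning, assuming the training loop knows the step at which the decay schedule ends (variable and function names are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

# Sketch of the proposed warning when early stopping interrupts a noise-decay
# schedule laid out over a fixed training horizon. Names are illustrative.


def warn_if_decay_incomplete(current_step: int, decay_end_step: int) -> None:
    if current_step < decay_end_step:
        logger.warning(
            "Training stopped early at step %d, before the action-noise decay "
            "schedule finished at step %d. Results may improve without early "
            "stopping, since the noise was not fully decayed.",
            current_step,
            decay_end_step,
        )
```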
### Learning Rate Decay
Sidenote: currently the Adam optimizer is used, and Adam adapts the effective per-parameter step sizes automatically. However, an additional schedule on the global learning rate may still improve performance, as discussed here.

#### 1.1 Stable-Baselines3
#### 1.2 SB3 Zoo (Training Framework)
#### 2. Ray RL Library Scheduler
#### 3. PyTorch LR Scheduler
An example of wiring one of PyTorch's built-in schedulers to Adam is given below.
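A short example using `torch.optim.lr_scheduler.LinearLR`, which exists out of the box; the model and loss are stand-ins:

```python
import torch

# Example of an out-of-the-box PyTorch scheduler combined with Adam.
# LinearLR scales the base learning rate from 100% down to 10% over
# the first 10,000 optimizer steps. The model and loss are stand-ins.

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=10_000
)

for step in range(10_000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 8)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # update the learning rate once per optimizer step
```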
#### 4. Learning to Learn Learning-Rate Schedules
Series of conference papers on learning learning rates. The latest publication proposes a GreedyLR scheduler that ...
This seems promising, but unfortunately no code is provided. It is based on PyTorch's ReduceLROnPlateau scheduler.

#### Summary
Learning rate scheduling can be beneficial - also with the Adam optimizer - and can reduce runtimes. Current developments show great potential, but an implementation would need to be derived from the publication. PyTorch offers many schedulers out of the box, but only specifically for the learning rate. The SB3 implementation provides a general scheduler which can be used for both learning rate and action noise decay; a sketch of that pattern follows below.
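A sketch of the SB3-style general schedule: a callable mapping `progress_remaining` (1.0 at the start of training, 0.0 at the end) to a value. Because it is just a float-to-float function, the same mechanism can drive the learning rate or an action-noise scale; the helper name and values here are illustrative:

```python
from typing import Callable

# SB3-style general schedule: a callable from progress_remaining
# (1.0 at the start of training, 0.0 at the end) to a value.
# Helper name and values are illustrative.


def linear_schedule(initial_value: float, final_value: float = 0.0) -> Callable[[float], float]:
    def schedule(progress_remaining: float) -> float:
        return final_value + progress_remaining * (initial_value - final_value)

    return schedule


lr_schedule = linear_schedule(3e-4)          # e.g. learning rate: 3e-4 -> 0
noise_schedule = linear_schedule(0.2, 0.05)  # e.g. noise sigma: 0.2 -> 0.05

# Halfway through training (progress_remaining = 0.5):
print(lr_schedule(0.5), noise_schedule(0.5))  # 0.00015 0.125
```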
Implement the best practices from the multi-agent RL community and Stable-Baselines3 into our algorithm. Further, analyse the similarities between the PettingZoo multi-agent implementation and the current RL implementation of ASSUME. (https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b)