PPO on continuous actions #77

zaksemenov · 2024-03-27T22:32:35Z

I noticed that in the PPO agent initialization it forces the is_action_continuous=False whereas the PPO algorithm and other libraries implementing PPO allow continuous actions. Can this be added to Pearl as well

https://github.com/facebookresearch/Pearl/blob/main/pearl/policy_learners/sequential_decision_making/ppo.py#L99

The text was updated successfully, but these errors were encountered:

yiwan-rl · 2024-03-28T03:35:36Z

Thanks for your suggestion. We actually plan to add continuous control PPO soon.

kuds · 2024-08-09T16:26:53Z

Following up on this issue. Do you have an ETA for when this feature might be implemented? I would be interested in contributing to this if possible.

rodrigodesalvobraz · 2024-08-13T22:36:02Z

Following up on this issue. Do you have an ETA for when this feature might be implemented? I would be interested in contributing to this if possible.

Continuous action PPO is scheduled for sometime towards the end of the year. If you implement it yourself we would be delighted to accept the contribution!

kuds · 2024-08-14T00:05:55Z

@rodrigodesalvobraz

Awesome! Thanks for the opportunity! I will start working on this and add any development updates/questions to this issue.

rodrigodesalvobraz · 2024-08-15T03:29:05Z

@rodrigodesalvobraz

Awesome! Thanks for the opportunity! I will start working on this and add any development updates/questions to this issue.

BTW, please don't forget to check CONTRIBUTING.md for important information on contributing to the project.

kuds · 2024-08-15T15:26:57Z

@rodrigodesalvobraz

Quick question, did you mean to close this issue, or does Pearl support PPO with continuous action spaces so it can be closed?

rodrigodesalvobraz · 2024-08-15T16:36:36Z

Oops, sorry, I didn't mean to close the issue. Thanks for pointing it out. No, Pearl still does not support PPO with continuous action spaces. Thanks.

kuds · 2024-08-21T19:02:44Z

@rodrigodesalvobraz

Development Update

I have spent the last few days better understanding Pearl and the different modules (replay buffer, policy learner, etc.). I also got PPO for discrete action spaces working in two Gymnasium environments (CartPole-v1 & LunarLander-v2). The implementation of PPO for continuous action spaces is coded, and I am currently troubleshooting some bugs. I plan to be wrapped up with this development in early September.

Next Steps

Finish implementation of PPO for continuous action spaces
Add new unit tests (if needed)
Create a tutorial to demonstrate using PPO in discrete and continuous action spaces

Questions

Should PPO for continuous action spaces be broken into its own file like SAC?
Why do the baseline models use tanh layers between the fully connected layers instead of ReLU? Just preference?

rodrigodesalvobraz · 2024-08-21T20:17:54Z

Good to hear of your progress, @kuds.

Questions

Should PPO for continuous action spaces be broken into its own file like SAC?

Yes, please.

Why do the baseline models use tanh layers between the fully connected layers instead of ReLU? Just preference?

I could only find it being used in VanillaContinuousActorNetwork. Since that is the last layer, it is probably so the output is normalized (ReLU wouldn't guarantee that).

kuds · 2024-09-05T19:37:17Z

@rodrigodesalvobraz

Development Update

I finished working through the bugs for PPO in continuous action spaces. I am cleaning up my changes and adding new unit tests for the ContinuousProximalPolicyOptimization class. I should have the pull request submitted sometime early next week.

Next Steps

Finish implementation of PPO for continuous action spaces
Add new unit tests
Create a tutorial to demonstrate using PPO in discrete and continuous action spaces

Questions

Why does the PPO for discrete action spaces sum the losses for the actor-network instead of taking the mean/average?

pearl/policy_learners/sequential_decision_making/ppo.py on line 131
loss = torch.sum(-torch.min(r_theta * batch.gae, clip * batch.gae))
As part of this implementation, should I normalize the generalized advantage estimation (gae) at the batch level before applying it to the clipped loss?

yiwan-rl · 2024-09-06T15:43:49Z

Hi Kuds, I think sum and mean both work if one uses optimizers that normalize the gradient such as Adam and RMSprop. But mean seems to be better if one uses SGD. Ideally, GAE normalization should be provided as an option and is applied in actor loss computation. Thanks for your work!

rodrigodesalvobraz assigned yiwan-rl Mar 28, 2024

rodrigodesalvobraz added the enhancement New feature or request label Mar 28, 2024

rodrigodesalvobraz closed this as completed Aug 15, 2024

rodrigodesalvobraz reopened this Aug 15, 2024

This was referenced Sep 10, 2024

Adding continuous action space support to PPO #102

Closed

Adding support to PPO for continuous actions spaces #103

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PPO on continuous actions #77

PPO on continuous actions #77

zaksemenov commented Mar 27, 2024

yiwan-rl commented Mar 28, 2024

kuds commented Aug 9, 2024

rodrigodesalvobraz commented Aug 13, 2024

kuds commented Aug 14, 2024

rodrigodesalvobraz commented Aug 15, 2024

kuds commented Aug 15, 2024

rodrigodesalvobraz commented Aug 15, 2024

kuds commented Aug 21, 2024 •

edited

Loading

rodrigodesalvobraz commented Aug 21, 2024 •

edited

Loading

Questions

kuds commented Sep 5, 2024 •

edited

Loading

yiwan-rl commented Sep 6, 2024

PPO on continuous actions #77

PPO on continuous actions #77

Comments

zaksemenov commented Mar 27, 2024

yiwan-rl commented Mar 28, 2024

kuds commented Aug 9, 2024

rodrigodesalvobraz commented Aug 13, 2024

kuds commented Aug 14, 2024

rodrigodesalvobraz commented Aug 15, 2024

kuds commented Aug 15, 2024

rodrigodesalvobraz commented Aug 15, 2024

kuds commented Aug 21, 2024 • edited Loading

Development Update

Next Steps

Questions

rodrigodesalvobraz commented Aug 21, 2024 • edited Loading

Questions

kuds commented Sep 5, 2024 • edited Loading

Development Update

Next Steps

Questions

yiwan-rl commented Sep 6, 2024

kuds commented Aug 21, 2024 •

edited

Loading

rodrigodesalvobraz commented Aug 21, 2024 •

edited

Loading

kuds commented Sep 5, 2024 •

edited

Loading