Conversation

wittlsn commented Aug 22, 2025

Description

PPO currently assumes the optimizer is an optax.chain with two elements.
When a single-transform optimizer (e.g. optax.adam) is used instead, training crashes with an IndexError at sbx/ppo/ppo.py:262.

Fixes #77
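
For reference, a minimal standalone sketch of the failure mode (illustrative only: the chain layout assumes sbx's default setup is gradient clipping plus an inject_hyperparams-wrapped optimizer, and the hard-coded index mimics the assumption behind sbx/ppo/ppo.py:262 rather than quoting the actual line):

```python
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros(3)}

# Two-element chain (gradient clipping + optimizer with injected
# hyperparameters). optax.chain stores one state entry per transform,
# so indexing element 1 is valid here.
default_tx = optax.chain(
    optax.clip_by_global_norm(0.5),
    optax.inject_hyperparams(optax.adam)(learning_rate=3e-4),
)
default_state = default_tx.init(params)
print(default_state[1].hyperparams["learning_rate"])  # works

# Single-transform chain, as a custom policy might build it. The state
# tuple now has length 1, so the same hard-coded access fails.
single_tx = optax.chain(optax.inject_hyperparams(optax.adam)(learning_rate=3e-4))
single_state = single_tx.init(params)
single_state[1]  # IndexError: tuple index out of range
```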

Motivation and Context

This fix allows PPO to be used with a wider range of Optax optimizers.
Currently, the assumption about optax.chain length unnecessarily restricts optimizer choices.
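
One possible shape of a more robust update is sketched below (a hypothetical helper, not the patch in this PR, assuming the learning rate is injected via optax.inject_hyperparams: it searches the optimizer state for the injected hyperparameters instead of hard-coding index 1):

```python
import optax

def set_learning_rate(opt_state, learning_rate: float) -> bool:
    """Hypothetical helper: recursively locate an InjectHyperparamsState
    and update its learning rate; returns True if one was found."""
    # Check this first: InjectHyperparamsState is a NamedTuple, so it is
    # itself a tuple instance and would otherwise hit the branch below.
    if isinstance(opt_state, optax.InjectHyperparamsState):
        opt_state.hyperparams["learning_rate"] = learning_rate
        return True
    if isinstance(opt_state, tuple):
        # A chain's state is a tuple with one entry per transform; recurse
        # so any chain length (including a single element) is handled.
        return any(set_learning_rate(s, learning_rate) for s in opt_state)
    return False
```

With this shape, a two-element chain, a single-element chain, and a bare inject_hyperparams optimizer would all be handled uniformly.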

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

araffin (Owner) commented Aug 27, 2025

Hello,
could you please give a minimal example to reproduce the error?

wittlsn (Author) commented Aug 27, 2025

Hello,

I added a minimal example that uses a custom implementation of the PPOPolicy.
Learning fails if an Optax chain with fewer than two elements is used.

araffin (Owner) commented Aug 27, 2025

Sorry, I meant adding it to the description of this PR, not as a test. This should have been done in an issue before creating the PR, in order to understand and discuss the problem (see contributing guide).

wittlsn (Author) commented Aug 27, 2025

I’ve now created the issue and linked it to this PR.
Sorry for not following the proper workflow earlier.

