Skip to content

Conversation

@Trenza1ore
Copy link
Contributor

@Trenza1ore Trenza1ore commented Jul 27, 2025

Description

  • Make rollout buffers behave like replay buffers in terms of storage dtype
    • Get observation & action space dtypes in BaseBuffer.__init__
    • Store dtypes in a new BufferDTypes dataclass
    • Use dtypes from the dataclass in all buffers
  • Backward compatibility
    • Cast returned actions to torch.float32 for RolloutBuffer and DictRolloutBuffer to avoid breaking calculations built with torch.float32 in mind
    • Ensure replay buffers saved with old versions of SB3 can be loaded correctly

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

closes #2162

After inspecting the code in common/buffers.py, I observe the following:

  • Buffers in the RolloutBuffer bloodline always use dtype=np.float32 for all arrays
  • Buffers in the ReplayBuffer bloodline uses dtype from observation & action spaces (gymnasium.spaces.Space objects) for self.observations / self.next_observations / self.actions

This lack of uniformness introduces a few problems:

  • To newcomer, it's somewhat unexpected that for the same size, rollout buffer classes would take 4x memory compared to replay for a common Gym environment with np.uint8 observations
  • It's confusing when try to read & extend the code (readability & extendability are two big selling points of SB3)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@Trenza1ore
Copy link
Contributor Author

Trenza1ore commented Aug 4, 2025

@araffin I've removed BufferDType and associated changes, just unified storage dtype of Replay & Rollout buffers. Rollout buffer's actions dtype is casted to np.float32 in _get_samples to avoid breaking old code (including some tests) relying on get method yielding th.float32 tensor for actions.

@araffin araffin changed the title Unify dtype logic for buffers Use proper dtype for RolloutBuffer Aug 4, 2025
@araffin araffin changed the title Use proper dtype for RolloutBuffer Use proper dtype for RolloutBuffer storage Aug 4, 2025
Copy link
Member

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks =)

@araffin araffin merged commit dd7f5bf into DLR-RM:master Aug 5, 2025
7 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Unify the dtype decision logic for all buffer classes

2 participants