Use proper `dtype` for `RolloutBuffer` storage #2163

Trenza1ore · 2025-07-27T12:03:40Z

Description

Make rollout buffers behave like replay buffers in terms of storage dtype
- Get observation & action space dtypes in BaseBuffer.__init__
- Store dtypes in a new BufferDTypes dataclass
- Use dtypes from the dataclass in all buffers
Backward compatibility:
- Cast returned actions to torch.float32 for RolloutBuffer and DictRolloutBuffer to avoid breaking calculations built with torch.float32 in mind
- Ensure replay buffers saved with old versions of SB3 can be loaded correctly
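
A minimal sketch of the allocation idea described above: take the storage dtype from the space instead of hard-coding `np.float32`. `SpaceLike` and `allocate_storage` are hypothetical names used only for illustration. `SpaceLike` stands in for a `gymnasium.spaces.Space` (only `shape` and `dtype` are used), and this is not the code from `common/buffers.py`.

```python
from typing import NamedTuple

import numpy as np


class SpaceLike(NamedTuple):
    """Hypothetical stand-in for a gymnasium space: only shape and dtype are used."""

    shape: tuple
    dtype: np.dtype


def allocate_storage(space: SpaceLike, buffer_size: int, n_envs: int) -> np.ndarray:
    # Use the dtype declared by the space rather than a fixed np.float32.
    return np.zeros((buffer_size, n_envs, *space.shape), dtype=space.dtype)


# Assumed example spaces: an image-like observation and a continuous action.
image_obs = SpaceLike(shape=(84, 84, 1), dtype=np.dtype(np.uint8))
continuous_action = SpaceLike(shape=(2,), dtype=np.dtype(np.float32))

observations = allocate_storage(image_obs, buffer_size=16, n_envs=2)
actions = allocate_storage(continuous_action, buffer_size=16, n_envs=2)
print(observations.dtype, actions.dtype)
```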

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

closes #2162

After inspecting the code in common/buffers.py, I observe the following:

- Buffers in the RolloutBuffer family always use dtype=np.float32 for all arrays
- Buffers in the ReplayBuffer family use the dtype from the observation & action spaces (gymnasium.spaces.Space objects) for self.observations / self.next_observations / self.actions

This lack of uniformity introduces a few problems:

- For newcomers, it is unexpected that, for the same buffer size, rollout buffer classes take 4x the memory of replay buffers in a common Gym environment with np.uint8 observations (np.float32 uses 4 bytes per element, np.uint8 uses 1)
- It is confusing when trying to read & extend the code (readability & extensibility are two big selling points of SB3)
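
The 4x figure can be checked with a small calculation. The buffer size, number of environments, and observation shape below are assumed values chosen for illustration, not taken from SB3.

```python
import numpy as np

# Assumed sizes for illustration only.
buffer_size, n_envs = 128, 4
obs_shape = (84, 84)  # image-like observation

# Same logical buffer, stored with two different dtypes.
as_uint8 = np.zeros((buffer_size, n_envs, *obs_shape), dtype=np.uint8)
as_float32 = np.zeros((buffer_size, n_envs, *obs_shape), dtype=np.float32)

# float32 uses 4 bytes per element, uint8 uses 1.
ratio = as_float32.nbytes // as_uint8.nbytes
print(as_uint8.nbytes, as_float32.nbytes, ratio)
```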

Checklist

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

Trenza1ore · 2025-08-04T08:51:59Z

@araffin I've removed BufferDTypes and the associated changes, and only unified the storage dtype of the replay & rollout buffers. The rollout buffer's actions are cast to np.float32 in _get_samples to avoid breaking old code (including some tests) that relies on the get method yielding th.float32 tensors for actions.
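
A simplified sketch of casting at sample time only, using NumPy alone: storage keeps its integer dtype and only the sampled batch is converted to float32. `sample_actions` is a hypothetical helper, and the flattening here is a plain reshape, which may differ from what `_get_samples` actually does.

```python
import numpy as np

# Hypothetical discrete actions stored with an integer dtype,
# shape (buffer_size, n_envs, action_dim).
stored_actions = np.arange(12, dtype=np.int64).reshape(6, 2, 1)


def sample_actions(storage: np.ndarray, batch_inds: np.ndarray) -> np.ndarray:
    # Flatten the (buffer_size, n_envs) axes, then index the requested rows.
    flat = storage.reshape(-1, storage.shape[-1])
    # Cast only the sampled batch; the storage array keeps its original dtype.
    return flat[batch_inds].astype(np.float32)


batch = sample_actions(stored_actions, np.array([0, 3, 5]))
print(stored_actions.dtype, batch.dtype)
```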

araffin

LGTM, thanks =)

Trenza1ore added 6 commits July 27, 2025 11:36

Initial implementation of dtype-decision logic

182578d

Fixed init logic

b499574

Updated changelog

84c8f39

Added a test

cdfc5f8

Reformatted using make format

ab24e4f

Ensure make type passes

2807085

Trenza1ore mentioned this pull request Jul 27, 2025

[Feature Request] Unify the dtype decision logic for all buffer classes #2162

Closed

2 tasks

Trenza1ore added 10 commits July 27, 2025 13:35

Fixed DictRolloutBuffer dtype assignment

66f8300

Updated to create a BufferDTypes dataclass and updated pytests

2819d0d

Fix type check errors on Github, separate dict_obs and obs, honor _no…

2da14c7

…rmalize_obs for rollout buffers

Revert _normalize_obs calls in rollout buffers

872a7e5

Updated docs

de4eb59

Updated docs

4964671

Added save / load support with backward compatibility

e31b0e4

Cast sampled actions of rollout buffers to float32 to avoid breaking …

77d6ee1

…changes

Fixed pickle loading of BufferDTypes

6658f2b

Use default_factory instead of default for BufferDTypes.dict_obs

a1eaf2e

araffin reviewed Aug 1, 2025

stable_baselines3/common/buffers.py (outdated)

araffin reviewed Aug 1, 2025

stable_baselines3/common/buffers.py (outdated)

Trenza1ore added 2 commits August 1, 2025 17:34

Simplified BufferDTypes and reverted changes on replay buffers as req…

b60074d

…uested

Removed BufferDTypes

15ffb45

Fixed oversight in dictrolloutbuffer dtype

69ad231

araffin changed the title ~~Unify dtype logic for buffers~~ Use proper dtype for RolloutBuffer Aug 4, 2025

araffin added 4 commits August 4, 2025 17:16

Update changelog and version

4357c8f

Remove cast to float32

d1e5221

Update tests

9264123

Remove cast to long

216d757

araffin changed the title ~~Use proper dtype for RolloutBuffer~~ Use proper dtype for RolloutBuffer storage Aug 4, 2025

araffin approved these changes Aug 4, 2025

araffin added 8 commits August 4, 2025 18:04

Revert "Remove cast to long"

c2d532c

This reverts commit 216d757.

Revert "Remove cast to float32"

3511452

This reverts commit d1e5221.

Reapply "Remove cast to float32"

065f1d6

This reverts commit 3511452.

Reapply "Remove cast to long"

88e6b68

This reverts commit c2d532c.

Cast int8 to float32 to avoid PyTorch issues (MultiBinary)

0e1b2cf

Revert "Reapply "Remove cast to long""

0ca5017

This reverts commit 88e6b68.

Cast at sample time only

4b0cfc3

Update changelog.rst

c965950

araffin merged commit dd7f5bf into DLR-RM:master Aug 5, 2025
7 of 12 checks passed
