Potential Priority Update Inaccuracy in PrioritizedReplayBuffer due to Overwriting in Rainbow Implementation #511

Hi,

I've been looking at the implementation of PrioritizedReplayBuffer, which uses SumSegmentTree. It appears to follow the standard and efficient approach for prioritized experience replay.

I wanted to raise a discussion point regarding the behavior of priority updates when the replay buffer wraps around and overwrites existing data.



**Scenario:**
1. The buffer is full (`self.size == self.capacity`).
2. The `sample` method is called, returning a batch that includes a transition `T_old` originally stored at index `i` (returned in `samples["indices"]`).
3. Before `update_priorities` is called for index `i`, new transitions are added via the `add` method. Due to the circular buffer logic (`self.pos = (self.pos + 1) % self.capacity`), the data at index `i` (including `T_old`) is overwritten by a new transition `T_new`.
4. Later, `update_priorities` is called with the originally sampled index `i` and a priority calculated from the TD error of `T_old`.
5. Inside `update_priorities`, the line `self.sum_tree.update(idx, priority)` (and similarly for `min_tree`) updates the priority value stored at the fixed leaf index `i` in the segment tree.

**Observation:**
Because index `i` now corresponds to the slot holding `T_new`, the priority update derived from `T_old`'s error is applied to the priority associated with `T_new`. `T_new` may well be a useful transition, but it is not the transition that generated the TD error being used for the update.
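
To make the scenario concrete, here is a minimal toy reproduction. It is only a sketch and not the repository's code: `ToyPER` is a hypothetical stand-in that keeps priorities in a flat list instead of `SumSegmentTree`, and the sampled index is hard-coded rather than drawn from `sample`. Only the circular write and the index-based update mirror the behavior described above.

```python
class ToyPER:
    """Hypothetical toy buffer: a flat list stands in for the segment trees."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.priorities = [0.0] * capacity
        self.pos = 0

    def add(self, transition, priority=1.0):
        # Circular write, same wrap-around rule as quoted in step 3.
        self.data[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def update_priorities(self, indices, priorities):
        # Index-based update: no check of what currently lives in the slot.
        for idx, p in zip(indices, priorities):
            self.priorities[idx] = p


buf = ToyPER(capacity=4)
for name in ["T0", "T1", "T2", "T3"]:
    buf.add(name)                      # buffer is now full, pos wrapped to 0

sampled_idx = 1                        # pretend sample() returned index 1
sampled = buf.data[sampled_idx]        # this plays the role of T_old ("T1")

buf.add("T4")                          # overwrites slot 0
buf.add("T5")                          # overwrites slot 1, i.e. T_old

# The priority computed from T_old's TD error lands on T_new ("T5").
buf.update_priorities([sampled_idx], [0.01])
stale_hit = buf.data[sampled_idx] != sampled
```

After this runs, slot 1 holds `"T5"` with priority `0.01`, even though that value was derived from `"T1"`.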

**Discussion Point:**
This behavior is characteristic of the standard, efficient SumTree PER implementation – it prioritizes O(log N) complexity over guaranteeing that an update signal always matches the exact transition that generated it (if that transition has been overwritten).

While efficient, this behavior means the priority of one transition (`T_new`) is being influenced directly by the error signal from a potentially unrelated, previous transition (`T_old`).

- Code References:
    - `sample` returns indices: [link](https://github.com/vwxyzjn/cleanrl/blob/dcc289fc6f0bda492fa7360a155262cf826b12a5/cleanrl/rainbow_atari.py#L330)
    - `update_priorities` directly uses `idx`: [link](https://github.com/vwxyzjn/cleanrl/blob/dcc289fc6f0bda492fa7360a155262cf826b12a5/cleanrl/rainbow_atari.py#L358)


**Potential Alternatives (with Trade-offs):**
An alternative approach involves using unique IDs for transitions and an ID-to-index map. This allows checking if the transition at the target index still matches the originally sampled ID before applying the update, ensuring accuracy but adding complexity and memory overhead (though retaining O(log N) complexity).  
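
A rough sketch of what such a guard could look like is below. It is a simplified variant, not a proposed patch: instead of a full ID-to-index map it stores one monotonically increasing ID per slot, and all names (`GuardedToyPER`, `slot_ids`, `snapshot`) are hypothetical. A flat list again stands in for the segment trees, so the check itself is O(1) per index and would not change the tree's O(log N) update cost.

```python
import itertools


class GuardedToyPER:
    """Hypothetical toy buffer that skips updates for overwritten slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.priorities = [0.0] * capacity
        self.slot_ids = [-1] * capacity      # ID of the transition in each slot
        self._next_id = itertools.count()    # monotonically increasing IDs
        self.pos = 0

    def add(self, transition, priority=1.0):
        self.data[self.pos] = transition
        self.priorities[self.pos] = priority
        self.slot_ids[self.pos] = next(self._next_id)
        self.pos = (self.pos + 1) % self.capacity

    def snapshot(self, indices):
        # Would be returned alongside the sampled indices.
        return [self.slot_ids[i] for i in indices]

    def update_priorities(self, indices, ids, priorities):
        applied = 0
        for idx, tid, p in zip(indices, ids, priorities):
            if self.slot_ids[idx] != tid:
                continue                     # slot was overwritten: drop the update
            self.priorities[idx] = p
            applied += 1
        return applied


buf = GuardedToyPER(capacity=2)
buf.add("T0")
buf.add("T1")
indices = [0, 1]
ids = buf.snapshot(indices)                  # IDs at sampling time

buf.add("T2")                                # overwrites slot 0 before the update

applied = buf.update_priorities(indices, ids, [0.5, 0.7])
```

Here only the update for slot 1 is applied; slot 0 keeps the initial priority assigned to `"T2"`. Whether dropping stale updates is actually preferable to applying them is part of the question raised in this issue.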

I would be glad to hear your perspective on this. In particular, I wonder whether this priority mismatch can introduce bias into learning, since it directly affects which transitions the replay buffer samples.

Thank you for this great repository and your support of the RL community.



