Skip to content

Fix statebuffer race#669

Open
josephdviviano wants to merge 3 commits intoFarama-Foundation:masterfrom
josephdviviano:fix-statebuffer-race
Open

Fix statebuffer race#669
josephdviviano wants to merge 3 commits intoFarama-Foundation:masterfrom
josephdviviano:fix-statebuffer-race

Conversation

@josephdviviano
Copy link

The test failing in #668 is actually due to a flakey race condition that has failed in previous runs of CI - on master — on Windows, in CI run 20594583659 from December 30.

I think, in unordered mode, write_idx_.fetch_add(1) assigns a circular buffer position, but the actual data write happens after the position is claimed. Another thread can finish its write and increment count_ past the batch boundary, causing collect() to read a position that the first thread hasn't written to yet.

I was able to reproduce on my machine with the following test on the current codebase, but it should pass currently.

@pseudo-rnd-thoughts
Copy link
Member

Thanks for the PR @josephdviviano, I was in the process of completely rewriting the vectorizer to fix this problem in #652 however didn't observe significant throughput increase so that sadly stalled.
We can probably merge the MR PR without this fix but it is now a priority for me

@josephdviviano
Copy link
Author

No problem! Aside from style any changes required?

The sync_env_ids variable was extracted from info but never used,
masking a missing sanity check. Add assertions that sync env IDs
are always [0, 1, ..., N-1] after reset and each step, validating
the baseline the async comparison relies on.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants