Fix synchronization between SwapDeviceBuffers and Transport #401

Merged
agheata merged 2 commits into apt-sim:master from SeverinDiederichs:fix_deviceswap_sync
Jun 25, 2025

Conversation

@SeverinDiederichs (Collaborator) commented Jun 25, 2025

This PR adds a subtle but important synchronization that was missing between the swap of the device buffers and the transport.

In rare cases, this could lead to the following scenario:

The hit slot counter statistics were copied from the GPU to the CPU and reset to 0 on the device. Only then was the swap executed.

However, in the window after the slot counters were reset and before the swap was executed, the transport could already run again and write new steps. Because the counters had already been reset, these steps were written starting again from the initial position, so some later steps overwrote some of the initial steps in the buffer.

This race condition broke reproducibility and could produce wrong results.
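
The ordering described above can be replayed in a small host-only C++ sketch. This is not the code changed in this PR: the actual buffers live on the GPU, and the PR text does not show how the synchronization is implemented. All names here (`StepBuffers`, `WriteStep`, `CopyAndResetCounter`, `SwapOnly`, `ResetAndSwap`) are hypothetical, and a `std::mutex` stands in for whatever mechanism the real fix uses. The sketch only illustrates why resetting the counter and swapping the buffers must not be separated by a transport step.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a double-buffered step store.
// None of these names are taken from the AdePT code base.
struct StepBuffers {
  std::vector<int> active;      // buffer the transport currently writes into
  std::vector<int> retired;     // buffer handed to the host after a swap
  std::size_t slotCounter = 0;  // next free slot in `active`
  std::mutex mtx;               // guards the counter and the buffers

  explicit StepBuffers(std::size_t capacity)
      : active(capacity, 0), retired(capacity, 0) {}

  // Transport side: store one step id in the next free slot.
  void WriteStep(int stepId) {
    std::lock_guard<std::mutex> lock(mtx);
    assert(slotCounter < active.size());
    active[slotCounter++] = stepId;
  }

  // Unsynchronized variant, phase 1: read the counter and reset it to 0.
  std::size_t CopyAndResetCounter() {
    std::lock_guard<std::mutex> lock(mtx);
    return std::exchange(slotCounter, 0);
  }

  // Unsynchronized variant, phase 2: swap in a separate critical section,
  // so a WriteStep call can slip in between phase 1 and phase 2.
  void SwapOnly() {
    std::lock_guard<std::mutex> lock(mtx);
    std::swap(active, retired);
  }

  // Synchronized variant: swap and reset inside one critical section,
  // so no write can land between the two operations.
  std::size_t ResetAndSwap() {
    std::lock_guard<std::mutex> lock(mtx);
    std::swap(active, retired);
    return std::exchange(slotCounter, 0);
  }
};

// Replays the problematic ordering deterministically (no threads needed).
std::vector<int> ReplayRacyOrdering() {
  StepBuffers b(4);
  b.WriteStep(1);
  b.WriteStep(2);
  b.WriteStep(3);
  const std::size_t used = b.CopyAndResetCounter();  // used == 3, counter == 0
  b.WriteStep(4);  // transport runs before the swap: lands in slot 0
  b.SwapOnly();
  return {b.retired.begin(),
          b.retired.begin() + static_cast<std::ptrdiff_t>(used)};  // {4, 2, 3}
}

// Same sequence of steps, but the reset and the swap are one atomic unit.
std::vector<int> ReplaySynchronizedOrdering() {
  StepBuffers b(4);
  b.WriteStep(1);
  b.WriteStep(2);
  b.WriteStep(3);
  const std::size_t used = b.ResetAndSwap();
  b.WriteStep(4);  // goes into slot 0 of the new active buffer
  return {b.retired.begin(),
          b.retired.begin() + static_cast<std::ptrdiff_t>(used)};  // {1, 2, 3}
}
```

In the first replay the host receives `{4, 2, 3}`: step 1 has been overwritten by a later step, which is the kind of corruption described above. In the second replay the host receives `{1, 2, 3}` and step 4 is kept in the new active buffer.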

So far, no further reproducibility issues have been observed, so the CI test is put back in place.

@SeverinDiederichs added the bug (Type: Something isn't working) label on Jun 25, 2025
@phsft-bot

Can one of the admins verify this patch?

@agheata merged commit f751a2b into apt-sim:master on Jun 25, 2025
3 checks passed
@SeverinDiederichs deleted the fix_deviceswap_sync branch on June 25, 2025 13:28
Labels

bug Type: Something isn't working

3 participants