
Fix leaks in asynchronous mode #358


Open · wants to merge 10 commits into master

Conversation

@JuanGonzalezCaminero (Contributor) commented on Mar 12, 2025:

This fixes the extraction of leaked tracks in Async mode. The previous extraction of leaks didn't work in most cases, for two main reasons:

  • The buffer used for extracting leaked tracks has a limited size, so when there are too many leaks the extraction needs to happen in multiple steps. However, there was no mechanism for this: tracks were postponed until the next extraction step, but the corresponding events were already notified that all their leaks had been extracted. As a result, whenever the number of leaks exceeded the buffer size, the overflow was lost.
  • There was no way to guarantee that all leaks belonging to the event requesting a flush had been extracted. Tracks are not extracted in order of insertion, and the transport kernels were allowed to insert new tracks into the leak queues in between device-to-host transfers, so we could never know when all leaks for a given event had actually been extracted.
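The first problem can be illustrated with a minimal host-side sketch (all names here are hypothetical, and a `std::deque` stands in for the device-side leak queue; the real code moves tracks through a GPU staging buffer):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical track record; the real AdePT track carries much more state.
struct Track {
  int eventId;
};

// Drain `leakQueue` through a staging buffer of fixed `bufferCapacity` and
// return how many transfer steps were needed. With a bounded buffer, the
// extraction must loop until the queue reports zero remaining tracks, rather
// than assume a single pass moves everything (the bug described above).
std::size_t drainInChunks(std::deque<Track> &leakQueue,
                          std::vector<Track> &host,
                          std::size_t bufferCapacity) {
  std::size_t steps = 0;
  while (!leakQueue.empty()) {
    std::vector<Track> staging; // stand-in for the device-to-host buffer
    while (staging.size() < bufferCapacity && !leakQueue.empty()) {
      staging.push_back(leakQueue.front());
      leakQueue.pop_front();
    }
    host.insert(host.end(), staging.begin(), staging.end());
    ++steps;
  }
  return steps;
}
```

With 10 leaked tracks and a buffer of 4, three transfer steps are needed; losing the loop is exactly the "overflow was lost" failure mode.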

With these changes:

  • The leak queues are frozen whenever an event requests a flush of its leaks. We keep two leak queues per particle type, swap them at the start of the leak extraction, and don't swap them back until the extraction ends and another event requests a flush. This ensures that once the frozen queue has been fully transferred, all leaks associated with that event have been extracted.
  • A new state machine doesn't return to the Idle state until all leaks have been transferred. The functionality is similar to that of the scoring thread, but instead of running separately from the transport loop, the work is done by the AdePT thread.
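The freeze-by-swap idea can be sketched as follows (a hedged host-side illustration; the type and member names are invented, and `std::vector<int>` stands in for a device-side track queue):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Two-queue scheme: transport kernels always append to `active`, while
// `frozen` is drained by the extraction. Swapping at the start of an
// extraction freezes the current contents, so leaks produced during the
// transfer land in the other queue and can't be missed or double-counted.
struct LeakQueuePair {
  std::vector<int> active; // written by transport
  std::vector<int> frozen; // read by the extraction

  void beginExtraction() { std::swap(active, frozen); }

  // Return the frozen snapshot and clear it, modelling the transfer to host.
  std::vector<int> drainFrozen() {
    std::vector<int> out;
    out.swap(frozen);
    return out;
  }
};
```

Once `drainFrozen()` has returned everything, the flushing event knows its leaks are complete, even if transport appended new tracks in the meantime.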

The new ExtractStates are the following:

  • Idle: No extraction is in progress
  • ExtractionRequested: An event requested a flush, waiting for transport to finish
    • We move from Idle to ExtractionRequested when an event has requested a flush and its EventState is HitsFlushed. The number of particles in flight for that event is guaranteed to be 0 at this point.
    • We wait for the transport kernels to finish running, as they write to the same leak queues that we need to transfer.
  • TracksNeedTransfer: An event requested a flush, leak buffer on device has tracks to transfer
    • Transport has finished, the leak queues are frozen and we can start transferring tracks
  • PreparingTracks: Tracks are being copied to the staging buffer
    • Tracks are copied from the leak queues to the transfer buffer, the numbers of transferred and remaining tracks are copied to the host.
  • TracksReadyToCopy: Staging buffer is ready to be copied to host
  • CopyingTracks: Tracks are being copied to host
  • TracksOnHost: Some or all of the tracks have been transferred from device to host and are waiting in the copy buffer
  • SavingTracks: Tracks are being copied to per-event queues
    • Tracks are copied from the transfer buffer, which can contain tracks for all events, into per-event queues
  • TracksSaved: Tracks have been moved from the copy buffer to their respective per-event queues
    • All tracks are stored in the per-event queues, so the transfer buffer can be re-used. If any leaks remain on the GPU, the state goes back to TracksNeedTransfer; otherwise it moves to Idle.
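The states above can be written down as an enum with a transition function. This is a simplified sketch: the state names come from the list above, but the real implementation also waits on CUDA streams and per-event flags before advancing, whereas here each call unconditionally takes one step:

```cpp
#include <cassert>

// ExtractStates as listed in the description.
enum class ExtractState {
  Idle,
  ExtractionRequested,
  TracksNeedTransfer,
  PreparingTracks,
  TracksReadyToCopy,
  CopyingTracks,
  TracksOnHost,
  SavingTracks,
  TracksSaved
};

// Advance one step. `leaksRemainOnDevice` only matters in TracksSaved, where
// it decides between another transfer round and returning to Idle.
ExtractState next(ExtractState s, bool leaksRemainOnDevice) {
  switch (s) {
  case ExtractState::Idle:                return ExtractState::ExtractionRequested;
  case ExtractState::ExtractionRequested: return ExtractState::TracksNeedTransfer;
  case ExtractState::TracksNeedTransfer:  return ExtractState::PreparingTracks;
  case ExtractState::PreparingTracks:     return ExtractState::TracksReadyToCopy;
  case ExtractState::TracksReadyToCopy:   return ExtractState::CopyingTracks;
  case ExtractState::CopyingTracks:       return ExtractState::TracksOnHost;
  case ExtractState::TracksOnHost:        return ExtractState::SavingTracks;
  case ExtractState::SavingTracks:        return ExtractState::TracksSaved;
  case ExtractState::TracksSaved:
    return leaksRemainOnDevice ? ExtractState::TracksNeedTransfer
                               : ExtractState::Idle;
  }
  return s; // unreachable
}
```

The key property is the loop from TracksSaved back to TracksNeedTransfer, which is what makes multi-step extraction with a bounded buffer correct.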

(Diagram: extract_state_machine)

@JuanGonzalezCaminero added the bug (Type: Something isn't working) label on Mar 12, 2025
@phsft-bot commented:

Can one of the admins verify this patch?

agheata pushed a commit that referenced this pull request Mar 17, 2025
This PR introduces a new feature for the async mode:
`/adept/FinishLastNParticlesOnCPU`.

It is not complete yet: the last N particles are currently just killed rather than pushed to the CPU, as that will require #358 to work and be merged.

Still, killing the last N particles can already be used to assess some performance bottlenecks with the magnetic field.
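A hedged sketch of how the new UI command might appear in a Geant4 macro (the command path is taken from the text above; the argument format shown here is an assumption):

```
# Hypothetical usage: finish the last 100 in-flight particles on the CPU
# (currently they are killed instead, until #358 is merged)
/adept/FinishLastNParticlesOnCPU 100
```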

Also, a timer is added to the async mode printouts.

This allows plotting the number of particles in flight vs. time and not just vs. iteration, which severely skews the picture towards the fast iterations at the tail:
(Screenshot: particles in flight vs. time, 2025-03-16)
CMakeLists.txt Outdated
@@ -42,6 +42,12 @@ set(CMAKE_CUDA_STANDARD_REQUIRED ${CMAKE_CXX_STANDARD_REQUIRED})
set(CMAKE_CUDA_EXTENSIONS OFF)
set(CMAKE_INCLUDE_DIRECTORIES_PROJECT_BEFORE ON)


# DEBUG
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer -g")
A contributor commented:

Can I interest you in our builds with no-omit-framepointer enabled in the compiler (gcc14fp)?

@JuanGonzalezCaminero JuanGonzalezCaminero force-pushed the async_edep branch 3 times, most recently from e9b1bcc to b7cc64e Compare April 8, 2025 16:34
@JuanGonzalezCaminero JuanGonzalezCaminero marked this pull request as ready for review April 15, 2025 10:22
@SeverinDiederichs (Collaborator) left a comment:

Great, thank you for this big effort! See minor comments below

Comment on lines -26 to +27
- file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3.gdml" DESTINATION "${PROJECT_BINARY_DIR}")
- set(TESTING_GDML "${PROJECT_BINARY_DIR}/testEm3.gdml")
+ # file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3.gdml" DESTINATION "${PROJECT_BINARY_DIR}")
+ # file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3_regions.gdml" DESTINATION "${PROJECT_BINARY_DIR}")
A collaborator commented:

This seems to be a leftover and should be reverted (the commented-out code was fixed, and the now-removed code is actually correct).

@JuanGonzalezCaminero (author) replied:

We are actually not using the copied files, as we give the absolute path in the scripts. I can uncomment these lines and change the path in the scripts too; both options work for me.

The collaborator replied:

Ah, if it is not used, then it can be fully deleted :) I just remember the commenting-out being a quick fix for the problem fixed in #379.

- std::cout << "Run time: " << time << "\n";
+ G4cout << "Run time: " << time << "\n";
A collaborator commented:

I think this was changed because with G4cout it didn't print the time anymore. Can you confirm it actually prints the time although it is not using std::cout?

@JuanGonzalezCaminero (author) replied:

Ah, I didn't remember which one was correct, as both were printing the time. I changed it because I thought it hadn't been updated; will revert.

The collaborator replied:

If both work, then no preference from my side

Labels: bug (Type: Something isn't working) · 4 participants