Added stream_id to Perfetto annotations #274

Merged
32 commits merged on Jul 24, 2025

Conversation

ajanicijamd
Contributor

@ajanicijamd ajanicijamd commented Jun 27, 2025

rocprofiler-systems Pull Request

Related Issue

  • Closes SWDEV-538070

What type of PR is this? (check all that apply)

  • Bug Fix
  • Cherry Pick
  • Continuous Integration
  • Documentation Update
  • Feature
  • Optimization
  • Refactor
  • Other (please specify)

Technical Details

  • Correlate memory_copy and kernel_dispatch events with their HIP stream_id, and add stream_id as an annotation in Perfetto.
  • By default, group memory_copy and kernel_dispatch events in the Perfetto output by their stream_id.
  • Add a configuration setting, ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.
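The grouping behavior described above can be illustrated with a minimal sketch. The `gpu_event` struct, the `group_by_queue_from_env` and `group_events` helpers, and the set of accepted truthy values are all hypothetical and are not taken from the rocprofiler-systems sources; only the setting name ROCPROFSYS_ROCM_GROUP_BY_QUEUE and the default of grouping by stream_id come from this PR.

```cpp
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical event record; the real rocprofiler-systems types differ.
struct gpu_event
{
    std::string   name;       // e.g. "rocm_kernel_dispatch" or "rocm_memory_copy"
    std::uint64_t queue_id;   // HSA queue the work was submitted to
    std::uint64_t stream_id;  // HIP stream correlated with the event
};

// Assumption: the setting is read from the environment and accepts a few
// common truthy spellings. The real configuration parsing may differ.
inline bool group_by_queue_from_env()
{
    const char* value = std::getenv("ROCPROFSYS_ROCM_GROUP_BY_QUEUE");
    if(value == nullptr) return false;  // default: group by HIP stream
    const std::string text{ value };
    return text == "1" || text == "ON" || text == "on" || text == "TRUE" ||
           text == "true";
}

// Buckets events by HSA queue id when group_by_queue is set,
// otherwise by HIP stream id.
inline std::map<std::uint64_t, std::vector<gpu_event>>
group_events(const std::vector<gpu_event>& events, bool group_by_queue)
{
    std::map<std::uint64_t, std::vector<gpu_event>> groups;
    for(const auto& event : events)
    {
        const std::uint64_t key = group_by_queue ? event.queue_id : event.stream_id;
        groups[key].push_back(event);
    }
    return groups;
}
```

A caller would pass `group_by_queue_from_env()` as the second argument of `group_events`. With the default setting, a memory copy and a kernel dispatch issued on the same HIP stream land in the same bucket even if they ran on different HSA queues.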

Have you added or updated tests to validate functionality?

  • Yes
  • No - does not apply to this PR

Added / Updated documentation?

  • Yes
  • No - does not apply to this PR

Have you updated CHANGELOG?

  • Yes
  • No - does not apply to this PR

@ajanicijamd ajanicijamd marked this pull request as ready for review June 27, 2025 21:01
@ajanicijamd ajanicijamd requested review from jrmadsen and a team as code owners June 27, 2025 21:01
@dgaliffiAMD
Collaborator

dgaliffiAMD commented Jul 9, 2025

I see the stream_id for the rocm_kernel_dispatch events. That looks good. I used our transpose example to validate.

  • stream_id seems to be missing from the rocm_memory_copy events. I'm not sure why it's missing there yet.
  • In tool_tracing_buffered, we'll still have to update the track names for the two categories to correspond to the HIP Streams, so they are grouped together. For example, "HIP Activity [0] Stream 1" to group together rocm_memory_copy and rocm_kernel_dispatch events for stream_id 1. I have some sample code on my local branch to show you what I mean.
  • We still need a new configuration option to allow the user to toggle between the two grouping styles.
  • Lastly, we'll need to add some compile-time and runtime guards to preserve backwards compatibility. The HIP_STREAM callback was only introduced in ROCPROFILER_VERSIONS >= 700.
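The track naming and backwards-compatibility points in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `SKETCH_ROCPROFILER_VERSION` is a stand-in for whatever version macro the real headers provide, `select_grouping` and `track_name` are hypothetical helpers, the `[0]` in the track name is assumed to be a device index, and the "Queue" label for the fallback naming is invented for symmetry. Only the "HIP Activity [0] Stream 1" format and the 700 threshold come from the comment.

```cpp
#include <cstdint>
#include <string>

// Assumption: stand-in for the version macro provided by the profiler
// headers. Defined here only so the sketch compiles on its own.
#ifndef SKETCH_ROCPROFILER_VERSION
#    define SKETCH_ROCPROFILER_VERSION 700
#endif

// Compile-time guard: the HIP stream callback exists only from version 700 on.
constexpr bool hip_stream_supported_at_build = SKETCH_ROCPROFILER_VERSION >= 700;

enum class grouping
{
    hip_stream,
    hsa_queue
};

// Runtime guard: fall back to HSA queue grouping whenever the user asks for
// it or HIP stream information is unavailable at build time or at runtime.
inline grouping select_grouping(bool user_wants_queue, bool runtime_has_hip_stream)
{
    if(user_wants_queue) return grouping::hsa_queue;
    if(!hip_stream_supported_at_build || !runtime_has_hip_stream)
        return grouping::hsa_queue;
    return grouping::hip_stream;
}

// Builds one shared track name so memory copies and kernel dispatches with
// the same id end up on the same Perfetto track.
inline std::string track_name(grouping mode, std::uint32_t device_index, std::uint64_t id)
{
    const char* label = (mode == grouping::hip_stream) ? "Stream" : "Queue";
    return "HIP Activity [" + std::to_string(device_index) + "] " + label + " " +
           std::to_string(id);
}
```

Because both event categories would call the same `track_name` helper, a rocm_memory_copy and a rocm_kernel_dispatch with stream_id 1 resolve to the identical track string, which is what places them together in the trace viewer.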

@dgaliffiAMD
Collaborator

dgaliffiAMD commented Jul 16, 2025

I just merged with amd-staging to resolve a conflict there.
You can also take a look at https://github.com/dgaliffiAMD/rocprofiler-systems/tree/group-by-stream. I added some code to support a ROCPROFSYS_ROCM_GROUP_BY_QUEUE setting, so we can toggle between grouping by HSA queue or HIP stream. We still need to add some documentation and testing around this, though.

@ajanicijamd ajanicijamd requested a review from a team as a code owner July 22, 2025 15:57
ajanicijamd and others added 20 commits July 22, 2025 19:50
Based on the `ROCPROFSYS_ROCM_GROUP_BY_QUEUE` setting, group these
traces accordingly in the Perfetto trace.

Signed-off-by: David Galiffi <[email protected]>
…ption.

If it is not supported, we cannot group by HIP stream and must default to grouping by HSA queue.
The HIP_STREAM callback was new in ROCPROFILER v0.7.0.

Signed-off-by: David Galiffi <[email protected]>
Signed-off-by: David Galiffi <[email protected]>
error: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Werror=c++20-extensions]

Signed-off-by: David Galiffi <[email protected]>
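The compiler error quoted in the commit message above arises when designated initializers are used in a build that targets a standard older than C++20 with `-Werror`. A minimal sketch of one workaround is shown below; the `track_options` struct and `make_track_options` helper are hypothetical and are not the code changed in this PR.

```cpp
#include <cstdint>

// Hypothetical aggregate, used only to illustrate the error.
struct track_options
{
    std::uint64_t stream_id;
    std::uint64_t queue_id;
    bool          group_by_queue;
};

// Rejected before C++20 when extensions are treated as errors:
//   track_options opts{ .stream_id = 1, .queue_id = 4, .group_by_queue = false };

// C++17-compatible alternative: value-initialize, then assign each member
// by name so the intent stays as readable as with designated initializers.
inline track_options
make_track_options(std::uint64_t stream_id, std::uint64_t queue_id, bool group_by_queue)
{
    track_options opts{};
    opts.stream_id      = stream_id;
    opts.queue_id       = queue_id;
    opts.group_by_queue = group_by_queue;
    return opts;
}
```

Positional aggregate initialization, `track_options{ 1, 4, false }`, would also compile under C++17, at the cost of depending on member order.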
@dgaliffiAMD dgaliffiAMD force-pushed the ajanicij/hip-streams branch from 406f2e2 to 54fb2f0 Compare July 22, 2025 23:52
@dgaliffiAMD
Collaborator

Rebasing with amd-staging

@dgaliffiAMD dgaliffiAMD merged commit 4b4a846 into ROCm:amd-staging Jul 24, 2025
45 of 50 checks passed

2 participants