[UR][CUDA] Add opportunistic queue serialize prop, impl for cuda #18443

Open: wants to merge 3 commits into base: sycl

JackAKirk (Contributor) commented May 13, 2025

Makes short kernels launch faster when they don't need to see the same global memory (or when the user guarantees that global memory writes are complete). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programmatic-dependent-launch-and-synchronization

Makes lots of short kernels in CUTLASS great again. cc @FMarno, who identified this performance gap.

@JackAKirk JackAKirk requested review from a team as code owners May 13, 2025 13:01
@JackAKirk JackAKirk requested a review from jchlanda May 13, 2025 13:01
Signed-off-by: JackAKirk <[email protected]>
kbenzie (Contributor) commented May 13, 2025

There is a reasonable chance this will interact or conflict with #18385.

aarongreig (Contributor) commented

Yeah, I think I'm going to need to rethink how devices report support for different properties.

jchlanda (Contributor) left a comment

Do we need to test this feature?

@@ -12320,6 +12323,9 @@ typedef union ur_exp_launch_property_value_t {
/// [in] non-zero value indicates the amount of work group memory to
/// allocate in bytes
size_t workgroup_mem_size;
/// [in] non-zero value indicates a opportunistic native queue serialized
Suggested change
- /// [in] non-zero value indicates a opportunistic native queue serialized
+ /// [in] non-zero value indicates an opportunistic native queue serialized

@@ -56,6 +58,10 @@ members:
name: workgroup_mem_size
desc: "[in] non-zero value indicates the amount of work group memory to allocate in bytes"
tag: $X_EXP_LAUNCH_PROPERTY_ID_WORK_GROUP_MEMORY
- type: int
name: opportunistic_queue_serialize
desc: "[in] non-zero value indicates a opportunistic native queue serialized kernel"
Suggested change
- desc: "[in] non-zero value indicates a opportunistic native queue serialized kernel"
+ desc: "[in] non-zero value indicates an opportunistic native queue serialized kernel"
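
For readers following the two hunks above, here is a minimal sketch of how a caller might set the new union member and how a backend might read it. It does not use the real Unified Runtime headers: every `mock_` type, both enumerator names, and both helper functions are hypothetical stand-ins invented for illustration. Only the member names `workgroup_mem_size` and `opportunistic_queue_serialize`, and the "non-zero means enabled" convention, come from the diff.

```cpp
#include <cstddef>

// Hypothetical stand-in for the launch property ID enum. The enumerator
// names are assumptions; the PR excerpt only shows the work group memory tag.
enum mock_launch_property_id_t {
  MOCK_LAUNCH_PROPERTY_ID_WORK_GROUP_MEMORY,
  MOCK_LAUNCH_PROPERTY_ID_OPPORTUNISTIC_QUEUE_SERIALIZE
};

// Hypothetical stand-in for ur_exp_launch_property_value_t, reduced to the
// two members visible in the diff.
union mock_launch_property_value_t {
  std::size_t workgroup_mem_size;    // non-zero: bytes of work group memory
  int opportunistic_queue_serialize; // non-zero: property is requested
};

// A property is an ID plus the union member that the ID selects.
struct mock_launch_property_t {
  mock_launch_property_id_t id;
  mock_launch_property_value_t value;
};

// Caller side: build a property that requests (or declines) opportunistic
// queue serialization.
inline mock_launch_property_t make_opportunistic_serialize(bool enable) {
  mock_launch_property_t prop{};
  prop.id = MOCK_LAUNCH_PROPERTY_ID_OPPORTUNISTIC_QUEUE_SERIALIZE;
  prop.value.opportunistic_queue_serialize = enable ? 1 : 0;
  return prop;
}

// Backend side: check the ID first, then read only the member it selects.
inline bool wants_opportunistic_serialize(const mock_launch_property_t &prop) {
  return prop.id == MOCK_LAUNCH_PROPERTY_ID_OPPORTUNISTIC_QUEUE_SERIALIZE &&
         prop.value.opportunistic_queue_serialize != 0;
}
```

Checking the ID before touching the union keeps the reader from interpreting a `workgroup_mem_size` value as the new flag. How the CUDA adapter in this PR actually consumes the flag is not shown in the excerpt.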

4 participants