[UR][CUDA] Add opportunistic queue serialize prop, impl for cuda #18443
Signed-off-by: JackAKirk <[email protected]>
Reasonable chance this will interact/conflict with #18385.
Yeah, I think I'm going to need to rethink how devices report support for different properties.
Do we need to test this feature?
@@ -12320,6 +12323,9 @@ typedef union ur_exp_launch_property_value_t {
/// [in] non-zero value indicates the amount of work group memory to
/// allocate in bytes
size_t workgroup_mem_size;
/// [in] non-zero value indicates a opportunistic native queue serialized
- /// [in] non-zero value indicates a opportunistic native queue serialized
+ /// [in] non-zero value indicates an opportunistic native queue serialized
@@ -56,6 +58,10 @@ members:
name: workgroup_mem_size
desc: "[in] non-zero value indicates the amount of work group memory to allocate in bytes"
tag: $X_EXP_LAUNCH_PROPERTY_ID_WORK_GROUP_MEMORY
- type: int
name: opportunistic_queue_serialize
desc: "[in] non-zero value indicates a opportunistic native queue serialized kernel"
- desc: "[in] non-zero value indicates a opportunistic native queue serialized kernel"
+ desc: "[in] non-zero value indicates an opportunistic native queue serialized kernel"
Makes short kernels launch faster when they don't need to see the same global memory as the preceding kernel (or when the user guarantees that its global memory writes are complete). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programmatic-dependent-launch-and-synchronization
Makes lots of short kernels in CUTLASS great again. cc @FMarno who identified this performance gap.
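For reviewers who want the "non-zero value indicates" semantics spelled out, here is a minimal self-contained sketch. It does not use the real Unified Runtime headers: `LaunchPropertyId`, `LaunchPropertyValue`, `LaunchProperty` and both helper functions are hypothetical stand-ins. Only the two union member names (`workgroup_mem_size`, `opportunistic_queue_serialize`) and their types come from the diff above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the UR types touched by this PR. The real
// definitions are generated into the Unified Runtime headers; the id
// names here are assumptions made for illustration only.
enum class LaunchPropertyId { WorkGroupMemory, OpportunisticQueueSerialize };

// Mirrors the two members of ur_exp_launch_property_value_t shown in the diff.
union LaunchPropertyValue {
  std::size_t workgroup_mem_size;    // non-zero: bytes of work group memory
  int opportunistic_queue_serialize; // non-zero: request the new behaviour
};

struct LaunchProperty {
  LaunchPropertyId id;
  LaunchPropertyValue value;
};

// Build a property that requests (or explicitly declines) opportunistic
// native queue serialization for one kernel launch.
inline LaunchProperty makeOpportunisticSerialize(bool enable) {
  LaunchProperty p{};
  p.id = LaunchPropertyId::OpportunisticQueueSerialize;
  p.value.opportunistic_queue_serialize = enable ? 1 : 0;
  return p;
}

// What an adapter might do when scanning the property list: the feature
// is requested only if the property is present AND its value is non-zero.
inline bool wantsOpportunisticSerialize(
    const std::vector<LaunchProperty> &props) {
  for (const auto &p : props) {
    if (p.id == LaunchPropertyId::OpportunisticQueueSerialize &&
        p.value.opportunistic_queue_serialize != 0) {
      return true;
    }
  }
  return false;
}
```

A caller would pass something like `{makeOpportunisticSerialize(true)}` alongside any other launch properties. A zero value, or no such property at all, leaves the launch on the normal fully serialized path.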