cl_ext_alive_only_barrier #1375
This extension adds a new built-in function to perform barrier synchronization across the work-group even if some of the work-items are not "alive" anymore due to having returned from the kernel.
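To make the intended semantics concrete, here is a minimal host-side sketch in plain C of how a CPU-style implementation might run one work-group when some work-items return before the barrier. It is an illustration under assumptions, not the extension's implementation: the built-in name `alive_only_barrier` in the comment, the helper `run_work_group`, and the fixed `WG_SIZE` are all hypothetical. The kernel is split at the barrier into two regions, each executed as a loop over work-items, and an `alive` mask skips the work-items that have already returned.

```c
#include <assert.h>
#include <stdbool.h>

#define WG_SIZE 8 /* hypothetical work-group size for this sketch */

/*
 * Simulated kernel body (OpenCL C; the built-in name is made up here):
 *
 *   if (lid >= n) return;                      // some work-items exit early
 *   tmp[lid] = lid * 10;
 *   alive_only_barrier(CLK_LOCAL_MEM_FENCE);   // hypothetical name
 *   out[lid] = tmp[(lid + 1) % n];
 *
 * Precondition for this sketch: 0 <= n <= WG_SIZE.
 */
static void run_work_group(int n, int out[WG_SIZE])
{
    bool alive[WG_SIZE];    /* false once a work-item has returned */
    int tmp[WG_SIZE] = {0}; /* stands in for local memory */

    assert(n >= 0 && n <= WG_SIZE);

    /* Region 1: everything before the barrier. */
    for (int lid = 0; lid < WG_SIZE; ++lid) {
        alive[lid] = lid < n; /* early return marks the work-item as dead */
        if (!alive[lid])
            continue;
        tmp[lid] = lid * 10;
    }

    /* The end of the loop above acts as the barrier: every alive
     * work-item has finished region 1, and dead ones are not waited on. */

    /* Region 2: everything after the barrier, predicated on the mask. */
    for (int lid = 0; lid < WG_SIZE; ++lid) {
        if (!alive[lid])
            continue;
        out[lid] = tmp[(lid + 1) % n];
    }
}
```

Calling `run_work_group(5, out)` with `out` pre-filled with `-1` leaves `{10, 20, 30, 40, 0, -1, -1, -1}`: the five alive work-items see each other's pre-barrier writes, and the three that returned early never touch `out`. The `alive` check in region 2 is the predication cost discussed in this thread; when the compiler can prove that no work-item returns before the barrier, the mask can be dropped.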
Ping @kpet @bashbaug @Kerilk @karolherbst.
I wonder if this is necessary and if the OpenCL spec could be relaxed instead here. The
@karolherbst: interesting. I didn't know SPIR-V changed the barrier semantics in v1.7, and I don't see the wording spelled out explicitly in the spec. This is a pretty drastic change: it basically makes v1.7 backwards incompatible with v1.6 for targets that do not implement the "active/alive only" semantics. There could be devices we don't know of where it is (significantly) more expensive to implement. Also, vectorizing work-groups of kernels with such barriers on CPU/SIMD targets induces overhead, especially on non-predicated vector ISAs. The cases should be compile-time analyzable, though.
I doubt it's problematic for anything that isn't a CPU, because the threading model there is entirely different and compares more to masked/predicated SIMD instructions. But maybe it's best to discuss this at the WG meeting and ask everybody to check whether they see any problems with it from a hardware perspective. It would be a bit problematic for CPU implementations, so for those it might make sense to keep it explicit.
Couple of thoughts and corrections:
If this is correct, we should file an issue to add the right validation rule for Workgroup scope barriers in the OpenCL SPIR-V environment spec as well.
I assume it's a problem on older ones?
@bashbaug thanks for the clarifications. I suggest we start with a new built-in and consider converting it to a main spec requirement in the future, once there are no more relevant devices where requiring the semantics is a problem. If these became the default barrier semantics, CPU/SIMD vectorization would need a bit of control flow analysis to detect the cases where predication is not needed. I don't think that's much to worry about in most cases. I see the built-in being used only when generating code from input languages that might have these semantics (HIP/CUDA). Even in those cases it makes sense to CF-analyze the kernel first to find out whether it really needs the semantics.