We've run into some really weird behaviour on Nvidia GPUs: it looked as if subgroup shuffles were exchanging values while one or more invocations were at a different place in the SPIR-V code.
Btw, a slight erratum on the blog post: broadcasts and relative shuffles with constant deltas can be implemented as register snooping, while shuffles with non-constant indices or non-constant relative deltas can't. For those, the data exchange needs to happen through some bit of compiler-allocated shared/on-chip memory.
It's possible that the compiler messed up the sync on that bit of memory, which we don't even know exists or expect, since the shuffles are pure SSA and don't take any pointers.
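To make the scratch-memory hypothesis concrete, here is a rough CPU-side model in C++. It is only a sketch and says nothing about what the Nvidia compiler or hardware actually does. The names `kSubgroupSize`, `Lanes`, `shuffle_via_scratch` and `shuffle_missing_sync` are made up for illustration, and the lanes are simulated sequentially. The first function publishes every lane's value before any lane reads (a correct sync). The second lets each lane write and then read immediately, so a lane can pick up whatever a slower lane left in the scratch slot from earlier.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical subgroup size for the model; real Nvidia subgroups have 32 lanes.
constexpr std::size_t kSubgroupSize = 8;
using Lanes = std::array<std::uint32_t, kSubgroupSize>;

// Model of a dynamic-index shuffle done through scratch memory with a correct
// sync: every lane writes its value first, and only then does any lane read.
Lanes shuffle_via_scratch(const Lanes& value, const Lanes& index) {
    Lanes scratch{};
    for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
        scratch[lane] = value[lane];  // phase 1: publish
    }
    // (a real implementation would need a sync point between the two phases)
    Lanes result{};
    for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
        result[lane] = scratch[index[lane] % kSubgroupSize];  // phase 2: read
    }
    return result;
}

// Same exchange with the sync missing: each lane writes and reads back to back.
// `scratch` is passed in so it can carry stale contents from an earlier shuffle.
// Lanes run in ascending order here, so a lane reading from a higher-numbered
// lane sees the stale slot instead of the current value.
Lanes shuffle_missing_sync(Lanes& scratch, const Lanes& value, const Lanes& index) {
    Lanes result{};
    for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
        scratch[lane] = value[lane];
        result[lane] = scratch[index[lane] % kSubgroupSize];
    }
    return result;
}
```

With values `{10, ..., 17}` and each lane reading from lane `(lane + 1) % 8`, the synced version returns the rotated values, while the unsynced version returns stale scratch contents for every lane except the last one. That is the kind of symptom that would look like invocations exchanging values from a different place in the code.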