Do Subgroup Shuffle Intrinsics require a preceding OpControlBarrier with scope at least Subgroup to avoid UB? #2548

@devshgraphicsprogramming

Description

We've run into some really weird behaviour on Nvidia GPUs: it looked as if Subgroup Shuffles were exchanging values while one or more invocations were at a different place in the SPIR-V code.

It's too long to write up in a GitHub issue, but it's described in our blog post:
https://graphics-programming.org/blog/subgroup-shuffle-execution-dependency-on-nvidia

Btw, a slight erratum to the blog post: broadcasts and relative shuffles with constant deltas can be implemented as register snooping, while shuffles with non-constant indices or non-constant relative deltas can't, so the data exchange needs to happen through some bit of compiler-allocated shared/on-chip memory.

It's possible that the compiler messed up the synchronization on that bit of memory, which we don't even know exists and wouldn't expect to exist, since the shuffles are pure SSA and don't take any pointers.
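
To make the suspected failure mode concrete, here is a toy Python model. It is purely illustrative: the scratch buffer, the function names, and the idea of a "late" lane are our assumptions, not anything we know about the actual Nvidia implementation. It only shows how a shuffle routed through hidden memory could return a stale value when one invocation has not yet reached the store.

```python
# Toy model only: NOT how any real GPU or driver implements subgroup shuffles.
# All names here (scratch, lanes_that_wrote, ...) are hypothetical.

def shuffle_via_registers(values, src_lanes):
    """Ideal SSA semantics: lane i receives the current value of lane src_lanes[i]."""
    return [values[src] for src in src_lanes]


def shuffle_via_scratch(values, src_lanes, scratch, lanes_that_wrote):
    """Hypothetical exchange through scratch memory: lanes store, then load.

    Lanes missing from lanes_that_wrote model invocations that have not yet
    reached the store, so lanes reading from them see whatever was there before.
    """
    for lane in lanes_that_wrote:
        scratch[lane] = values[lane]
    return [scratch[src] for src in src_lanes]


values = [10, 11, 12, 13]    # per-lane operand of the current shuffle
src_lanes = [1, 2, 3, 0]     # non-constant indices: here, rotate by one lane
stale = [90, 91, 92, 93]     # leftovers from an earlier shuffle

expected = shuffle_via_registers(values, src_lanes)
synced = shuffle_via_scratch(values, src_lanes, list(stale), [0, 1, 2, 3])
racy = shuffle_via_scratch(values, src_lanes, list(stale), [0, 1, 3])  # lane 2 is late

print(expected)  # [11, 12, 13, 10]
print(synced)    # [11, 12, 13, 10]
print(racy)      # [11, 92, 13, 10]
```

In the `racy` case, lane 1 reads lane 2's slot before lane 2 has written it and gets the stale `92`. That is the kind of result a preceding subgroup-scope barrier would rule out in this model, and it resembles what we appeared to observe.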
