It seems the best option currently is to write it with the non-short-circuiting & operator, like this: if (uvt.x > 0.0) & (uvt.y > 0.0) & (uvt.z > 0.0) & (uvt.x + uvt.y < 1.0) { (Discussion on Discord)
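A minimal sketch of why the two spellings can compile differently, written as plain CPU-side Rust rather than actual rust-gpu shader code. The `Uvt` struct and the function names are hypothetical stand-ins, not taken from the original shader. In Rust, `&&` is defined to skip its right-hand operand when the left is false, so the front end has to express it as control flow; `&` on `bool` always evaluates both sides, which leaves a straight-line expression for the backend.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the shader's `uvt` vector.
#[derive(Clone, Copy)]
struct Uvt {
    x: f32,
    y: f32,
    z: f32,
}

/// Short-circuiting form: each `&&` may skip everything to its right,
/// so the language semantics describe a chain of conditional branches.
fn inside_lazy(uvt: Uvt) -> bool {
    uvt.x > 0.0 && uvt.y > 0.0 && uvt.z > 0.0 && uvt.x + uvt.y < 1.0
}

/// Eager form: `&` on `bool` evaluates all four comparisons unconditionally.
fn inside_eager(uvt: Uvt) -> bool {
    (uvt.x > 0.0) & (uvt.y > 0.0) & (uvt.z > 0.0) & (uvt.x + uvt.y < 1.0)
}

/// Counts how many of the four checks actually run, to make the
/// difference in evaluation order observable.
fn count_checks(uvt: Uvt, eager: bool) -> u32 {
    let calls = Cell::new(0u32);
    let check = |cond: bool| -> bool {
        calls.set(calls.get() + 1);
        cond
    };
    let _inside = if eager {
        check(uvt.x > 0.0) & check(uvt.y > 0.0) & check(uvt.z > 0.0) & check(uvt.x + uvt.y < 1.0)
    } else {
        check(uvt.x > 0.0) && check(uvt.y > 0.0) && check(uvt.z > 0.0) && check(uvt.x + uvt.y < 1.0)
    };
    calls.get()
}
```

Both functions return the same result for every input; only the number of operands evaluated differs. For a point with a negative `x`, the lazy form runs one check and the eager form runs all four. Whether the eager form is faster on a given GPU is not guaranteed, so it is worth measuring.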
I noticed that a rust-gpu shader was running much slower than the equivalent WGSL one.
The WGSL version takes 53 ms, and the rust-gpu version takes 67 ms.
Looking at the SPIR-V, I tracked part of the issue down to this:
I used spirv-cross to view the code rust-gpu was producing as GLSL and noticed it was producing this:
Whereas if I take WGSL through the same path (WGSL -> SPIR-V -> GLSL), it looks like this:
I tried forcing it to generate a bool instead of branching, with u32(uvt.x > 0.0 && uvt.y > 0.0 && uvt.z > 0.0 && uvt.x + uvt.y < 1.0) == 1, and while it kept the conversion and the equality check, it still had the same nested branching structure.
I then tried this, which got me a lot closer to the WGSL perf (now 58 ms):
This actually results in it using mix here:
Is it possible to improve the code generation in rust-gpu to avoid the excessive branching in situations like this?
(I'm aware that this could also be written differently to avoid branching. I'm not concerned about this specific implementation, but about the code generation in general.)
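For readers wondering what "written differently to avoid branching" might look like, here is one hypothetical formulation in plain Rust. It is not the snippet from the issue, and the names `mix` and `select_branch_free` as well as the `miss` and `hit` parameters are assumptions made for illustration. The idea is to turn the test result into a 0.0 or 1.0 weight and blend the two candidate values, following the GLSL definition of `mix` as `a * (1 - t) + b * t`.

```rust
/// Linear blend following the GLSL definition of `mix`: a * (1 - t) + b * t.
fn mix(a: f32, b: f32, t: f32) -> f32 {
    a * (1.0 - t) + b * t
}

/// Returns `hit` when the point passes all four tests and `miss` otherwise,
/// without any conditional control flow in the source.
fn select_branch_free(x: f32, y: f32, z: f32, miss: f32, hit: f32) -> f32 {
    // Eager `&` keeps the condition a single straight-line expression.
    let inside = (x > 0.0) & (y > 0.0) & (z > 0.0) & (x + y < 1.0);
    // `bool` cannot be cast to `f32` directly, so go through an integer.
    let t = inside as u32 as f32;
    mix(miss, hit, t)
}
```

One caveat with this approach: both values are always computed and combined arithmetically, so if the unselected value is NaN or infinite, it can leak into the result, which a true branch or select would not allow.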