reinterpret with structs containing uint64_t or uint8_t incorrectly throws “T contains fields that cannot be packed into AnyValue” error #4817
Comments
I’ve also tried having both of these types extend a common interface but I get the same error. This is with the slang->SPV backend. Perhaps it has something to do with me using the scalar struct layout flag? 🤔 |
Looks like it has something to do with structs containing "uint64_t". If I use "uint", then I don't get this compilation error. |
Looks like I also get this issue with reinterprets involving uint8_t. |
Looking into this a bit, it appears that the logic in the marshalling code supports 16-bit and 32-bit types, but not 8-bit or 64-bit. See slang/source/slang/slang-ir-any-value-marshalling.cpp, lines 699 to 702 in b2ca2d5.
|
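To make the gap concrete, here is a minimal C++ sketch of how 8-bit and 64-bit fields could be stored in 32-bit slots. It is not Slang's marshalling code: the helper names (`packU64`, `unpackU64`, `packU8`, `unpackU8`) are hypothetical, and the layout (packed storage as an array of 32-bit words, low word first for 64-bit values) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not Slang's implementation.
// `words` stands in for packed storage made of 32-bit slots (an assumption).

// Store a 64-bit value in two consecutive 32-bit words, low word first.
inline void packU64(uint32_t* words, size_t wordIndex, uint64_t value)
{
    words[wordIndex]     = static_cast<uint32_t>(value & 0xFFFFFFFFull);
    words[wordIndex + 1] = static_cast<uint32_t>(value >> 32);
}

// Rebuild the 64-bit value from the same two words.
inline uint64_t unpackU64(const uint32_t* words, size_t wordIndex)
{
    return static_cast<uint64_t>(words[wordIndex]) |
           (static_cast<uint64_t>(words[wordIndex + 1]) << 32);
}

// Store an 8-bit value at a byte offset, leaving the other three bytes
// of the containing word untouched.
inline void packU8(uint32_t* words, size_t byteOffset, uint8_t value)
{
    const size_t   wordIndex = byteOffset / 4;
    const uint32_t shift     = static_cast<uint32_t>(byteOffset % 4) * 8u;
    words[wordIndex] = (words[wordIndex] & ~(0xFFu << shift)) |
                       (static_cast<uint32_t>(value) << shift);
}

// Read the 8-bit value back from its byte offset.
inline uint8_t unpackU8(const uint32_t* words, size_t byteOffset)
{
    const size_t   wordIndex = byteOffset / 4;
    const uint32_t shift     = static_cast<uint32_t>(byteOffset % 4) * 8u;
    return static_cast<uint8_t>((words[wordIndex] >> shift) & 0xFFu);
}
```

The 8-bit path mirrors the shift-and-mask style described for 16-bit values; the 64-bit path simply splits the value across two slots.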
If it's any help, here's the current reproducer I'm using:
|
@csyonghe Fwiw, the wide BVH traversal code I’m writing jumps from 19FPS here:
to about 180FPS by moving to this:
The main difference is that BVH8NodeWide was made to contain only 32-bit types, whereas BVH8Node contains a mix of 32-bit and 8-bit types. With BVH8NodeWide, I’m doing shifts and masks to extract the bytes, versus the original BVH8Node format, which exposes the quantized bytes more directly via uint8. I’m a bit shocked by how large of a perf delta that is. Is that expected? |
Yes, LoadAligned is known to have big performance benefits. |
@expipiplus1 perhaps since I’ve got a good reproducer here, I should be assigned this issue? It seems like something I can probably fix. |
@natevm we definitely appreciate your contribution! |
@csyonghe Would it be possible to tap into … I see some implementations here in glsl.meta.slang: slang/source/slang/glsl.meta.slang, lines 1250 to 1271 and lines 1192 to 1206 in 45e0eee.
I've found these to be much simpler and more efficient than the shifts and masks that are otherwise required for general bitfield manipulation. |
The fallback there is effectively what Slang's doing here when packing uint16_t: slang/source/slang/slang-ir-any-value-marshalling.cpp Lines 294 to 324 in 45e0eee
|
I think we just need to extend the cases in slang-ir-any-value-marshalling.cpp and make sure uint8 is handled properly. It should be simple and clean additions, no need to involve other files or passes. |
To handle uint8 types efficiently for high performance code, the proper solution is to use bitfield insertion and extraction, not shifting and masking. |
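For readers unfamiliar with the two operations, the following C++ sketch spells out the semantics being discussed, written with plain shifts and masks so the behavior is explicit. It follows the general shape of GLSL's bitfieldExtract and bitfieldInsert for 32-bit values, but the function names here (`lowMask`, `bitfieldExtractU32`, `bitfieldExtractI32`, `bitfieldInsertU32`) are hypothetical, and the sketch assumes `offset + bits <= 32`.

```cpp
#include <cstdint>

// Illustrative reference semantics only; names are hypothetical.
// Precondition (assumed): offset + bits <= 32.

// Mask with the low `bits` bits set; handles bits == 32 without an
// out-of-range shift.
constexpr uint32_t lowMask(uint32_t bits)
{
    return bits >= 32u ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Unsigned extract: the field is zero-extended.
constexpr uint32_t bitfieldExtractU32(uint32_t value, uint32_t offset, uint32_t bits)
{
    return bits == 0u ? 0u : ((value >> offset) & lowMask(bits));
}

// Signed extract: the field is sign-extended from its top bit.
constexpr int32_t bitfieldExtractI32(int32_t value, uint32_t offset, uint32_t bits)
{
    return bits == 0u
        ? 0
        : static_cast<int32_t>(
              (bitfieldExtractU32(static_cast<uint32_t>(value), offset, bits) ^
               (1u << (bits - 1u))) -
              (1u << (bits - 1u)));
}

// Insert the low `bits` bits of `insert` into `base` at `offset`.
constexpr uint32_t bitfieldInsertU32(uint32_t base, uint32_t insert,
                                     uint32_t offset, uint32_t bits)
{
    return bits == 0u
        ? base
        : ((base & ~(lowMask(bits) << offset)) |
           ((insert & lowMask(bits)) << offset));
}
```

A dedicated intrinsic expresses each of these as a single operation, which is the simplification being argued for here; whether it is also faster depends on the target and the downstream compiler.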
For the moment, I'm thinking that to fix this issue I'll match what's currently done. It will be very slow, but it will be correct. From there, we should really consider refactoring this code to use more modern bitfield intrinsics. But that can probably go into a separate "performance feature" PR. |
If you want to implement that, you will need to change glsl.meta.slang so that bitFieldExtract/insert map to their own Slang IR instruction, and implement emit logic to map that instruction to all targets. This is definitely doable, but it is more work. I am not sure it "will be very slow", given that this is just simple peephole optimization and instruction selection if the hardware has more dedicated support for bitfield extracts. |
I highly doubt there will be any noticeable performance difference because shifting and masking is the default way when people are writing HLSL and glsl code and it should be handled very well by llvm and any downstream compiler infrastructure. |
Profiling this traversal code written in Slang, using SPIR-V shifts and masks vs direct bitfield insertion and extraction, I can demonstrate substantial performance improvements. In CUDA we do this very frequently, but I'm suspecting higher level shader languages aren't as used to "ninja optimizations" which really do matter for large scale workloads. Actually, let me get some numbers for you to really prove that point. |
OK, then we need to:
|
I find it a bit weird that certain function calls do or do not show in Nsight graphics. This might also be a special case where these ninja level optimizations matter more. There might also be something going awry in the compilation that I'm not able to see. But beyond performance, I also think the more official bitfield insertion / extraction would clean the code up a lot which would reduce tech debt, and also be useful more generally. So, in short, I think it's worth considering looking into.
|
Ok, I have 0->6 done. Working through step 7. Seems straightforward enough. |
@csyonghe could I get some clarifications for 6 and 7? If I follow the compiler logic right, __intrinsic_op() is an attribute that can only be assigned to a declaration, and not to a function definition like those currently in glsl.meta.slang.
When emitting SPIR-V, slang-emit-spirv.cpp catches the kIROp_BitfieldExtract and then emits the corresponding SPIR-V instruction. I have this working. But we need to ensure this has a proper fallback path for non-SPIR-V targets. Since the current bitfieldExtract intrinsic is only available when users supply a flag enabling GLSL intrinsics, putting the intrinsic op in glsl.meta.slang causes a regression for other HLSL code where bitfield insertion/extraction is emitted during a reinterpret. Therefore, I’ve moved the intrinsic op for bitfields into core.meta.slang. This seems to have the effect that any C-like code has access to the intrinsic.
But from there I’m a bit confused about what the logic for 6 should be. If I have CLikeSourceEmitter emit a one-to-one “bitfieldExtract(op0, op1, op2)”, then SPIR-V correctly picks this up as an intrinsic to convert to SPIR-V. But when targeting HLSL, bitfieldExtract has no definition, and I understandably cannot add one because __intrinsic_op() prevents me from doing so. So I hit this catch-22 situation.
Alternatively, I could bring over the function definitions currently in glsl.meta.slang, then rename the function marked as __intrinsic_op() to __bitfieldExtract(op0, op1, op2). Then theoretically CLikeSourceEmitter could generate a bitfieldExtract wrapper function call, which contains the fallback logic for targets that are missing the intrinsic (mainly HLSL). But then I’m a bit confused about the interaction between the INST line in slang-ir-inst-defs.h, CLikeSourceEmitter, and __intrinsic_op(). Does INST map to CLikeSourceEmitter first for SPIR-V targets? Or should I treat CLikeSourceEmitter as the fallback logic when an intrinsic isn’t handled by a more specialized emitter like slang-emit-spirv?
I’m assuming I shouldn’t be emitting DXIR in the same way that we emit SPIR-V. But then I’m not sure how to add a fallback intrinsic op to HLSL. In some cases, like the custom wave intrinsics that Slang adds on top of HLSL, there appears to be a full pass over the code to handle those. But that seems like a bit of a heavy-handed solution. I could do that too, but I don’t want to if there’s a more trivial solution. |
All Slang IR opcodes need to be implemented by the emitter to define how they map to target code. For HLSL, because there isn't a corresponding intrinsic in HLSL, we have to implement the emit logic by emitting bit shifts and masks. You can directly implement it in CLikeSourceEmitter for the kIROp_BitfieldExtract opcode using shifting and masking, and that will be the fallback case if any derived SourceEmitter doesn't provide an override for that inst's emit logic. For GLSL, you want to override the logic to emit the GLSL intrinsic. |
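The override pattern described above can be sketched in a few lines of C++. This is a toy model, not Slang's emitter code: the types `SketchCLikeEmitter`, `SketchGLSLEmitter`, and `SketchHLSLEmitter` and the method `emitBitfieldExtract` are hypothetical stand-ins, and operands are passed as already-emitted expression strings for simplicity.

```cpp
#include <string>

// Toy model of "base class provides the fallback, derived class overrides".
// All names are hypothetical; Slang's real emitters are structured differently.
struct SketchCLikeEmitter
{
    virtual ~SketchCLikeEmitter() = default;

    // Fallback: spell the extract out with a shift and a mask.
    // Note (assumption): a real emitter would also need to handle bits == 32,
    // where `1u << bits` does not produce the intended mask.
    virtual std::string emitBitfieldExtract(const std::string& value,
                                            const std::string& offset,
                                            const std::string& bits) const
    {
        return "((" + value + " >> " + offset + ") & ((1u << " + bits + ") - 1u))";
    }
};

// A GLSL-style target overrides the fallback and emits the intrinsic call.
struct SketchGLSLEmitter : SketchCLikeEmitter
{
    std::string emitBitfieldExtract(const std::string& value,
                                    const std::string& offset,
                                    const std::string& bits) const override
    {
        return "bitfieldExtract(" + value + ", " + offset + ", " + bits + ")";
    }
};

// An HLSL-style target provides no override, so it inherits the fallback.
struct SketchHLSLEmitter : SketchCLikeEmitter
{
};
```

With this shape, adding a target that has a native intrinsic only requires one override, and every other C-like target keeps working through the base implementation.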
I’m finding that with a recent build of Slang, I can no longer use reinterpret. Even for very simple cases, I get this error: cannot be packed into AnyValue
I seem to be able to repro with something like:
struct A {
    uint64_t data;
}
struct B {
    uint64_t data;
}
A a;
B b = reinterpret<B>(a);
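For comparison, the behavior expected from the repro can be written as a bit-preserving copy in C++. This is only an analogy under stated assumptions: `reinterpretBits` is a hypothetical helper, not a Slang or standard-library function, and it assumes both types have the same size and are trivially copyable.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// C++ counterparts of the two structs in the repro.
struct A { uint64_t data; };
struct B { uint64_t data; };

// Hypothetical helper: copy the bytes of `from` into a value of type To.
// Assumes equal sizes and trivially copyable types.
template <typename To, typename From>
To reinterpretBits(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "types must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

Since A and B have identical layouts, the 64-bit field should come through unchanged, which is why an error for this case looks like a gap in the marshalling logic rather than a problem with the structs.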