I could use some advice on how to debug a TMA-related issue. I'm debugging a handful of tests that appear to crash at cutlass/include/cute/arch/copy_sm90_tma.hpp, lines 631 to 643 in b78588d. The problem is that I can't figure out what makes the kernel crash with an "Illegal instruction" error.
Curiously, when the crash happens, the driver also reports a page fault.
However, I'm having trouble connecting the reported fault address 0x7f79_78fff000 to the instruction inputs. As far as I can tell, the data in the registers passed to the instruction is valid. I can't tell whether the tensormap data is sensible, since its layout is not documented. The pointer reported in the driver log also does not seem to match anything passed to the instruction. Any suggestions on what may be the root cause for |
Usually this happens when your TMA descriptor is invalid. Need more details to help you debug:
Please give as many details as you can.
For what it's worth, support for |
I've found the culprit of my TMA troubles. It's the `&& !CUTLASS_CLANG_CUDA` here: cutlass/include/cutlass/device_kernel.h, lines 41 to 45 in 06e560d.
Apparently CUTLASS implicitly assumes that it can pass a pointer to the kernel's params to TMA. Without `__grid_constant__`, params gets copied into local memory, and the pointer to it makes TMA unhappy.
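
To make the difference concrete, here is a minimal sketch of how one might check where a kernel parameter's address points, with and without `__grid_constant__`. It is not CUTLASS code: the `Params` struct and both kernel names are hypothetical, and the struct only stands in for a real params object that would carry a TMA descriptor. It assumes CUDA 11.7 or newer and a GPU of compute capability 7.0 or higher (for example, built with `nvcc -arch=sm_90`). What the predicates report for the plain kernel may depend on the compiler and optimization level, so treat the output as a diagnostic rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a kernel's params struct. In the real code this
// would hold a TMA descriptor (CUtensorMap) among other fields.
struct Params {
  int payload[16];
};

// Plain by-value parameter: taking its address may force the compiler to
// copy the parameter into thread-local memory first.
__global__ void plain_param(Params params, int* out) {
  const Params* p = &params;
  out[0] = __isGridConstant(p) ? 1 : 0;
  out[1] = __isLocal(p) ? 1 : 0;
}

// __grid_constant__ parameter (must be const-qualified): its address refers
// to the kernel parameter itself, so no local copy is needed.
__global__ void grid_constant_param(const __grid_constant__ Params params, int* out) {
  const Params* p = &params;
  out[0] = __isGridConstant(p) ? 1 : 0;
  out[1] = __isLocal(p) ? 1 : 0;
}

// Copy the two flags back to the host and print them.
static bool report(const char* name, const int* d_out) {
  int h_out[2] = {0, 0};
  cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    std::printf("%s: %s\n", name, cudaGetErrorString(err));
    return false;
  }
  std::printf("%s: isGridConstant=%d isLocal=%d\n", name, h_out[0], h_out[1]);
  return true;
}

int main() {
  Params params = {};
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) {
    std::printf("cudaMalloc failed\n");
    return 1;
  }

  plain_param<<<1, 1>>>(params, d_out);
  bool ok = report("plain_param", d_out);

  grid_constant_param<<<1, 1>>>(params, d_out);
  ok = report("grid_constant_param", d_out) && ok;

  cudaFree(d_out);
  return ok ? 0 : 1;
}
```

If the plain kernel reports a local-memory address, that matches the failure described above: a pointer into a thread's local copy of params is not something TMA can use as a descriptor address.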