Which component has the problem?
CuTe DSL
Bug Report
Credit to @sryap for pointing out that it's the register deallocation.
Describe the bug
My complaint concerns error-message handling.
I am testing FA CuTe on Hopper, where I added a hello-world cute.printf inside the load method: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/cute/flash_fwd.py#L1777
It fails with:
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
RuntimeError: CUDA Error: cudaErrorIllegalInstruction
After removing TVM from the call path, the error gives more detail:
File "/home/henrylhtsang/.conda/envs/attn-vis/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/jit_executor.py", line 577, in run_compiled_program
raise DSLCudaRuntimeError(error_code, error_name)
cutlass.base_dsl.common.DSLCudaRuntimeError: DSLCudaRuntimeError: CUDA_ERROR_ILLEGAL_INSTRUCTION (error code: 715)
Error Code: 715
🔍 Additional Context:
- Error name: CUDA_ERROR_ILLEGAL_INSTRUCTION
- Error code: 715
- CUDA_TOOLKIT_PATH: not set
- Target SM ARCH: not set
📊 GPU Information:
- CUDA devices available: 1 (current: <CUdevice 0>)
- Architecture: Hopper (sm_90a)
- Compatible SM archs: sm_90a
- Total Memory: 94.99 GB
Compatibility Check:
❌ Error: Target SM ARCH unknown is not compatible
💡 Please use one of SM ARCHs: sm_90a
The error is similar to the one seen when using cuda-python 13 on an older driver.
Removing the warpgroup_reg_dealloc and warpgroup_reg_alloc calls resolves the issue.
Steps/Code to reproduce bug
Add a cute.printf inside the load method of FA CuTe for Hopper (linked above).
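For triage, the failing pattern reduces to something like the following. This is a sketch only, not a verified standalone reproducer: the @cute.kernel decorator and the cute.arch.warpgroup_reg_dealloc signature are assumed from the CuTe DSL, the register count is illustrative, and it needs an sm_90a GPU to run.

```
# Sketch of the failing pattern, assuming the CuTe DSL arch helpers;
# requires nvidia-cutlass-dsl and an sm_90a (Hopper) GPU.
import cutlass
import cutlass.cute as cute

@cute.kernel
def repro_kernel():
    # The producer warpgroup hands back registers, as FA's load path does.
    cute.arch.warpgroup_reg_dealloc(24)
    # A printf after the dealloc appears to need more registers than
    # remain, and the launch fails with CUDA_ERROR_ILLEGAL_INSTRUCTION.
    cute.printf("hello world")
```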
Expected behavior
A more insightful error message that points at the actual cause (registers released via warpgroup_reg_dealloc), rather than a generic illegal-instruction error plus a misleading "Target SM ARCH unknown is not compatible" compatibility check on a matching sm_90a device.
Environment details (please complete the following information):
NA
Additional context
NA