[BUG] warpgroup_reg_dealloc and cute.printf causes confusing CUDA_ERROR_ILLEGAL_INSTRUCTION (error code: 715) #2932

@henrylhtsang

Which component has the problem?

CuTe DSL

Bug Report

Credit to @sryap for telling me it's the register dealloc.

Describe the bug
My complaint is about error reporting: the failure gives no hint about the actual cause.

I am testing FA cute on Hopper, where I added a hello-world cute.printf inside the load method: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/cute/flash_fwd.py#L1777

It fails with:

  File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
RuntimeError: CUDA Error: cudaErrorIllegalInstruction

After removing tvm, it gives more details:

  File "/home/henrylhtsang/.conda/envs/attn-vis/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/jit_executor.py", line 577, in run_compiled_program
    raise DSLCudaRuntimeError(error_code, error_name)
cutlass.base_dsl.common.DSLCudaRuntimeError: DSLCudaRuntimeError: CUDA_ERROR_ILLEGAL_INSTRUCTION (error code: 715) 
 

Error Code: 715

🔍 Additional Context: 
- Error name: CUDA_ERROR_ILLEGAL_INSTRUCTION
- Error code: 715
- CUDA_TOOLKIT_PATH: not set
- Target SM ARCH: not set

📊 GPU Information:
- CUDA devices available: 1 (current: <CUdevice 0>)
- Architecture: Hopper (sm_90a)
- Compatible SM archs: sm_90a
- Total Memory: 94.99 GB

Compatibility Check:
❌ Error: Target SM ARCH unknown is not compatible
💡 Please use one of SM ARCHs: sm_90a

The error resembles the one you get when using cuda-python 13 with an older driver.

Removing warpgroup_reg_dealloc and warpgroup_reg_alloc solves the issue.

Steps/Code to reproduce bug
Add a cute.printf call inside the load method of FA cute (flash_fwd.py, linked above) and run it on Hopper.

Expected behavior
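A more insightful error message than a bare CUDA_ERROR_ILLEGAL_INSTRUCTION, e.g. one that hints at the combination of cute.printf with warpgroup_reg_dealloc / warpgroup_reg_alloc.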
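
To make the request concrete, here is a rough sketch of the kind of hint I have in mind. Everything in it is hypothetical: `illegal_instruction_hint`, `SUSPECT_CALLS` and the plain-text scan of the kernel source are my own illustration, not an existing CuTe DSL API. It only encodes what I observed above (error 715 when cute.printf is used together with the warpgroup register calls).

```python
# Hypothetical helper, NOT part of the CuTe DSL. It sketches how the error
# report could add a hint when error 715 is raised.
CUDA_ERROR_ILLEGAL_INSTRUCTION = 715

# Calls whose removal made the error go away in this report.
SUSPECT_CALLS = ("warpgroup_reg_dealloc", "warpgroup_reg_alloc")


def illegal_instruction_hint(error_code: int, kernel_source: str) -> str:
    """Return extra guidance for error 715, or an empty string otherwise."""
    if error_code != CUDA_ERROR_ILLEGAL_INSTRUCTION:
        return ""
    # Naive text scan; a real implementation would inspect the traced IR.
    found = [name for name in SUSPECT_CALLS if name in kernel_source]
    if "cute.printf" in kernel_source and found:
        return (
            "Hint: this kernel uses cute.printf together with "
            + ", ".join(found)
            + ". This combination has been seen to fail with "
            "CUDA_ERROR_ILLEGAL_INSTRUCTION; try removing one of them."
        )
    return ""


if __name__ == "__main__":
    source = "cute.arch.warpgroup_reg_dealloc(24)\ncute.printf('hello world')"
    print(illegal_instruction_hint(715, source))
```

Even a one-line hint like this next to the "Additional Context" block would have saved me the bisection.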

Environment details (please complete the following information):
NA

Additional context
NA
