Which component has the problem?
CuTe DSL
Bug Report
Credit to @sryap for pointing out that it's the register deallocation.
Describe the bug
My complaint concerns error-message handling.
I am testing FA CuTe on Hopper, where I added a hello-world cute.printf inside the load method: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/cute/flash_fwd.py#L1777
It fails with:
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
RuntimeError: CUDA Error: cudaErrorIllegalInstruction
After removing TVM from the call path, the error gives more detail:
File "/home/henrylhtsang/.conda/envs/attn-vis/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/jit_executor.py", line 577, in run_compiled_program
raise DSLCudaRuntimeError(error_code, error_name)
cutlass.base_dsl.common.DSLCudaRuntimeError: DSLCudaRuntimeError: CUDA_ERROR_ILLEGAL_INSTRUCTION (error code: 715)
Error Code: 715
🔍 Additional Context:
- Error name: CUDA_ERROR_ILLEGAL_INSTRUCTION
- Error code: 715
- CUDA_TOOLKIT_PATH: not set
- Target SM ARCH: not set
📊 GPU Information:
- CUDA devices available: 1 (current: <CUdevice 0>)
- Architecture: Hopper (sm_90a)
- Compatible SM archs: sm_90a
- Total Memory: 94.99 GB
Compatibility Check:
❌ Error: Target SM ARCH unknown is not compatible
💡 Please use one of SM ARCHs: sm_90a
The error is similar to the one seen when using cuda-python 13 on an older driver.
Removing the warpgroup_reg_dealloc and warpgroup_reg_alloc calls resolves the issue.
Steps/Code to reproduce bug
Add a cute.printf inside the load method of FA CuTe for Hopper (linked above).
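For triage, the failing pattern reduces to something like the following. This is a sketch only, not a verified standalone reproducer: the @cute.kernel decorator and the cute.arch.warpgroup_reg_dealloc signature are assumed from the CuTe DSL, the register count is illustrative, and it needs an sm_90a GPU to run.

```
# Sketch of the failing pattern, assuming the CuTe DSL arch helpers;
# requires nvidia-cutlass-dsl and an sm_90a (Hopper) GPU.
import cutlass
import cutlass.cute as cute

@cute.kernel
def repro_kernel():
    # The producer warpgroup hands back registers, as FA's load path does.
    cute.arch.warpgroup_reg_dealloc(24)
    # A printf after the dealloc appears to need more registers than
    # remain, and the launch fails with CUDA_ERROR_ILLEGAL_INSTRUCTION.
    cute.printf("hello world")
```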
Expected behavior
A more insightful error message that points at the actual cause (registers released via warpgroup_reg_dealloc), rather than a generic illegal-instruction error plus a misleading "Target SM ARCH unknown is not compatible" compatibility check on a matching sm_90a device.
Environment details (please complete the following information):
NA
Additional context
NA