Add spin_lock_atom_cas_acquire_wait function #2846
aleozlx wants to merge 1 commit into NVIDIA:main
For NVIDIA#2845. Added a spin_lock_atom_cas_acquire_wait function to handle spin lock acquisition with atomic compare-and-swap.
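For readers unfamiliar with the pattern, here is a minimal CPU-side sketch of spin lock acquisition via compare-and-swap with acquire ordering. It is an illustration only: it uses C++ `std::atomic` rather than the CuTe DSL, the names `spin_lock_cas_acquire` and `spin_lock_release` and the `max_spins` bound are hypothetical, and it is not the code in this PR.

```cpp
#include <atomic>

// Hypothetical helper (not the function added by this PR).
// Tries to flip the lock word from 0 (free) to 1 (held) with a CAS.
// Returns true once the lock is taken, or false after max_spins attempts.
inline bool spin_lock_cas_acquire(std::atomic<int>& lock, long max_spins) {
    for (long spin = 0; spin < max_spins; ++spin) {
        int expected = 0;  // the CAS overwrites this on failure, so reset it each try
        // Acquire ordering on success: reads and writes after this point
        // cannot be reordered before the lock is observed as taken.
        // compare_exchange_weak may fail spuriously, hence the loop.
        if (lock.compare_exchange_weak(expected, 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Hypothetical counterpart: release ordering publishes the writes made
// while the lock was held to the next thread that acquires it.
inline void spin_lock_release(std::atomic<int>& lock) {
    lock.store(0, std::memory_order_release);
}
```

The spin bound is only there so the sketch cannot hang; a real device-side lock would typically spin until it succeeds.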
|
This is functional (see flashinfer-ai/flashinfer#2171). Raising it as a proposed solution for what we needed when upgrading to nvidia-cutlass-dsl 4.3.1 (#2845). Kind regards from FlashInfer & cuDNN :)
|
The acquire wait is not needed. Message Xiao Song on Slack and we can schedule a meeting to explain this.
|
The two-shot all reduce.py failure is related to something else; let's discuss this in the meeting.
|
You can use the new two-shot GEMM+AR kernel in the CuTeDSL examples. The one in FlashInfer should be an old version. Adding something to the CuTeDSL wheel package will take some time, so I would recommend using the new kernel.
|
Sounds good, will discuss with you over Slack. Will learn about the new kernel example and bring action items back to FI.
|
This PR has been labeled |