Hi,

I am trying to figure out how to switch from the flash-attention interface to FlashInfer.

In flash-attention, the q, k, v tensors passed to flash_attn_func carry a batch dimension, e.g. q: (batch_size, seqlen, nheads, headdim). In FlashInfer, however, both the prefill and decode APIs appear to take tensors without that dimension, e.g. k/v: [kv_len, num_kv_heads, head_dim].
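For reference, here is a minimal sketch of the call I am trying to replace; the sizes are placeholders and the shapes follow the flash-attention documentation:

```python
import torch
from flash_attn import flash_attn_func

# Placeholder sizes for illustration only.
batch_size, seqlen, nheads, headdim = 2, 128, 8, 64

q = torch.randn(batch_size, seqlen, nheads, headdim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# flash-attention consumes the batched 4-D tensors directly.
out = flash_attn_func(q, k, v, causal=True)
# out: (batch_size, seqlen, nheads, headdim)
```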
So how can I replace flash_attn_func with the FlashInfer Python API?
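My best guess from the FlashInfer docs is that the batch dimension is supposed to be packed away: flatten (batch_size, seqlen, ...) into (batch_size * seqlen, ...) and describe the request boundaries with an indptr array for the ragged batch-prefill wrapper. Below is an untested sketch; the method names follow recent releases (plan/run), while older versions call these begin_forward/forward, so it may need adjusting:

```python
import torch
import flashinfer

batch_size, seqlen, nheads, headdim = 2, 128, 8, 64
q = torch.randn(batch_size, seqlen, nheads, headdim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Pack (batch_size, seqlen, H, D) into (batch_size * seqlen, H, D).
# FlashInfer's batch APIs take ragged, packed tensors instead of a
# batch dimension; indptr marks where each request starts and ends.
q_flat = q.reshape(-1, nheads, headdim)
k_flat = k.reshape(-1, nheads, headdim)
v_flat = v.reshape(-1, nheads, headdim)
indptr = torch.arange(0, (batch_size + 1) * seqlen, seqlen,
                      dtype=torch.int32, device="cuda")  # [0, 128, 256]

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchPrefillWithRaggedKVCacheWrapper(workspace, "NHD")
# Same indptr for q and kv here because every request is self-attending
# over its own full sequence (no paged KV cache involved).
wrapper.plan(indptr, indptr, nheads, nheads, headdim, causal=True)
out = wrapper.run(q_flat, k_flat, v_flat)  # (batch_size * seqlen, H, D)
out = out.reshape(batch_size, seqlen, nheads, headdim)
```

For a single request, it also looks like one can drop the batch dimension entirely and call flashinfer.single_prefill_with_kv_cache(q[0], k[0], v[0], causal=True). Is the packed ragged layout above the intended way to reproduce flash_attn_func's batched behavior, or is there a more direct equivalent?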
Jason