In vLLM's qwen2_5_vl.py I found the following:

```python
if attn_type != AttentionType.DECODER:
    raise NotImplementedError("Encoder self-attention and "
                              "encoder/decoder cross-attention "
                              "are not implemented for "
                              "FlashInferImpl")
```

so I thought FlashInfer could not support encoder-only attention. But in the FlashInfer docs I found:

> causal (bool) – Whether to apply causal mask to the attention matrix. This argument is ignored if mask is provided in [plan()](https://docs.flashinfer.ai/api/prefill.html#flashinfer.prefill.BatchPrefillWithRaggedKVCacheWrapper.plan).

In my view, if I set `causal` to `False`, the attention computation should be the same as FlashAttention's, which is the encoder-only attention used for the ViT, just like:
```python
flash_attn_varlen_func(q,
                       k,
                       v,
                       cu_seqlens_q=cu_seqlens,
                       cu_seqlens_k=cu_seqlens,
                       max_seqlen_q=max_seqlen,
                       max_seqlen_k=max_seqlen,
                       dropout_p=0,
                       causal=False)
```

Please, can anyone who knows answer this: can I use FlashInfer for the Qwen2.5-VL ViT attention?