
Does FlashInfer support ViT attention in Qwen2.5_Vl? #992

@IdeaMeshDyx

Description


In vLLM's Qwen2.5_vl.py I found the following:

        if attn_type != AttentionType.DECODER:
            raise NotImplementedError("Encoder self-attention and "
                                      "encoder/decoder cross-attention "
                                      "are not implemented for "
                                      "FlashInferImpl")

So I thought FlashInfer could not support encoder-only attention, but in the FlashInfer docs I found:

causal (bool) – Whether to apply causal mask to the attention matrix. This argument is ignored if mask is provided in [plan()](https://docs.flashinfer.ai/api/prefill.html#flashinfer.prefill.BatchPrefillWithRaggedKVCacheWrapper.plan).

In my view, if I set causal to False, the attention computation should be the same as FlashAttention's non-causal attention, which is the encoder-only attention the ViT needs, just like:

    flash_attn_varlen_func(q,
                           k,
                           v,
                           cu_seqlens_q=cu_seqlens,
                           cu_seqlens_k=cu_seqlens,
                           max_seqlen_q=max_seqlen,
                           max_seqlen_k=max_seqlen,
                           dropout_p=0,
                           causal=False)
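
For reference, this is roughly what I imagine the equivalent FlashInfer call would look like, based on the plan()/run() API of BatchPrefillWithRaggedKVCacheWrapper in the docs linked above. This is only my sketch, not verified code, and the shapes, head counts, and head_dim below are made up for illustration rather than taken from the real Qwen2.5-VL ViT config:

    # Sketch (unverified): non-causal ragged-batch prefill with FlashInfer,
    # mirroring the flash_attn_varlen_func call above.
    import torch
    import flashinfer

    num_qo_heads, num_kv_heads, head_dim = 16, 16, 80   # illustrative values only

    # dummy inputs standing in for the ViT patch tokens; cu_seqlens plays the
    # same role as in flash_attn_varlen_func (prefix sums of sequence lengths)
    total_tokens = 1024
    cu_seqlens = torch.tensor([0, total_tokens], dtype=torch.int32, device="cuda")
    q = torch.randn(total_tokens, num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(total_tokens, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(total_tokens, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

    workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
    wrapper = flashinfer.BatchPrefillWithRaggedKVCacheWrapper(workspace, kv_layout="NHD")

    wrapper.plan(cu_seqlens,          # qo_indptr
                 cu_seqlens,          # kv_indptr
                 num_qo_heads,
                 num_kv_heads,
                 head_dim,
                 causal=False)        # full bidirectional attention for the encoder
    out = wrapper.run(q, k, v)        # [total_tokens, num_qo_heads, head_dim]

If causal=False here really does give the same result as the flash_attn_varlen_func call, then the NotImplementedError in the vLLM code above would seem to be a backend limitation rather than a kernel limitation.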

Could anyone who knows please answer: can I use FlashInfer for the Qwen2.5-VL ViT attention?
