
[Feature]: Support for Diff-Transformer to limit noise in attention calculation at runtime #9480

@nightflight-dk

Description


🚀 The feature, motivation and pitch

Microsoft Research and Tsinghua University researchers have introduced the Differential Transformer (Diff Transformer), a new LLM architecture that improves performance by amplifying attention to relevant context while filtering out noise. Their findings, published in a research paper, show that Diff Transformer outperforms the classic Transformer architecture in various settings. Diff-Transformer can be applied both during training and to pretrained models; applied to pretrained models, it can improve robustness and accuracy in practical applications such as in-context learning and text summarization. Sources are linked below, and the core operation is sketched after them. The feature request here is to examine the potential for applying this at vLLM runtime.

paper: arXiv
press coverage (October 16, 2024): VentureBeat
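
For a sense of the mechanism (a summary of the paper's formulation, not vLLM code): differential attention splits the queries and keys into two groups, computes two softmax attention maps, and subtracts them with a learnable scalar $\lambda$, so that common-mode attention noise cancels:

$$\mathrm{DiffAttn}(X) = \left(\operatorname{softmax}\!\left(\frac{Q_1 K_1^{\top}}{\sqrt{d}}\right) - \lambda\,\operatorname{softmax}\!\left(\frac{Q_2 K_2^{\top}}{\sqrt{d}}\right)\right) V$$

In the paper, $\lambda$ is itself learned via a reparameterization and initialized per layer depth; see the paper for details.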

Alternatives

N/A

Additional context

github: Diff-Transformer

"
multihead_diffattn.py contains a naive implementation of multi-head differential attention.

multihead_flashdiff_1.py contains multi-head differential attention implemented with FlashAttention, for packages that support different qk/v dimensions (e.g., our customized-flash-attention and xformers).

multihead_flashdiff_2.py contains multi-head differential attention implemented with FlashAttention, for packages that do not support different qk/v dimensions (e.g., flash-attention).

Also refer to microsoft/unilm#1633 for another implementation.
"

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


Labels

feature request (New feature or request), stale (Over 90 days of inactivity)
