
feat: Attention Module Replacement transform passes (ADLS Group 9) #275


Closed
wants to merge 69 commits

Conversation

Johnny1882
Contributor

@Johnny1882 Johnny1882 commented Mar 25, 2025

Code submission for ADLS Group 9

Description

This pull request implements a comprehensive framework for attention mechanism optimization, together with several transform passes for transformer-based models. These passes focus on improving inference speed and memory efficiency while preserving model quality.

Contributions

We implement an extensible transform framework for attention module replacement, along with three independent transform passes (a minimal sketch of the replacement pattern follows this list):

  1. Multi-Head Latent Attention (MLA)
  2. Group Query Attention (GQA)
  3. LoRA Linear Layer
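
As a rough illustration of how such a replacement pass can be structured (a hypothetical sketch, not the API added in this PR; `replace_attention_modules` and `build_replacement` are made-up names), the idea is to walk the module tree and swap each matching attention module for a drop-in replacement built from the original:

```python
# Illustrative sketch only -- not the implementation from this PR.
import torch.nn as nn

def replace_attention_modules(model: nn.Module, target_cls: type, build_replacement):
    """Swap every submodule of type `target_cls` for the module returned by
    `build_replacement(original)`, e.g. a GQA, MLA, or LoRA variant."""
    for parent in model.modules():
        # Snapshot children before mutating the parent module.
        for name, child in list(parent.named_children()):
            if isinstance(child, target_cls):
                setattr(parent, name, build_replacement(child))
    return model
```

For GPT-2, such a pass would target the attention class inside each transformer block, with the GQA, MLA, and LoRA passes each supplying their own replacement constructor.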

The transform passes support the following models:

  • GPT-2
  • Llama

Important Notes

  • Dependency Issues: The GPT-2 model includes the GPT2SdpaAttention module (available only in transformers v4.47.1), and the MLA code includes a custom kernel that requires Triton. These dependencies cause expected failures in the current PR checks.
  • Transformers Version: To successfully run transform passes on the GPT2 model, please set your transformers version to 4.47.1:
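
For example, with pip:

```
pip install transformers==4.47.1
```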

Reference

@Johnny1882 Johnny1882 changed the title Attention Module Replacement transform passes ADLS Group 9: Attention Module Replacement transform passes Mar 25, 2025
@Johnny1882 Johnny1882 changed the title ADLS Group 9: Attention Module Replacement transform passes feat: Attention Module Replacement transform passes (ADLS Group 9) Mar 27, 2025
@Aaron-Zhao123
Collaborator

Cleaned in #288, so closing this one.
