
Conversation


@xyang16 (Contributor) commented Nov 29, 2025

Purpose

#28971 was reverted by #29697 because it broke tests. This PR redoes #28971.

Test Plan

pytest -s -v tests/kernels/moe/test_modular_oai_triton_moe.py
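The test file itself is not shown in this conversation, but the usual shape of such a kernel-vs-reference test is to run the same routed MoE computation two ways and compare numerically. The sketch below is illustrative only (all names are assumptions, not taken from the PR); it contrasts a token-major reference loop with the expert-major iteration order that real kernels use:

```python
import math

def silu(v):
    return v / (1.0 + math.exp(-v))

def expert(x_row, w1, w2):
    """One expert MLP on a single token: GEMM1 -> SiLU -> GEMM2."""
    h = [silu(sum(x_row[t] * w1[t][j] for t in range(len(w1))))
         for j in range(len(w1[0]))]
    return [sum(h[t] * w2[t][j] for t in range(len(w2)))
            for j in range(len(w2[0]))]

def moe_token_major(x, experts, topk_ids, topk_w):
    """Reference path: loop tokens, then each token's top-k experts."""
    out = []
    for row, ids, ws in zip(x, topk_ids, topk_w):
        acc = [0.0] * len(experts[0][1][0])
        for e, w in zip(ids, ws):
            y = expert(row, *experts[e])
            acc = [a + w * v for a, v in zip(acc, y)]
        out.append(acc)
    return out

def moe_expert_major(x, experts, topk_ids, topk_w):
    """Kernel-like path: loop experts and gather the tokens routed to
    each one; it must agree with the reference up to float tolerance."""
    out = [[0.0] * len(experts[0][1][0]) for _ in x]
    for e, (w1, w2) in enumerate(experts):
        for i, ids in enumerate(topk_ids):
            for slot, eid in enumerate(ids):
                if eid == e:
                    y = expert(x[i], w1, w2)
                    out[i] = [a + topk_w[i][slot] * v
                              for a, v in zip(out[i], y)]
    return out
```

A real test would generate random activations and weights, run the Triton path and the PyTorch reference, and compare with `torch.testing.assert_close`; the actual suite presumably also exercises mxfp4-quantized weights, which this toy sketch does not.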

Test Result

Unit tests passed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jeejeelee @DarkLight1337 Please take a look. Thanks a lot for reviewing!

Signed-off-by: Xin Yang <[email protected]>


@gemini-code-assist bot left a comment


Code Review

This pull request re-introduces support for FusedMoE LoRA with Triton kernels for mxfp4 quantization, which was previously reverted. The changes are well-structured and mainly involve:

  1. Adding an UnfusedOAITritonExperts class to allow for LoRA injection by separating GEMM, activation, and reduction steps.
  2. Updating the mxfp4 backend selection logic to enable the Triton backend for LoRA when available.
  3. Adding a comprehensive test suite to validate the new unfused Triton kernel against a PyTorch reference implementation.
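Point 1 is the heart of the change: a fully fused kernel gives LoRA nowhere to attach, while an unfused pipeline exposes the output of each GEMM. Here is a minimal pure-Python sketch of that idea; the function names, signatures, and the choice of SiLU are assumptions for illustration, not vLLM's actual `UnfusedOAITritonExperts` API:

```python
import math

def matmul(x, w):
    """Toy dense GEMM: x is [m][k], w is [k][n], result is [m][n]."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def unfused_expert_forward(x, w1, w2, lora_a=None, lora_b=None):
    """GEMM1 -> (optional LoRA delta) -> activation -> GEMM2.

    Because the steps are separate ("unfused"), a low-rank update
    (x @ A) @ B can be added to the GEMM1 output before the activation,
    which a fully fused kernel would not expose.
    """
    h = matmul(x, w1)                                   # GEMM1
    if lora_a is not None and lora_b is not None:       # LoRA injection point
        delta = matmul(matmul(x, lora_a), lora_b)
        h = [[hv + dv for hv, dv in zip(hr, dr)]
             for hr, dr in zip(h, delta)]
    h = [[v / (1.0 + math.exp(-v)) for v in row] for row in h]  # SiLU
    return matmul(h, w2)                                # GEMM2

def moe_forward(x, experts, topk_ids, topk_w):
    """Final step: weighted reduction over each token's top-k experts."""
    out = []
    for row, ids, ws in zip(x, topk_ids, topk_w):
        acc = [0.0] * len(experts[0][1][0])
        for e, w in zip(ids, ws):
            y = unfused_expert_forward([row], *experts[e])[0]
            acc = [a + w * v for a, v in zip(acc, y)]
        out.append(acc)
    return out
```

In the real implementation each step is a Triton kernel rather than a Python loop, but the decomposition into GEMM, activation, and reduction stages is the property that makes LoRA injection possible.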

The changes look solid and align with the goal of modularizing the MoE kernels. I have a couple of suggestions for improving maintainability and robustness.
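The backend-selection change (point 2 in the list above) can be pictured roughly as follows; `Mxfp4Backend`, `select_mxfp4_backend`, and the flag names are hypothetical stand-ins, not vLLM's actual identifiers:

```python
from enum import Enum, auto

class Mxfp4Backend(Enum):
    MARLIN = auto()   # fused fallback path
    TRITON = auto()   # unfused Triton path (LoRA-capable after this PR)
    NONE = auto()     # no compatible quantized backend

def select_mxfp4_backend(triton_available: bool,
                         lora_enabled: bool) -> Mxfp4Backend:
    """Hypothetical selection logic: before this PR, enabling LoRA forced a
    non-Triton fallback; after it, the unfused Triton path is allowed
    because it exposes the injection points LoRA needs."""
    if triton_available:
        # LoRA no longer disqualifies the Triton backend.
        return Mxfp4Backend.TRITON
    if lora_enabled:
        return Mxfp4Backend.NONE
    return Mxfp4Backend.MARLIN
```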

@xyang16 force-pushed the fused_moe_lora_triton branch from 3e5e554 to b55736c on November 29, 2025 at 00:56
@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 29, 2025

Labels

gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: To Triage
