
Conversation


@xyang16 (Contributor) commented Nov 29, 2025

Purpose

#28971 was reverted by #29697 because it broke tests. This PR redoes #28971.

Test Plan

pytest -s -v tests/kernels/moe/test_modular_oai_triton_moe.py
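The test file itself is not shown in this conversation, but the usual shape of such a kernel-vs-reference test is to run the same routed MoE computation two ways and compare numerically. The sketch below is illustrative only (all names are assumptions, not taken from the PR); it contrasts a token-major reference loop with the expert-major iteration order that real kernels use:

```python
import math

def silu(v):
    return v / (1.0 + math.exp(-v))

def expert(x_row, w1, w2):
    """One expert MLP on a single token: GEMM1 -> SiLU -> GEMM2."""
    h = [silu(sum(x_row[t] * w1[t][j] for t in range(len(w1))))
         for j in range(len(w1[0]))]
    return [sum(h[t] * w2[t][j] for t in range(len(w2)))
            for j in range(len(w2[0]))]

def moe_token_major(x, experts, topk_ids, topk_w):
    """Reference path: loop tokens, then each token's top-k experts."""
    out = []
    for row, ids, ws in zip(x, topk_ids, topk_w):
        acc = [0.0] * len(experts[0][1][0])
        for e, w in zip(ids, ws):
            y = expert(row, *experts[e])
            acc = [a + w * v for a, v in zip(acc, y)]
        out.append(acc)
    return out

def moe_expert_major(x, experts, topk_ids, topk_w):
    """Kernel-like path: loop experts and gather the tokens routed to
    each one; it must agree with the reference up to float tolerance."""
    out = [[0.0] * len(experts[0][1][0]) for _ in x]
    for e, (w1, w2) in enumerate(experts):
        for i, ids in enumerate(topk_ids):
            for slot, eid in enumerate(ids):
                if eid == e:
                    y = expert(x[i], w1, w2)
                    out[i] = [a + topk_w[i][slot] * v
                              for a, v in zip(out[i], y)]
    return out
```

A real test would generate random activations and weights, run the Triton path and the PyTorch reference, and compare with `torch.testing.assert_close`; the actual suite presumably also exercises mxfp4-quantized weights, which this toy sketch does not.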

Test Result

Unit tests passed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jeejeelee @DarkLight1337 Please take a look. Thanks a lot for reviewing!

Signed-off-by: Xin Yang <[email protected]>


@gemini-code-assist bot left a comment


Code Review

This pull request re-introduces support for FusedMoE LoRA with Triton kernels for mxfp4 quantization, which was previously reverted. The changes are well-structured and mainly involve:

  1. Adding an UnfusedOAITritonExperts class to allow for LoRA injection by separating GEMM, activation, and reduction steps.
  2. Updating the mxfp4 backend selection logic to enable the Triton backend for LoRA when available.
  3. Adding a comprehensive test suite to validate the new unfused Triton kernel against a PyTorch reference implementation.
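Point 1 is the heart of the change: a fully fused kernel gives LoRA nowhere to attach, while an unfused pipeline exposes the output of each GEMM. Here is a minimal pure-Python sketch of that idea; the function names, signatures, and the choice of SiLU are assumptions for illustration, not vLLM's actual `UnfusedOAITritonExperts` API:

```python
import math

def matmul(x, w):
    """Toy dense GEMM: x is [m][k], w is [k][n], result is [m][n]."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def unfused_expert_forward(x, w1, w2, lora_a=None, lora_b=None):
    """GEMM1 -> (optional LoRA delta) -> activation -> GEMM2.

    Because the steps are separate ("unfused"), a low-rank update
    (x @ A) @ B can be added to the GEMM1 output before the activation,
    which a fully fused kernel would not expose.
    """
    h = matmul(x, w1)                                   # GEMM1
    if lora_a is not None and lora_b is not None:       # LoRA injection point
        delta = matmul(matmul(x, lora_a), lora_b)
        h = [[hv + dv for hv, dv in zip(hr, dr)]
             for hr, dr in zip(h, delta)]
    h = [[v / (1.0 + math.exp(-v)) for v in row] for row in h]  # SiLU
    return matmul(h, w2)                                # GEMM2

def moe_forward(x, experts, topk_ids, topk_w):
    """Final step: weighted reduction over each token's top-k experts."""
    out = []
    for row, ids, ws in zip(x, topk_ids, topk_w):
        acc = [0.0] * len(experts[0][1][0])
        for e, w in zip(ids, ws):
            y = unfused_expert_forward([row], *experts[e])[0]
            acc = [a + w * v for a, v in zip(acc, y)]
        out.append(acc)
    return out
```

In the real implementation each step is a Triton kernel rather than a Python loop, but the decomposition into GEMM, activation, and reduction stages is the property that makes LoRA injection possible.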

The changes look solid and align with the goal of modularizing the MoE kernels. I have a couple of suggestions for improving maintainability and robustness.
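The backend-selection change (point 2 in the list above) can be pictured roughly as follows; `Mxfp4Backend`, `select_mxfp4_backend`, and the flag names are hypothetical stand-ins, not vLLM's actual identifiers:

```python
from enum import Enum, auto

class Mxfp4Backend(Enum):
    MARLIN = auto()   # fused fallback path
    TRITON = auto()   # unfused Triton path (LoRA-capable after this PR)
    NONE = auto()     # no compatible quantized backend

def select_mxfp4_backend(triton_available: bool,
                         lora_enabled: bool) -> Mxfp4Backend:
    """Hypothetical selection logic: before this PR, enabling LoRA forced a
    non-Triton fallback; after it, the unfused Triton path is allowed
    because it exposes the injection points LoRA needs."""
    if triton_available:
        # LoRA no longer disqualifies the Triton backend.
        return Mxfp4Backend.TRITON
    if lora_enabled:
        return Mxfp4Backend.NONE
    return Mxfp4Backend.MARLIN
```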

@xyang16 force-pushed the fused_moe_lora_triton branch from 3e5e554 to b55736c on November 29, 2025 at 00:56
@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 29, 2025

Labels

gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: To Triage
