Conversation

@Imbernoulli

What I Changed

I updated three parts to make gpt-oss training run faster and use less memory:

  1. Flash Attention 3
  2. Gradient Checkpointing
  3. Liger Kernel

After this change, you can train gpt-oss-120b with a 60k-token context length within ~40 GB of memory per GPU.

Environment

You need to do two things:

  1. Install Liger Kernel: You must install this manually from here:
    https://github.com/Comet0322/Liger-Kernel

  2. Download Flash Attention 3: on a machine with internet access, first run the code below to download and cache the Flash Attention 3 kernel.

    import torch
    from kernels import get_kernel

    # Download the vLLM Flash Attention 3 kernel from the Hugging Face Hub
    # and cache it locally for later training runs.
    vllm_flash_attn3 = get_kernel("kernels-community/vllm-flash-attn3")
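
If the training node itself has no internet access, the cached kernel can be reused offline. This is a minimal sketch, not part of the PR: it assumes the kernels library stores downloads in the standard Hugging Face Hub cache and honors HF_HUB_OFFLINE, which is not spelled out here.

    import os

    # Assumption: get_kernel caches into the regular Hugging Face Hub cache
    # (HF_HOME / ~/.cache/huggingface). Copy that cache to the offline node,
    # then disable network lookups. In practice, export HF_HUB_OFFLINE=1 in
    # the shell before launching training rather than setting it in-process.
    os.environ["HF_HUB_OFFLINE"] = "1"

    from kernels import get_kernel

    # With the cache in place, this should resolve locally instead of downloading.
    vllm_flash_attn3 = get_kernel("kernels-community/vllm-flash-attn3")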

@gemini-code-assist
Contributor

Summary of Changes

Hello @Imbernoulli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the training efficiency of gpt-oss models by incorporating several key optimizations. The changes aim to reduce memory footprint and accelerate training speed, making it possible to train larger models with longer context lengths on more constrained hardware resources. The integration of Flash Attention 3, Liger Kernel, and specific gradient checkpointing configurations are central to achieving these performance improvements.

Highlights

  • Flash Attention 3 Integration: Enabled Flash Attention 3 for gpt_oss models by dynamically loading and registering the vllm-flash-attn3 kernel, significantly improving attention mechanism efficiency.
  • Liger Kernel Support: Extended Liger Kernel support to gpt_oss models, allowing for specialized kernel optimizations to enhance training performance and memory usage; a hedged sketch follows this list.
  • Gradient Checkpointing for GPT-OSS: Integrated GptOssMLP as a Z3 leaf module for gpt_oss models, which is a common pattern used in conjunction with gradient checkpointing to reduce memory consumption during training, especially for large models.
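
As a rough illustration of what Liger patching does at the model level (independent of LLaMA-Factory's own enable_liger_kernel plumbing), here is a minimal sketch. It assumes the linked fork exposes gpt_oss support through upstream Liger Kernel's AutoLigerKernelForCausalLM entry point; that assumption is not verified in this PR.

    import torch
    from liger_kernel.transformers import AutoLigerKernelForCausalLM

    # Sketch only: AutoLigerKernelForCausalLM loads a causal LM and patches in
    # Liger's fused kernels (RMSNorm, SwiGLU, fused linear cross-entropy, ...)
    # for architectures it recognizes. Whether the installed fork recognizes
    # gpt_oss is an assumption.
    model = AutoLigerKernelForCausalLM.from_pretrained(
        "openai/gpt-oss-20b",   # smaller checkpoint used here purely for illustration
        torch_dtype=torch.bfloat16,
    )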

@Imbernoulli
Author

[Screenshot: training loss curve from an experiment]
An experiment. The loss curve is reasonable.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces several optimizations for gpt-oss model training, including support for Flash Attention 3, the Liger Kernel, and compatibility with DeepSpeed ZeRO-3. These changes are well-targeted and should improve performance and memory usage as described. I've provided a few suggestions to enhance code maintainability and address minor issues. A key point of concern is the modification of model_args.flash_attn with a value not present in the AttentionFunction enum, which could introduce type-related bugs.

load_and_register_kernel("kernels-community/vllm-flash-attn3")
setattr(config, "_attn_implementation", "kernels-community/vllm-flash-attn3")
setattr(config, "_attn_implementation_internal", "kernels-community/vllm-flash-attn3")
model_args.flash_attn = "kernels-community/vllm-flash-attn3"
Severity: high

The flash_attn attribute of model_args is of type AttentionFunction, which is an enum. You are assigning a string value "kernels-community/vllm-flash-attn3" which is not a member of this enum. While this works here because the function returns early, it can lead to unexpected behavior in other parts of the code that might expect a valid AttentionFunction member. For better type safety and maintainability, please consider adding this new attention type to the AttentionFunction enum in src/llamafactory/extras/constants.py.
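
A minimal sketch of that suggestion is shown below. The existing member names and the exact layout of src/llamafactory/extras/constants.py are assumptions made for illustration, not copied from the repository.

    # Hypothetical sketch: make the new backend a proper AttentionFunction
    # member so model_args.flash_attn always holds a valid enum value.
    from enum import Enum, unique

    @unique
    class AttentionFunction(str, Enum):
        AUTO = "auto"
        DISABLED = "disabled"
        SDPA = "sdpa"
        FA2 = "fa2"
        VLLM_FA3 = "kernels-community/vllm-flash-attn3"  # proposed new member

    # The assignment in the patch could then become:
    # model_args.flash_attn = AttentionFunction.VLLM_FA3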

Comment on lines 123 to +131
if model_type == "qwen3_omni_moe":
from transformers.models.qwen3_omni_moe.modeling_qwen3_omni_moe import Qwen3OmniMoeThinkerTextSparseMoeBlock

_set_z3_leaf_modules(model, [Qwen3OmniMoeThinkerTextSparseMoeBlock])


if model_type == "gpt_oss":
from transformers.models.gpt_oss.modeling_gpt_oss import GptOssMLP

_set_z3_leaf_modules(model, [GptOssMLP])
Severity: medium

There are some unnecessary empty lines and lines with only whitespace. These should be removed for better code formatting and consistency.

Suggested change

if model_type == "qwen3_omni_moe":
    from transformers.models.qwen3_omni_moe.modeling_qwen3_omni_moe import Qwen3OmniMoeThinkerTextSparseMoeBlock
    _set_z3_leaf_modules(model, [Qwen3OmniMoeThinkerTextSparseMoeBlock])

if model_type == "gpt_oss":
    from transformers.models.gpt_oss.modeling_gpt_oss import GptOssMLP
    _set_z3_leaf_modules(model, [GptOssMLP])
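
For context on why registering GptOssMLP matters, here is a hedged sketch of the DeepSpeed call that the _set_z3_leaf_modules helper presumably wraps; the helper's actual implementation is not shown in this PR.

    # Sketch only: marking the gpt_oss MoE MLP block as a ZeRO-3 "leaf" module
    # makes DeepSpeed gather and partition its parameters as one unit instead of
    # hooking each expert's weights separately, which plays better with gradient
    # checkpointing and sparse expert activation.
    from deepspeed.utils import set_z3_leaf_modules
    from transformers.models.gpt_oss.modeling_gpt_oss import GptOssMLP

    def mark_gpt_oss_leaf_modules(model) -> None:
        set_z3_leaf_modules(model, [GptOssMLP])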

Imbernoulli and others added 2 commits on October 27, 2025:
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@hiyouga added the "pending" label (This problem is yet to be addressed) on Nov 4, 2025.