Labels: Feature, enhancement (New feature or request), triage (This issue needs review by the core team)
Description
Feature request
Plan to support both gpt-oss 20B and 120B.
Upgrades we need to make on our side to support all recommended optimizations in the HF blog:
- Job configs should use at least H100 GPUs, since the mxfp4 quant requires Hopper-class hardware
- Need torch 2.7/2.8 (we have 2.6) -- 2.8 is recommended, and was just released today
- Need to upgrade our transformers version to 4.55
- Also need accelerate, kernels, triton 3.4, and triton_kernels (see the version-check sketch after this list)
- Make sure we can support adjustable reasoning levels (these appear to be set in the system prompt via the chat template; see the sketch after this list)
- If not using remote/vLLM/HF inference, we need to support the Harmony response format.
- Support Flash Attention 3 with attention sinks from vLLM. It seems the `attn_implementation` field was generalized to support pulling in arbitrary kernels from the HF Hub; we should add support for this (see the loading sketch after this list).
- Ensure it works on vLLM. I currently get the following error when following their instructions here:
TypeError: flash_attn_varlen_func() got an unexpected keyword argument 's_aux'
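To make the dependency bullets above actionable, here is a minimal sketch of a version guard a recipe or job config could run at startup. The floors for accelerate and kernels are assumptions (only torch, transformers, and triton versions are named above), and triton_kernels is omitted since its packaging may differ from a plain PyPI install.

```python
# Sketch of a startup dependency check for the gpt-oss path.
# Floors marked "assumption" are not from the HF blog and should be confirmed.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

_GPT_OSS_MIN_VERSIONS = {
    "torch": "2.8.0",        # 2.7 may work; 2.8 is the recommended release
    "transformers": "4.55.0",
    "accelerate": "1.0.0",   # assumption: any recent release
    "triton": "3.4.0",
    "kernels": "0.1.0",      # assumption: needed to pull Hub-hosted attention kernels
}


def check_gpt_oss_deps() -> None:
    """Raise early if the environment cannot run the gpt-oss recipes."""
    for pkg, floor in _GPT_OSS_MIN_VERSIONS.items():
        try:
            installed = Version(version(pkg))
        except PackageNotFoundError:
            raise RuntimeError(f"{pkg}>={floor} is required for gpt-oss but is not installed")
        if installed < Version(floor):
            raise RuntimeError(f"{pkg}>={floor} is required for gpt-oss, found {installed}")
```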
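For the adjustable reasoning levels, a hedged sketch of what the system-prompt approach could look like with the transformers tokenizer. The "Reasoning: high" line follows the Harmony convention described in the linked references; whether the 4.55 chat template expects this line or a dedicated kwarg should be confirmed.

```python
# Sketch only: assumes the gpt-oss chat template forwards a "Reasoning: <level>"
# system line into the Harmony system message, per the issue description.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "Reasoning: high"},  # low / medium / high
    {"role": "user", "content": "Explain attention sinks in two sentences."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
```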
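For the Flash Attention 3 + attention sinks item, a minimal loading sketch using the generalized `attn_implementation` field to pull a kernel from the Hub. The kernel repo id ("kernels-community/vllm-flash-attn3") is the one I believe the HF blog references and requires the kernels package plus a Hopper GPU; treat it as an assumption to verify.

```python
# Sketch of loading gpt-oss with a Hub-hosted FA3 kernel, along the lines of the HF blog.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Generalized attn_implementation: a Hub kernel repo id instead of a built-in string.
    attn_implementation="kernels-community/vllm-flash-attn3",
)
```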
Motivation / references
https://huggingface.co/blog/welcome-openai-gpt-oss
#1661
https://openai.com/open-models/
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers
https://github.com/huggingface/gpt-oss-recipes/blob/main/sft.py
Your contribution
PR