Conversation
```python
def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids: Optional[torch.Tensor] = None, unsqueeze_dim: int = 1):
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed


def eager_attention_forward(
    module: nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    scaling: float,
    dropout: float = 0.0,
    **kwargs,
):
    key_states = repeat_kv(key, module.num_key_value_groups)
    value_states = repeat_kv(value, module.num_key_value_groups)

    attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
    if attention_mask is not None:
        causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
        attn_weights = attn_weights + causal_mask

    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value_states)
    attn_output = attn_output.transpose(1, 2).contiguous()

    return attn_output, attn_weights
```
these can also be imported from Llama! 😉
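For reference, a minimal sketch of what reusing these helpers from Llama could look like; the import path is how these names are exposed in recent transformers versions, and inside the repo's `modular_afmoe.py` it would normally be a relative import rather than the absolute one shown here:

```python
# Hedged sketch: these helpers already exist in Llama's modeling file and could
# be imported instead of redefined. Inside the repo's modular file this would be
# a relative import (from ..llama.modeling_llama import ...).
from transformers.models.llama.modeling_llama import (
    apply_rotary_pos_emb,
    eager_attention_forward,
    repeat_kv,
    rotate_half,
)
```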
```python
top_scores, selected_experts = self.router(hidden_states, self.expert_bias)

# Process through shared experts
if self.shared_experts is not None:
```
same comment, is this used by the released model or not?
the first layer is a standard dense FFN, and all subsequent layers use the MoE block
In that case the arch should be different! Use a normal MLP for the mlp and experts for the experts! 🤗
You can set the first layer then just do `+=`; we want to avoid code paths as much as possible.
```python
# MoE or dense FFN
self.moe_enabled = layer_idx >= config.num_dense_layers
if self.moe_enabled:
    self.mlp = AfmoeMoE(config)
else:
    self.mlp = AfmoeMLP(config)
```
is moe disabled on any of the released ckpts? 🤗
```python
    This mirrors the Experts pattern used across other MoE models to ease checkpoint conversion.
    """

    _checkpoint_conversion_mapping = {"experts": "experts"}
```
Suggested change:

```python
_checkpoint_conversion_mapping = {"experts": "experts"}
```
```python
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)

if self.is_local_attention:
```
i did not get an answer
ArthurZucker left a comment:
Thanks, we tend to try and remove code paths as much as possible; if not done here we'll do it post release!
```python
_, selected_experts = torch.topk(scores + expert_bias, k=self.top_k, dim=1)
top_scores = scores.gather(dim=1, index=selected_experts)

if self.route_norm:
```
is this always True or False? (cf removing code path :)
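For context, a plausible reading of the `route_norm` branch, continuing the excerpt above; the renormalization and the epsilon guard are assumptions, not confirmed from the PR's code:

```python
# Hedged sketch of what route_norm likely does: renormalize the top-k scores
# per token so they sum to 1 before any scaling. The epsilon is an assumption.
if self.route_norm:
    denominator = top_scores.sum(dim=-1, keepdim=True) + 1e-20
    top_scores = top_scores / denominator
```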
```python
        return top_scores, selected_experts


class AfmoeExperts(nn.ModuleList):
```
you could just inherit from Mixtral or Qwen2Moe, it should be the same no?
The checkpoint weight structure is different in AFMoE
We have an online weight converter, but no worries :)
```python
if isinstance(module, nn.Linear):
    module.weight.normal_(mean=0.0, std=self.config.initializer_range)
    if module.bias is not None:
        module.bias.zero_()
elif isinstance(module, nn.Embedding):
    module.weight.normal_(mean=0.0, std=self.config.initializer_range)
    if module.padding_idx is not None:
        module.weight[module.padding_idx].zero_()
elif isinstance(module, AfmoeRMSNorm):
    module.weight.fill_(1.0)
```
these should not be used, can you use nn.init instead please! One of the CI jobs will fail as we require this for inits!
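A hedged sketch of the same initialization expressed through `nn.init`, as requested; the exact structure of `_init_weights` in the final PR may differ, and `AfmoeRMSNorm` refers to the PR's own class:

```python
import torch
import torch.nn as nn

def _init_weights(self, module):
    # Same logic as the excerpt above, but routed through nn.init.
    std = self.config.initializer_range
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.padding_idx is not None:
            # Mirrors nn.Embedding.reset_parameters for the padding row.
            with torch.no_grad():
                module.weight[module.padding_idx].fill_(0)
    elif isinstance(module, AfmoeRMSNorm):  # AfmoeRMSNorm comes from the PR's modeling file
        nn.init.ones_(module.weight)
```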
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge): run-slow: afmoe, auto
It seems like this model wasn't added to
Seems like it - how did the CI pass?
* Add AFMoE model support
* Address review feedback for AFMoE implementation
* Add flex attention support to AFMoE model
* Fix expert_bias routing in AFMoE
* Remove test-results directory
* Address PR review feedback for AFMoE model
* fix(afmoe): ensure RMSNorm output dtype matches input dtype
* properly return attn weights
* fix most tests
* cleanup: remove shared expert if/else as it defaults to 2; remove `route_norm` as it defaults to `True`; make tests smaller and faster
* fix input embeds api
* update rope API, smaller test and should be good to go
* oups, wrong place to skip unittest
* quality
* update
* rope parameter docstring fill

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Arthur <[email protected]>
Summary
This PR adds support for the AFMoE (Arcee Foundational Mixture of Experts) model architecture for the upcoming Trinity-Mini and Trinity-Nano releases. AFMoE is a decoder-only transformer model featuring a sparse Mixture of Experts (MoE) approach, combining token-choice routing with shared experts and several architectural innovations for efficient inference and improved performance.
Model Description
AFMoE features the following key architectural components:

- Mixture of Experts with Shared Experts: Combines routed experts (activated per-token via learned routing) with always-active shared experts for stable base computation
- Token-Choice Routing: Uses sigmoid- or softmax-based routing with normalization and scaling for expert selection (see the sketch after this list)
- Q/K Normalization and Gating: Applies RMSNorm to query and key projections and uses sigmoid gating on attention outputs for improved training stability
- Hybrid Attention Patterns: Alternates between sliding window attention and full attention across layers for efficiency with long contexts
- Dual Normalization: Uses pre- and post-normalization around both attention and MLP blocks for training stability
- Configurable Dense Layers: Allows initial layers to use dense MLPs before transitioning to sparse MoE layers (`num_dense_layers`)
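To make the routing description concrete, here is a minimal sketch of token-choice routing combined with an always-active shared expert. All names (`router`, `experts`, `shared_expert`, `route_scale`), the shapes, and the sigmoid/top-k details are illustrative assumptions, not the PR's exact implementation:

```python
import torch

# Minimal sketch of token-choice routing plus an always-active shared expert.
def moe_forward(hidden_states, router, experts, shared_expert, top_k=2, route_scale=1.0):
    batch, seq_len, hidden = hidden_states.shape
    flat = hidden_states.reshape(-1, hidden)                        # (tokens, hidden)

    scores = torch.sigmoid(router(flat))                            # token-choice scores per expert
    top_scores, selected = torch.topk(scores, k=top_k, dim=-1)      # pick top-k experts per token
    top_scores = top_scores / top_scores.sum(dim=-1, keepdim=True)  # normalize, then scale
    top_scores = top_scores * route_scale

    out = torch.zeros_like(flat)
    for expert_idx, expert in enumerate(experts):
        token_idx, slot = torch.where(selected == expert_idx)       # tokens routed to this expert
        if token_idx.numel():
            out[token_idx] += top_scores[token_idx, slot, None] * expert(flat[token_idx])

    out = out + shared_expert(flat)                                 # shared experts always run
    return out.reshape(batch, seq_len, hidden)
```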
Implementation Details
Modular implementation leveraging transformers' modular architecture:

- Efficient `AfmoeRMSNorm` for layer normalization
- `AfmoeRotaryEmbedding` for positional encoding
- `AfmoeAttention` class implementing Q/K normalization and output gating
- `AfmoeTokenChoiceRouter` for expert selection
- `AfmoeMoE` class implementing the shared + routed experts architecture
- `AfmoeDecoderLayer` integrating attention and MoE blocks with dual normalization (sketched below)
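A minimal sketch of the dual-normalization residual pattern; the norm attribute names are assumptions, and only the ordering of pre/post norms around attention and the MLP/MoE block is the point:

```python
# Hedged sketch: pre- and post-norms wrap both attention and the MLP/MoE block.
def decoder_layer_forward(layer, hidden_states, **attn_kwargs):
    residual = hidden_states
    hidden_states = layer.input_layernorm(hidden_states)            # pre-attention norm
    hidden_states, _ = layer.self_attn(hidden_states, **attn_kwargs)
    hidden_states = layer.post_attention_layernorm(hidden_states)   # post-attention norm
    hidden_states = residual + hidden_states

    residual = hidden_states
    hidden_states = layer.pre_mlp_layernorm(hidden_states)          # pre-MLP norm
    hidden_states = layer.mlp(hidden_states)                        # AfmoeMLP or AfmoeMoE
    hidden_states = layer.post_mlp_layernorm(hidden_states)         # post-MLP norm
    hidden_states = residual + hidden_states
    return hidden_states
```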
Testing

- `arcee-ai/Trinity-Mini`

Documentation

- `docs/source/en/model_doc/afmoe.md`