
Conversation

@DonghakPark (Member)

Dependency of the PR

  • None

Commits to be reviewed in this PR

[CausalLM] Implement ERNIE's GLM Style RoPE

Implement GLM Style RoPE at MHA CORE

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
[CausalLM] Add ernie to main & meson build

Add ernie model & Layer to main, meson build

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
[CausalLM] Implement Ernie MoE Layer

Implement Ernie MoE Layer
- shared expert accumulation
- static bias add

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
[CausalLM] Add causallm common properties <num_shared_experts, moe_norm_min>

Add causallm common properties
- num_shared_experts
- moe_norm_min

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
[Application][CausalLM] Implement Ernie 4.5 MoE Model

Implement Ernie 4.5 MoE Model
- Ernie's first layer is dense
- Ernie has shared experts at each MoE layer

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>

Summary

This PR implements and integrates the Ernie 4.5 MoE (Mixture of Experts) model into the CausalLM application.
The changes include the implementation of the model structure, the MoE layer, GLM-style RoPE, and build system integration.

Ernie 4.5: Key Differences from Qwen

1. Qwen applies RMSNorm to Q and K before RoPE, but Ernie does not (see the sketch below).
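
A minimal, hypothetical sketch of this difference in the attention path: q_norm / k_norm are placeholder modules standing in for Qwen's per-head RMSNorm, and apply_rotary_pos_emb refers to the GLM-style function quoted under point 2 below.

import torch.nn as nn

def qwen_style_qk(q, k, cos, sin, q_norm: nn.Module, k_norm: nn.Module):
    # Qwen: normalize Q and K first, then apply RoPE
    return apply_rotary_pos_emb(q_norm(q), k_norm(k), cos, sin)

def ernie_style_qk(q, k, cos, sin):
    # Ernie 4.5: apply RoPE directly to the projected Q and K (no QK-Norm)
    return apply_rotary_pos_emb(q, k, cos, sin)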

2. Ernie applies GLM-style RoPE:

import torch


def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., 0::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    """Applies Rotary Position Embedding to the query and key tensors.

    Args:
        q (`torch.Tensor`): The query tensor.
        k (`torch.Tensor`): The key tensor.
        cos (`torch.Tensor`): The cosine part of the rotary embedding.
        sin (`torch.Tensor`): The sine part of the rotary embedding.
        position_ids (`torch.Tensor`, *optional*):
            Deprecated and unused.
        unsqueeze_dim (`int`, *optional*, defaults to 1):
            The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
            sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
            that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
            k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
            cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
            the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
    Returns:
        `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
    """
    # glm rope style (with full dim) and full precision
    original_dtype = q.dtype

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)

    # Interleave them instead of usual shape
    cos = cos[..., : cos.shape[-1] // 2].repeat_interleave(2, dim=-1)
    sin = sin[..., : sin.shape[-1] // 2].repeat_interleave(2, dim=-1)

    q_embed = (q.float() * cos) + (rotate_half(q).float() * sin)
    k_embed = (k.float() * cos) + (rotate_half(k).float() * sin)

    return q_embed.to(original_dtype), k_embed.to(original_dtype)
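
For contrast, a minimal sketch of the more common half-split rotation (as in the Hugging Face LLaMA/Qwen code path): the GLM-style rotate_half above instead pairs adjacent even/odd dimensions and repeats the first half of cos/sin via repeat_interleave.

import torch

def rotate_half_split(x):
    # Common (non-GLM) variant: split the head dim in half and swap with a sign flip
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)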

3. Ernie uses 2 shared experts + 6 top-k routed experts (in the 21B-A3B model); see the sketch below.
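
A minimal, hypothetical sketch of how the shared experts and the top-k routed experts could be combined. The expert modules, the router, and the use of moe_norm_min to clamp the normalization denominator are illustrative assumptions, not the actual nntrainer layer API; the bias correction from point 4 is omitted here.

import torch

def ernie_moe_forward(hidden, shared_experts, routed_experts, router,
                      top_k=6, moe_norm_min=1e-12):
    # hidden: [tokens, dim]; shared_experts / routed_experts: lists of MLP modules;
    # router: nn.Linear(dim, num_routed_experts)

    # Shared experts always run; their outputs are accumulated unconditionally
    out = sum(expert(hidden) for expert in shared_experts)

    # Router scores and top-k selection
    probs = torch.softmax(router(hidden.float()), dim=-1)
    top_vals, top_idx = torch.topk(probs, top_k, dim=-1)

    # Renormalize the selected weights; moe_norm_min assumed to bound the denominator
    top_vals = top_vals / top_vals.sum(dim=-1, keepdim=True).clamp(min=moe_norm_min)

    # Scatter the normalized weights back to a dense [tokens, num_experts] map
    dense_w = torch.zeros_like(probs).scatter_(-1, top_idx, top_vals)

    # Accumulate the weighted outputs of the selected routed experts
    for e, expert in enumerate(routed_experts):
        mask = dense_w[:, e] > 0
        if mask.any():
            out[mask] = out[mask] + dense_w[mask, e].unsqueeze(-1) * expert(hidden[mask])
    return out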

4. Ernie has an e_score_correction_bias and adds this bias after the router softmax:

# Router logits are computed in full precision
router_logits = F.linear(hidden_states.float(), self.weight)
router_logits = F.softmax(router_logits, dim=1, dtype=torch.float)
# moe_statics adds the static e_score_correction_bias before the top-k selection
router_top_value, router_indices = torch.topk(self.moe_statics(router_logits), self.top_k, dim=-1)
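
A standalone, hypothetical version of the snippet above, with the e_score_correction_bias made explicit instead of going through the moe_statics module; the function name and signature are illustrative, not the actual model code.

import torch
import torch.nn.functional as F

def route_with_correction_bias(hidden_states, gate_weight, e_score_correction_bias, top_k=6):
    # Router logits in full precision: [tokens, num_experts]
    router_logits = F.linear(hidden_states.float(), gate_weight)
    # Softmax first ...
    probs = F.softmax(router_logits, dim=1, dtype=torch.float)
    # ... then add the static correction bias before the top-k selection
    corrected = probs + e_score_correction_bias
    return torch.topk(corrected, top_k, dim=-1)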

Signed-off-by: Donghak PARK [email protected]

Avoid race condition on expert eviction

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>