Override SSM_A op for Qwen3 Next to reduce splits #17587

pwilkin · 2025-11-29T01:16:22Z

This massively reduces the number of splits in the Qwen3 Next graph by placing the initial gate tensor on the backend. Otherwise the tensor is put on the CPU, which recursively poisons all other layers and leads to splits.
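To illustrate why a single CPU-placed tensor per layer can multiply splits, here is a toy model. It is only a sketch and not the ggml scheduler: the helper names (`build_graph`, `assign_backends`, `count_splits`), the three-op layer layout, and the rule that a weightless op inherits the previous op's backend are all assumptions made for illustration.

```python
from typing import List, Optional


def build_graph(n_layers: int, gate_backend: str) -> List[Optional[str]]:
    """Return, per op, the backend holding that op's weight (None = no weight).

    Hypothetical layer layout: a gate op using the gate tensor, one
    weightless op, then a matmul whose weight lives on the GPU.
    """
    ops: List[Optional[str]] = []
    for _ in range(n_layers):
        ops.extend([gate_backend, None, "gpu"])
    return ops


def assign_backends(weights: List[Optional[str]], default: str = "gpu") -> List[str]:
    """Assumed rule: an op runs where its weight lives; an op without a
    weight inherits the backend of the op before it."""
    assigned: List[str] = []
    current = default
    for weight_backend in weights:
        if weight_backend is not None:
            current = weight_backend
        assigned.append(current)
    return assigned


def count_splits(backends: List[str]) -> int:
    """A split is a maximal run of consecutive ops on the same backend."""
    if not backends:
        return 0
    changes = sum(1 for a, b in zip(backends, backends[1:]) if a != b)
    return 1 + changes


if __name__ == "__main__":
    for gate in ("cpu", "gpu"):
        graph = build_graph(n_layers=4, gate_backend=gate)
        print(gate, count_splits(assign_backends(graph)))
```

In this toy model, a gate tensor on the CPU forces two backend switches per layer, so the split count grows with the number of layers, while keeping it on the same backend as the other weights leaves one contiguous run.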

pwilkin · 2025-11-29T01:18:49Z

On the test server, this improves pp512 throughput from 900 to 1300 t/s.

Override SSM_A op for Qwen3 Next to reduce splits

0a60230

pwilkin requested a review from CISC as a code owner November 29, 2025 01:16

loci-dev mentioned this pull request Nov 29, 2025

UPSTREAM PR #17587: Override SSM_A op for Qwen3 Next to reduce splits auroralabs-loci/llama.cpp#357

Open
