Making an issue to track expected work for DeepSeek experimental:
1 - Integrate DeepGEMM support (contiguous) as an additional inference option - this uses groupwise/blockwise fp8 quantization (see the quantization sketch after this list) - completed (#1124)
1A - add Triton contiguous group GEMM (AMD compatible) - completed (#1154)
2 - refactor token processing to avoid code duplication - PR #1127
3 - add proper training loop support - initial working PR landed (see train_ds_real.py).
4 - add basic unit tests for check-ins
5 - review AMD port for Symmetric Memory (PR merged into PyTorch core; still need to verify it runs on AMD).
6 - finalize which group GEMMs we want to support long term (torch bf16 + DeepSeek for fp8?). AMD? Updates:
fix for the torch.group_gemm hang landed (#1166), so this now has full training support.
torch._scaled_mm with wrappers via torchAO, and thus fp8 rowwise, has been added for DeepSeek inference (#1142); see the rowwise sketch after this list.
7 - implement stats tracking for experts (exflow optimization) and, from that, more efficient expert placement. Update: initial token tracking is in place for topk==1; it still needs to be expanded to topk==6 (see the per-expert counting sketch after this list).
8 - large scale training runs to prove out everything.
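For reference on item 1: a rough sketch of the 1x128 groupwise fp8 quantization that a DeepGEMM-style contiguous group GEMM consumes. The block size, layout, and helper name here are illustrative assumptions, not the repo's actual API.

```python
import torch

FP8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max
BLOCK = 128  # assumed elements per quantization group along the inner dim

def quantize_groupwise(x: torch.Tensor):
    """Quantize (M, K) activations to fp8 e4m3 with one scale per 1x128 block.

    Returns the fp8 tensor plus (M, K // BLOCK) dequantization scales.
    """
    m, k = x.shape
    assert k % BLOCK == 0, "inner dim must be a multiple of the block size"
    blocks = x.reshape(m, k // BLOCK, BLOCK)
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax                 # maps each block into fp8 range
    x_fp8 = (blocks * scale).to(torch.float8_e4m3fn).reshape(m, k)
    return x_fp8, scale.reciprocal().squeeze(-1).float()
```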
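For item 6's update: a minimal sketch of fp8 rowwise quantization feeding torch._scaled_mm. The torchAO wrappers from #1142 hide these details; the helper names below are hypothetical, and the call assumes a recent PyTorch build plus an fp8-capable GPU where _scaled_mm accepts per-row / per-column scales.

```python
import torch

FP8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max

def quantize_rowwise(x: torch.Tensor):
    """Quantize a 2D tensor to fp8 e4m3 with one scale per row."""
    amax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale.reciprocal().float()    # dequantization scale, shape (rows, 1)

def fp8_rowwise_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """C = A @ B with rowwise-scaled fp8 inputs, bf16 output (sketch only)."""
    a_fp8, a_scale = quantize_rowwise(a)        # scales: (M, 1)
    b_fp8, b_scale = quantize_rowwise(b.t())    # quantize B per column; scales: (N, 1)
    return torch._scaled_mm(
        a_fp8,
        b_fp8.t(),                              # _scaled_mm expects mat2 column-major
        scale_a=a_scale,                        # (M, 1)
        scale_b=b_scale.t(),                    # (1, N)
        out_dtype=torch.bfloat16,
    )
```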
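For item 7: a small sketch of per-expert token counting that handles any top-k uniformly, so the same code covers topk==1 and topk==6. Function and variable names are illustrative, not the existing tracking code.

```python
import torch

def update_expert_counts(topk_ids: torch.Tensor, counts: torch.Tensor) -> torch.Tensor:
    """Accumulate how many tokens were routed to each expert.

    topk_ids: (num_tokens, k) expert indices from the router; k may be 1 or 6.
    counts:   (num_experts,) running totals, updated in place.
    """
    # Flattening the (tokens, k) index tensor makes k == 1 and k == 6 identical cases.
    counts += torch.bincount(topk_ids.reshape(-1), minlength=counts.numel())
    return counts

# Example: 4 tokens, top-2 routing over 8 experts.
counts = torch.zeros(8, dtype=torch.long)
topk_ids = torch.tensor([[0, 3], [3, 5], [1, 3], [0, 7]])
update_expert_counts(topk_ids, counts)
print(counts)  # tensor([2, 1, 0, 3, 0, 1, 0, 1])
```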