[Operator] optimized flash_mla in triton #510
Conversation
src/flag_gems/fused/flash_mla.py
dv,
causal,
):
    logging.debug("GEMS FLASH MLA")
change to logger.debug
done
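For context, a minimal sketch of the suggested change, assuming the usual module-level logger pattern from the standard library `logging` module (the function signature here is a placeholder for illustration, not the real flash_mla interface):

```python
import logging

# Module-level logger; records carry the module name and can be filtered
# or silenced per module without reconfiguring the root logger.
logger = logging.getLogger(__name__)


def flash_mla(*args, dv, causal):  # placeholder signature for illustration
    logger.debug("GEMS FLASH MLA")
```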
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=120)
model_name = "deepseek-ai/DeepSeek-V3"
llm = LLM(
Just a heads-up: the full-version DeepSeek model typically requires multiple GPUs (e.g., 2×H100). Without some non-trivial changes to vLLM or DeepSeek’s config, this script may not run as expected.
lg
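Regarding the multi-GPU note above, a rough sketch of how the example might be adapted, assuming vLLM's `tensor_parallel_size` argument and a node with enough GPUs; everything outside the excerpted lines is illustrative, not the script's actual code:

```python
from vllm import LLM, SamplingParams

# Sampling parameters taken from the example above.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=120)

model_name = "deepseek-ai/DeepSeek-V3"

# The full DeepSeek-V3 checkpoint is too large for a single GPU, so the
# weights have to be sharded; tensor_parallel_size tells vLLM how many
# GPUs to split the model across.
llm = LLM(
    model=model_name,
    tensor_parallel_size=8,   # set to the number of GPUs actually available
    trust_remote_code=True,   # DeepSeek models ship custom modeling code
)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```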
PR Category
Operator
Type of Change
Performance Optimization
Description
Reimplemented flash_mla and achieved better performance than the baseline.
Performance on NVIDIA A100, compared to torch.

On average, it achieves 78% of the performance of flash_infer (backend=fa2).
Performance on NVIDIA H800, compared to all implementations.
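As a point of reference, one way a relative-performance figure like the 78% above could be measured, sketched with `triton.testing.do_bench`; `gems_flash_mla` and `flashinfer_decode` are hypothetical stand-ins for the two implementations, not actual API names:

```python
import triton.testing


def relative_performance(candidate, reference):
    """Return the candidate's throughput as a fraction of the reference's.

    Both arguments are zero-argument callables that launch one kernel
    invocation; do_bench reports latency in milliseconds, so a lower
    candidate time yields a ratio above 1.0.
    """
    t_candidate = triton.testing.do_bench(candidate)
    t_reference = triton.testing.do_bench(reference)
    return t_reference / t_candidate


# Hypothetical usage; a result of 0.78 would correspond to the 78% figure:
# ratio = relative_performance(lambda: gems_flash_mla(*args), lambda: flashinfer_decode(*args))
```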
Issue
Progress
Performance