add performance and accuracy eval of flux-1.schnell #3502

vkuzo · 2025-12-17T20:38:14Z

Summary:

Adds performance and accuracy eval for the flux-1.schnell model. This is useful as diffusion models are a major use case for torchao, and before this PR we didn't have reproducible benchmarks for them.

Results, measured on a B200 machine:

| experiment | lpips_avg | time_s | speedup |
| --- | --- | --- | --- |
| bfloat16 (baseline) | - | 1.77 | - |
| float8_rowwise | 0.1714 | 1.54 | 1.15 |
| mxfp8 | 0.1747 | 1.47 | 1.20 |
| nvfp4 | 0.3081 | 1.32 | 1.34 |
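
For reference, the speedup column is consistent with dividing the bfloat16 baseline time by each recipe's time and rounding to two decimals. A quick sketch of that arithmetic, using only the numbers reported above (the variable names are illustrative and not taken from the script):

```python
# Times in seconds, copied from the results table above.
baseline_s = 1.77
times_s = {
    "float8_rowwise": 1.54,
    "mxfp8": 1.47,
    "nvfp4": 1.32,
}

# Speedup relative to the bfloat16 baseline: baseline time / recipe time.
speedups = {name: round(baseline_s / t, 2) for name, t in times_s.items()}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```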

Details:

For performance, we measure end-to-end time for single-image generation, with torch.compile enabled and num_inference_steps=4. In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now, I did not do any performance debugging.
For accuracy, we measure the LPIPS (https://github.com/richzhang/PerceptualSimilarity) score between the images generated by the baseline (bf16) model and the quantized model, averaged over the DrawBench (https://huggingface.co/datasets/sayakpaul/drawbench) dataset of 200 prompts.
We start with three supported quantization recipes: float8_rowwise, mxfp8, and nvfp4 (chosen because I wrote this on a B200). We can expand to other recipes in future PRs as needed.
To select which layers of the model to quantize, I wrote a basic heuristic (don't quantize embeddings, etc.). This heuristic was not validated with any accuracy study or sensitivity analysis.
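
To make the performance measurement concrete, here is a minimal timing sketch. It is not the code from this PR: the helper name `time_generation`, the warmup and iteration counts, the use of the median, and the `fake_generate` stand-in are all assumptions for illustration. The warmup calls matter with torch.compile because the first call includes compilation time.

```python
import statistics
import time
from typing import Callable, Optional


def time_generation(
    generate: Callable[[], object],
    *,
    warmup: int = 2,
    iters: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock seconds of one `generate()` call.

    `sync` is an optional hook called before and after each timed call,
    so that asynchronous device work is included in the measurement.
    """
    # Warmup calls are not timed (compilation, caches, lazy init).
    for _ in range(warmup):
        generate()

    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        generate()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-in for a single image generation call.
calls = []


def fake_generate() -> None:
    calls.append(1)
    time.sleep(0.01)


elapsed = time_generation(fake_generate, warmup=1, iters=3)
```

On a GPU, `torch.cuda.synchronize` could be passed as `sync`, and `generate` would wrap the pipeline call with `num_inference_steps=4`.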

How to run the e2e script:

// takes ~16 mins using 8 GPUs on a B200 machine
benchmarks/quantization/eval_accuracy_and_perf_of_flux.sh
// full log: https://www.internalfb.com/phabricator/paste/view/P2093514733

Note: the script quality is not ideal; we can improve it in future PRs if it proves to be worth our time. The current code is good enough to check in and start reporting metrics.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

vkuzo · 2025-12-17T20:38:15Z

Stack from ghstack (oldest at bottom):

-> add performance and accuracy eval of flux-1.schnell #3502

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 25daf59 ghstack-comment-id: 3667066648 Pull-Request: #3502

pytorch-bot · 2025-12-17T20:38:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3502

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm linux.rocm.gpu.gfx942.*.b runners switching providers

✅ You can merge normally! (1 Unrelated Failure)

As of commit 43343c5 with merge base 23a58c0:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh) (trunk failure)
test/test_low_bit_optim.py::TestFSDP2::test_fsdp2

This comment was automatically generated by Dr. CI and updates every 15 minutes.

sayakpaul · 2025-12-30T15:56:22Z

This is good enough, actually.

> For performance, we measure e2e time for single image generation, with torch.compile on and num_inference_steps=4. In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now I did not do any performance debugging.

We shouldn't need basic performance debugging as the mentioned blog post already did that (such as ensuring no graph-breaks, recompilations, CPU<->GPU syncs, etc.). I think we could add the following context before the inference runs to ensure no graph breaks (as it's simple):
https://github.com/huggingface/diffusers/blob/1cdb8723b85f1b427031e390e0bd0bebfe92454e/tests/models/test_modeling_common.py#L2143C9-L2149C37

We can squeeze out more, but that would probably be intrusive. Also, note that we log the performance benchmarks, too: https://huggingface.co/datasets/diffusers/benchmarks. In the future, it could be great for us to pair up and consolidate this like we have done many times in the past :-)

Update

bcfba9a

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 17, 2025

[wip] flux eval

6a3bc39

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 25daf59 ghstack-comment-id: 3667066648 Pull-Request: #3502

meta-cla bot added the CLA Signed label Dec 17, 2025

Update

31ec4fe

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 19, 2025

[wip] flux eval

5bffd03

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: b1bb3d2 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo added the topic: not user facing label Dec 19, 2025

Update

d0792a1

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 19, 2025

[wip] flux eval

28a94e6

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 70a7b71 ghstack-comment-id: 3667066648 Pull-Request: #3502

Update

7db9995

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 22, 2025

[wip] flux eval

7e330d3

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 551cd15 ghstack-comment-id: 3667066648 Pull-Request: #3502

Update

43343c5

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 22, 2025

[wip] flux eval

70057e3

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 58f5c33 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo changed the title ~~[wip] flux eval~~ add performance and accuracy eval of flux-1.schnell Dec 22, 2025

jerryzh168 requested a review from jainapurva December 23, 2025 00:07

This was referenced Dec 30, 2025

fix torchao quantizer for new torchao versions huggingface/diffusers#12901

Merged

make eval script also handle performance measurement #3473

Merged

jainapurva approved these changes Jan 2, 2026

4 participants
