@vkuzo (Contributor) commented Dec 17, 2025

Summary:

Adds a performance and accuracy eval for the flux-1.schnell model. This is useful because diffusion models are a major use case for torchao, and before this PR we didn't have reproducible benchmarks for them.

Results, measured on a B200 machine:

| experiment          | lpips_avg | time_s | speedup |
| ------------------- | --------- | ------ | ------- |
| bfloat16 (baseline) | -         | 1.77   | -       |
| float8_rowwise      | 0.1714    | 1.54   | 1.15    |
| mxfp8               | 0.1747    | 1.47   | 1.20    |
| nvfp4               | 0.3081    | 1.32   | 1.34    |

Details:

  • For performance, we measure e2e time for single-image generation, with torch.compile on and num_inference_steps=4 (a minimal timing sketch follows this list). In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now I did not do any performance debugging.
  • For accuracy, we measure the LPIPS (https://github.com/richzhang/PerceptualSimilarity) score between the images generated by the baseline (bf16) model and the quantized model, averaged over the DrawBench (https://huggingface.co/datasets/sayakpaul/drawbench) dataset of 200 prompts (an LPIPS sketch follows this list).
  • We start with three supported quantization recipes: float8_rowwise, mxfp8, and nvfp4 (chosen because I wrote this on a B200). We can expand to other recipes in future PRs as needed.
  • For selecting which layers of the model to quantize, I wrote a basic heuristic (don't quantize embeddings, etc.); see the sketch after this list. The heuristic was not validated with any accuracy study or sensitivity analysis.
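For the performance bullet above, here is a minimal timing sketch under stated assumptions: the prompt is illustrative, and the actual benchmark script may structure warmup and timing differently.

```python
import time

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer = torch.compile(pipe.transformer)

# warmup runs trigger compilation so it is excluded from the timed run
for _ in range(3):
    pipe("a photo of a cat", num_inference_steps=4)

torch.cuda.synchronize()
start = time.perf_counter()
pipe("a photo of a cat", num_inference_steps=4)
torch.cuda.synchronize()
print(f"time_s: {time.perf_counter() - start:.2f}")
```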
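For the accuracy bullet, a minimal sketch of the LPIPS comparison using the lpips package; the random tensors are stand-ins for the two generated images, which in the real eval would come from the bf16 and quantized pipelines for the same prompt and seed.

```python
import lpips
import torch

# LPIPS with the AlexNet backbone (the package's common default)
loss_fn = lpips.LPIPS(net="alex")

# stand-ins for the baseline and quantized images: (1, 3, H, W) float
# tensors scaled to [-1, 1], as the lpips package expects
img_bf16 = torch.rand(1, 3, 512, 512) * 2 - 1
img_quant = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.no_grad():
    score = loss_fn(img_bf16, img_quant).item()
print(f"lpips: {score:.4f}")
```

This per-image score is then averaged over the 200 DrawBench prompts to produce lpips_avg.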
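For the layer-selection bullet, a sketch of how such a heuristic can be hooked into torchao's quantize_ API via its filter_fn argument. The filter logic and the toy model are illustrative, not the exact rules used in this PR, and this assumes a recent torchao where Float8DynamicActivationFloat8WeightConfig is available (running the float8 recipe also requires suitable hardware).

```python
import torch.nn as nn
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

def filter_fn(module: nn.Module, fqn: str) -> bool:
    # illustrative heuristic: quantize linear layers only, and skip
    # anything that looks like an embedding
    return isinstance(module, nn.Linear) and "embed" not in fqn

# stand-in for the flux transformer; in the real script this would be
# pipe.transformer
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

# float8_rowwise recipe; mxfp8 and nvfp4 have their own configs
config = Float8DynamicActivationFloat8WeightConfig(granularity=PerRow())
quantize_(model, config, filter_fn=filter_fn)
```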

How to run the e2e script:

# takes ~16 mins using 8 GPUs on a B200
benchmarks/quantization/eval_accuracy_and_perf_of_flux.sh
# full log: https://www.internalfb.com/phabricator/paste/view/P2093514733

Note: the script quality is not ideal; we can improve it in future PRs if that proves to be worth our time. The current code is good enough to check in and start reporting metrics.

@vkuzo (Contributor, Author) commented Dec 17, 2025

Stack from ghstack (oldest at bottom):

vkuzo added a commit that referenced this pull request on Dec 17, 2025.
@pytorch-bot commented Dec 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3502

Note: links to docs will display an error until the docs builds have completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 unrelated failure)

As of commit 43343c5 with merge base 23a58c0:

BROKEN TRUNK: the following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
meta-cla bot added the CLA Signed label on Dec 17, 2025.
vkuzo added a commit that referenced this pull request on Dec 19, 2025.
@vkuzo added the topic: not user facing label on Dec 19, 2025.
vkuzo added a commit that referenced this pull request on Dec 19, 2025.
vkuzo added a commit that referenced this pull request on Dec 22, 2025.
vkuzo added a commit that referenced this pull request on Dec 22, 2025.
@vkuzo changed the title from "[wip] flux eval" to "add performance and accuracy eval of flux-1.schnell" on Dec 22, 2025.
@sayakpaul (Contributor) commented:

This is good enough, actually.

> For performance, we measure e2e time for single image generation, with torch.compile on and num_inference_steps=4. In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now I did not do any performance debugging.

We shouldn't need basic performance debugging, as the mentioned blog post already did that (ensuring no graph breaks, recompilations, CPU<->GPU syncs, etc.). I think we could add the following context before the inference runs to ensure there are no graph breaks (as it's simple):
https://github.com/huggingface/diffusers/blob/1cdb8723b85f1b427031e390e0bd0bebfe92454e/tests/models/test_modeling_common.py#L2143C9-L2149C37
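A minimal sketch of what such a guard could look like, assuming a diffusers FluxPipeline named `pipe` (loaded as in the timing sketch above) and a recent PyTorch; the linked diffusers helper may differ in detail:

```python
import torch

# fullgraph=True makes torch.compile raise on any graph break instead of
# silently splitting the graph; error_on_recompile surfaces unexpected
# recompilations during the timed runs.
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True)
with torch._dynamo.config.patch(error_on_recompile=True):
    image = pipe("a photo of a cat", num_inference_steps=4).images[0]
```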

We can squeeze out more, but that would probably be intrusive. Also, note that we log the performance benchmarks, too: https://huggingface.co/datasets/diffusers/benchmarks. In the future, it could be great for us to pair up and consolidate this like we have done many times in the past :-)
