
Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline #1345

Open · wants to merge 45 commits into base: main

Conversation

deepak-gowda-narayana

@deepak-gowda-narayana deepak-gowda-narayana commented Sep 19, 2024

Added support for and enabled the following for the SD3 pipeline:

  • HPU Graph integration
  • FP8 quantization
  • Batching

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mkpatel3-github

@dsocek @skavulya I need your feedback and review.

Contributor

@dsocek dsocek left a comment


Please see the inline comments and requested changes. Also, make sure at least one CI test for SD3 with batching is added to test_diffusers.

Contributor

@skavulya skavulya left a comment


LGTM. Please add a test for batch sizes to tests/test_diffusers.py.

@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching and Enabling HPU Graphs Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana
Author

> Should add an example of running FP8 mode in the README.

Added the example.

@deepak-gowda-narayana
Author

> Looks good. There are some extra spaces in the sd3 pipeline file; you should run make style to fix them.

Fixed.

@deepak-gowda-narayana
Author

@dsocek Please provide feedback on the changes.

@deepak-gowda-narayana
Author

@libinta Requesting your review of the PR and a push for merging.

@emascarenhas
Contributor

@deepak-gowda-narayana ,
Also run the fast tests, i.e., tests/ci/fast_tests*.sh, and the SLOW test_diffusers.py, and post a summary of the results here.
Also run make style and fix any errors.

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Oct 22, 2024

@emascarenhas

> @deepak-gowda-narayana , Also run fast tests i.e., tests/ci/fast_tests*.sh and the SLOW test_diffusers.py and post summary of results here. Also make style and fix any errors.

Result Summary of fast_tests.sh

python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
=== 81 passed, 8 skipped, 39 warnings in 109.40s (0:01:49) ===

Result Summary of fast_tests_diffusers.sh

python -m pytest tests/test_diffusers.py
=== 113 passed, 47 skipped, 280 warnings in 1229.45s (0:20:29) ===

Result Summary of slow_tests_diffusers.sh
=== short test summary info ===
FAILED tests/test_diffusers.py::GaudiDDPMPipelineTester::test_no_throughput_regression_bf16 - AssertionError: 6.937816172838211 not greater than or equal to 7.287651444971561
=== 1 failed, 4 passed, 1 skipped, 154 deselected, 6 warnings in 358.13s (0:05:58) ===
make: *** [Makefile:104: slow_tests_diffusers] Error 1

The throughput in GaudiDDPMPipeline is calculated with the formula
throughput = (end_time - start_time) / batch_size, which yields a value in seconds per sample.

The failing test compares this value against a benchmark, assuming that higher is better. However, since the metric is seconds per sample, lower values indicate better performance. I am working on a fix to correct the throughput calculation in the pipeline and will update the test accordingly.
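To make the inversion concrete, here is a minimal sketch (illustrative helper names, not the pipeline's actual code) contrasting the current seconds-per-sample formula with a true samples-per-second throughput:

```python
# Hypothetical sketch: the current formula measures latency (lower is
# better), while a throughput metric should measure rate (higher is better).

def latency_per_sample(start_time: float, end_time: float, batch_size: int) -> float:
    # Current formula in the pipeline: seconds per sample.
    return (end_time - start_time) / batch_size

def samples_per_second(start_time: float, end_time: float, batch_size: int) -> float:
    # Corrected throughput: samples per second.
    return batch_size / (end_time - start_time)

# Example: a batch of 8 images generated in 2 seconds.
print(latency_per_sample(0.0, 2.0, 8))   # 0.25 s/sample
print(samples_per_second(0.0, 2.0, 8))   # 4.0 samples/s
```

A regression test asserting `value >= benchmark` is only meaningful for the second form; for the first, the comparison direction must be flipped.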

Contributor

@imangohari1 imangohari1 left a comment


@deepak-gowda-narayana
Thanks for this PR.
We need to work a bit on this:

  • Currently the measure step fails for me with the error AttributeError: 'Conv2d' object has no attribute 'custom_name' (details below) on the 1.18-524 driver/container. Please reproduce this and identify the cause.

  • I've added the measure step directly to the README, as it is needed. Please apply the provided patch via git am < 0001*

  • The SD3 generation_samples_per_second is zero when num_images_per_prompt is set to the number of warmup steps (below). We need to understand why this happens and fix it.

QUANT_CONFIG=quantization/stable-diffusion-3/measure_config.json \
PT_HPU_WEIGHT_SHARING=0 \
python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers \
    --prompts "Sailing ship painting by Van Gogh" \
    --num_images_per_prompt 10 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/stable_diffusion_3_images \
    --scheduler default \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --quant_mode measure
.
.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 602, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 571, in main
    outputs = pipeline(prompt=args.prompts, **kwargs_call)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 553, in __call__
    noise_pred = self.transformer_hpu(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 679, in transformer_hpu
    return self.capture_replay(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 721, in capture_replay
    outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_sd3.py", line 295, in forward
    hidden_states = self.pos_embed(hidden_states)  # takes care of adding positional embeddings too.
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/embeddings.py", line 208, in forward
    latent = self.proj(latent)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py", line 677, in forward_measure
    output = self.orig_mod(input)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1595, in _call_impl
    args_result = hook(self, args)
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/torch_overwrites.py", line 68, in _pre_fwd_hook
    new_name = module.custom_name
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1732, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Conv2d' object has no attribute 'custom_name'
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 4 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16

[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:49:03,919 >> 1 prompt(s) received, 4 generation(s) per prompt, 1 sample(s) per batch, 4 total batch(es).
100%|██████████| 4/4 [01:19<00:00, 19.78s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:50:23,043 >> Speed metrics: {'generation_runtime': 79.1241, 'generation_samples_per_second': 0.095, 'generation_steps_per_second': 19.008}
11/14/2024 16:50:23 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 3 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16


[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:50:58,236 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
100%|██████████| 3/3 [01:08<00:00, 22.81s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:52:06,668 >> Speed metrics: {'generation_runtime': 68.4306, 'generation_samples_per_second': 0.0, 'generation_steps_per_second': 2.676}
11/14/2024 16:52:06 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images..
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 2 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16
.
.


[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:57:09,565 >> 1 prompt(s) received, 2 generation(s) per prompt, 1 sample(s) per batch, 2 total batch(es).
[WARNING|pipeline_stable_diffusion_3.py:480] 2024-11-14 16:57:09,565 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 2/2 [00:55<00:00, 27.50s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:58:04,568 >> Speed metrics: {'generation_runtime': 55.0032, 'generation_samples_per_second': 0.046, 'generation_steps_per_second': 2.315}
11/14/2024 16:58:04 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
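The traceback above ends in a pre-forward hook in torch_overwrites.py that reads module.custom_name unconditionally. As a heavily simplified, hypothetical illustration (plain-Python stand-ins, not the actual habana_frameworks or torch code), this is the failure mode: any hooked module that was never tagged with custom_name raises AttributeError when called, and a defensive getattr fallback avoids the crash.

```python
# Hypothetical sketch of the failure mode: a pre-forward hook that assumes
# every hooked module carries a `custom_name` attribute.

class Module:
    """Minimal stand-in for a framework module with pre-forward hooks."""
    def __init__(self):
        self.pre_hooks = []
    def __call__(self, x):
        for hook in self.pre_hooks:
            hook(self, (x,))
        return x

def pre_fwd_hook(module, args):
    # Mirrors the pattern in the traceback: crashes on untagged modules.
    new_name = module.custom_name  # AttributeError if never assigned
    return args

def safe_pre_fwd_hook(module, args):
    # Defensive variant (illustrative only): fall back to the class name.
    new_name = getattr(module, "custom_name", type(module).__name__)
    return args

conv = Module()  # plays the role of the untagged pos_embed Conv2d
conv.pre_hooks.append(pre_fwd_hook)
try:
    conv(0)
    err = None
except AttributeError as e:
    err = str(e)
print(err)  # mentions the missing 'custom_name' attribute
```

Whether the real fix belongs in the hook or in the code that tags modules during quantization measurement is exactly what needs to be investigated here.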

Contributor

@imangohari1 imangohari1 left a comment


An unnecessary patch is included.

Contributor


This file is not needed, @deepak-gowda-narayana; please exclude it.


Done

@imangohari1
Contributor

AttributeError: 'Conv2d' object has no attribute 'custom_name'

@deepak-gowda-narayana
Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Nov 15, 2024

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?

@imangohari1 This error occurs during quantization; I will check on it. I did not notice the edited comment.

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Nov 15, 2024

> The SD3 generation_samples_per_second is zero when num_images_per_prompt is set to the number of warmup steps (below). We need to understand why this is happening and fix it.
>
> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
> Speed metrics: {'generation_runtime': 68.4306, 'generation_samples_per_second': 0.0, 'generation_steps_per_second': 2.676}

@imangohari1
We see 0 throughput because the number of samples was being set to 0 due to an incorrect condition.
The fix was to update the condition used to compute the number of samples passed to the speed_metrics function.

Below are the updated results:

2024-11-15 20:53:04,295 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
100%|██████████| 3/3 [00:47<00:00, 15.67s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-15 20:53:51,304 >> Speed metrics: {'generation_runtime': 47.0088, 'generation_samples_per_second': 0.085, 'generation_steps_per_second': 2.376}
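A minimal sketch of the shape of such a fix (function and parameter names are hypothetical, not the actual pipeline code): when warmup batches are excluded from the timing, the measured-batch count must be clamped so it can never reach zero, which is what produced the 0.0 samples/sec above.

```python
# Hypothetical sketch of the reported fix: clamp the measured-batch count
# so excluding warmup batches never yields zero samples.

def num_measured_samples(num_batches: int, batch_size: int,
                         throughput_warmup_steps: int) -> int:
    # Buggy behavior: subtracting warmup batches yields 0 samples when
    # num_batches <= throughput_warmup_steps, so samples/sec reports 0.0.
    measured_batches = num_batches - throughput_warmup_steps
    # Fixed condition: always measure at least one batch.
    measured_batches = max(measured_batches, 1)
    return measured_batches * batch_size

# 3 batches with 3 warmup steps previously gave 0 samples; now 1 batch counts.
print(num_measured_samples(3, 1, 3))   # 1
print(num_measured_samples(10, 2, 3))  # 14
```

This matches the symptom: the zero only appeared when num_images_per_prompt equaled the warmup-step count.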

@imangohari1
Contributor

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?
>
> @imangohari1 This error is while quantization, will check on this. Did not notice the edited comment

@deepak-gowda-narayana
The details are given here: #1345 (review)

I ran the cmd on both the 1.18 and 1.19 drivers and it crashes.
Please run the measure cmd and share the results.


@deepak-gowda-narayana
Author

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?
>
> @imangohari1 This error is while quantization, will check on this. Did not notice the edited comment
>
> @deepak-gowda-narayana The details are given here: #1345 (review)
>
> I ran the cmd on both 1.18 and 1.19 driver and it crashes. Please run the measure cmd and share the results.


Yes, I am seeing the same error on my side as well. I will work to resolve this.
