
Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline #1345

Open · wants to merge 45 commits into base: main

Conversation

deepak-gowda-narayana

@deepak-gowda-narayana deepak-gowda-narayana commented Sep 19, 2024

Added support for and enabled the following for the SD3 pipeline:

  • HPU Graph integration
  • FP8 quantization
  • Batching

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mkpatel3-github

@dsocek @skavulya I need your feedback and review.

Contributor

@dsocek dsocek left a comment


Please see the inline comments and requested changes. Also, make sure at least one CI test for SD3 with batching is added to test_diffusers.

Contributor

@skavulya skavulya left a comment


LGTM. Please add a test for batch sizes to tests/test_diffusers.py.

@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching and Enabling HPU Graphs Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana
Author

> Should add an example of running FP8 mode in the README.

Added the example.

@deepak-gowda-narayana
Author

> Looks good. There are some extra spaces in the sd3 pipeline file; you should run make style to fix them.

Fixed.

@deepak-gowda-narayana
Author

@dsocek Please provide feedback on the changes.

@deepak-gowda-narayana
Author

@libinta Requesting your review of the PR and a push for merging.

@emascarenhas
Contributor

@deepak-gowda-narayana ,
Also run the fast tests, i.e., tests/ci/fast_tests*.sh, and the SLOW test_diffusers.py, and post a summary of the results here.
Also run make style and fix any errors.

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Oct 22, 2024

@emascarenhas

> @deepak-gowda-narayana , Also run fast tests i.e., tests/ci/fast_tests*.sh and the SLOW test_diffusers.py and post summary of results here. Also make style and fix any errors.

Result Summary of fast_tests.sh

python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
=== 81 passed, 8 skipped, 39 warnings in 109.40s (0:01:49) ===

Result Summary of fast_tests_diffusers.sh

python -m pytest tests/test_diffusers.py
=== 113 passed, 47 skipped, 280 warnings in 1229.45s (0:20:29) ===

Result Summary of slow_tests_diffusers.sh
=== short test summary info ===
FAILED tests/test_diffusers.py::GaudiDDPMPipelineTester::test_no_throughput_regression_bf16 - AssertionError: 6.937816172838211 not greater than or equal to 7.287651444971561
=== 1 failed, 4 passed, 1 skipped, 154 deselected, 6 warnings in 358.13s (0:05:58) ===
make: *** [Makefile:104: slow_tests_diffusers] Error 1

The throughput in GaudiDDPMPipeline is calculated with the formula
throughput = (end_time - start_time) / batch_size, which yields a value in seconds per sample.

The failing test compares this value against a benchmark, assuming that higher is better. However, since the metric is seconds per sample, lower values indicate better performance. I am working on a fix to correct the throughput calculation in the pipeline and will update the test accordingly.
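To make the inversion concrete, here is a minimal sketch (illustrative helper names, not the pipeline's actual code) contrasting the current seconds-per-sample formula with a true samples-per-second throughput:

```python
# Hypothetical sketch: the current formula measures latency (lower is
# better), while a throughput metric should measure rate (higher is better).

def latency_per_sample(start_time: float, end_time: float, batch_size: int) -> float:
    # Current formula in the pipeline: seconds per sample.
    return (end_time - start_time) / batch_size

def samples_per_second(start_time: float, end_time: float, batch_size: int) -> float:
    # Corrected throughput: samples per second.
    return batch_size / (end_time - start_time)

# Example: a batch of 8 images generated in 2 seconds.
print(latency_per_sample(0.0, 2.0, 8))   # 0.25 s/sample
print(samples_per_second(0.0, 2.0, 8))   # 4.0 samples/s
```

A regression test asserting `value >= benchmark` is only meaningful for the second form; for the first, the comparison direction must be flipped.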

Contributor

@imangohari1 imangohari1 left a comment


@deepak-gowda-narayana
Thanks for this PR.
We need to work a bit on this:

  • Currently the measure step fails for me with the error AttributeError: 'Conv2d' object has no attribute 'custom_name' (details below) on the 1.18-524 driver/container. Please reproduce this and identify the cause.

  • I've added the measure step directly to the README, as it is needed. Please apply the provided patch via git am < 0001*

  • The SD3 generation_samples_per_second is zero when num_images_per_prompt is set to the number of warmup steps (below). We need to understand why this happens and fix it.

QUANT_CONFIG=quantization/stable-diffusion-3/measure_config.json \
PT_HPU_WEIGHT_SHARING=0 \
python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers \
    --prompts "Sailing ship painting by Van Gogh" \
    --num_images_per_prompt 10 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/stable_diffusion_3_images \
    --scheduler default \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --quant_mode measure
.
.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 602, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 571, in main
    outputs = pipeline(prompt=args.prompts, **kwargs_call)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 553, in __call__
    noise_pred = self.transformer_hpu(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 679, in transformer_hpu
    return self.capture_replay(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 721, in capture_replay
    outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_sd3.py", line 295, in forward
    hidden_states = self.pos_embed(hidden_states)  # takes care of adding positional embeddings too.
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/embeddings.py", line 208, in forward
    latent = self.proj(latent)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py", line 677, in forward_measure
    output = self.orig_mod(input)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1595, in _call_impl
    args_result = hook(self, args)
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/torch_overwrites.py", line 68, in _pre_fwd_hook
    new_name = module.custom_name
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1732, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Conv2d' object has no attribute 'custom_name'
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 4 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16

[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:49:03,919 >> 1 prompt(s) received, 4 generation(s) per prompt, 1 sample(s) per batch, 4 total batch(es).
100%|██████████| 4/4 [01:19<00:00, 19.78s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:50:23,043 >> Speed metrics: {'generation_runtime': 79.1241, 'generation_samples_per_second': 0.095, 'generation_steps_per_second': 19.008}
11/14/2024 16:50:23 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 3 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16


[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:50:58,236 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
100%|██████████| 3/3 [01:08<00:00, 22.81s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:52:06,668 >> Speed metrics: {'generation_runtime': 68.4306, 'generation_samples_per_second': 0.0, 'generation_steps_per_second': 2.676}
11/14/2024 16:52:06 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images..
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 2 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16
.
.


[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:57:09,565 >> 1 prompt(s) received, 2 generation(s) per prompt, 1 sample(s) per batch, 2 total batch(es).
[WARNING|pipeline_stable_diffusion_3.py:480] 2024-11-14 16:57:09,565 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 2/2 [00:55<00:00, 27.50s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:58:04,568 >> Speed metrics: {'generation_runtime': 55.0032, 'generation_samples_per_second': 0.046, 'generation_steps_per_second': 2.315}
11/14/2024 16:58:04 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
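The traceback above ends in a pre-forward hook in torch_overwrites.py that reads module.custom_name unconditionally. As a heavily simplified, hypothetical illustration (plain-Python stand-ins, not the actual habana_frameworks or torch code), this is the failure mode: any hooked module that was never tagged with custom_name raises AttributeError when called, and a defensive getattr fallback avoids the crash.

```python
# Hypothetical sketch of the failure mode: a pre-forward hook that assumes
# every hooked module carries a `custom_name` attribute.

class Module:
    """Minimal stand-in for a framework module with pre-forward hooks."""
    def __init__(self):
        self.pre_hooks = []
    def __call__(self, x):
        for hook in self.pre_hooks:
            hook(self, (x,))
        return x

def pre_fwd_hook(module, args):
    # Mirrors the pattern in the traceback: crashes on untagged modules.
    new_name = module.custom_name  # AttributeError if never assigned
    return args

def safe_pre_fwd_hook(module, args):
    # Defensive variant (illustrative only): fall back to the class name.
    new_name = getattr(module, "custom_name", type(module).__name__)
    return args

conv = Module()  # plays the role of the untagged pos_embed Conv2d
conv.pre_hooks.append(pre_fwd_hook)
try:
    conv(0)
    err = None
except AttributeError as e:
    err = str(e)
print(err)  # mentions the missing 'custom_name' attribute
```

Whether the real fix belongs in the hook or in the code that tags modules during quantization measurement is exactly what needs to be investigated here.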

Contributor

@imangohari1 imangohari1 left a comment


An unnecessary patch is included.

Contributor


This file is not needed, @deepak-gowda-narayana; please exclude it.


Done

@imangohari1
Contributor

AttributeError: 'Conv2d' object has no attribute 'custom_name'

@deepak-gowda-narayana
Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Nov 15, 2024

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?

@imangohari1 This error occurs during quantization; I will check on it. I did not notice the edited comment.

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Nov 15, 2024

> The SD3 generation_samples_per_second is zero when num_images_per_prompt is set to the number of warmup steps (below). We need to understand why this is happening and fix it.
>
> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
> Speed metrics: {'generation_runtime': 68.4306, 'generation_samples_per_second': 0.0, 'generation_steps_per_second': 2.676}

@imangohari1
We see 0 throughput because the number of samples was being set to 0 due to an incorrect condition.
The fix was to update the condition used to compute the number of samples passed to the speed_metrics function.

Below are the updated results:

2024-11-15 20:53:04,295 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
100%|██████████| 3/3 [00:47<00:00, 15.67s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-15 20:53:51,304 >> Speed metrics: {'generation_runtime': 47.0088, 'generation_samples_per_second': 0.085, 'generation_steps_per_second': 2.376}
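A minimal sketch of the shape of such a fix (function and parameter names are hypothetical, not the actual pipeline code): when warmup batches are excluded from the timing, the measured-batch count must be clamped so it can never reach zero, which is what produced the 0.0 samples/sec above.

```python
# Hypothetical sketch of the reported fix: clamp the measured-batch count
# so excluding warmup batches never yields zero samples.

def num_measured_samples(num_batches: int, batch_size: int,
                         throughput_warmup_steps: int) -> int:
    # Buggy behavior: subtracting warmup batches yields 0 samples when
    # num_batches <= throughput_warmup_steps, so samples/sec reports 0.0.
    measured_batches = num_batches - throughput_warmup_steps
    # Fixed condition: always measure at least one batch.
    measured_batches = max(measured_batches, 1)
    return measured_batches * batch_size

# 3 batches with 3 warmup steps previously gave 0 samples; now 1 batch counts.
print(num_measured_samples(3, 1, 3))   # 1
print(num_measured_samples(10, 2, 3))  # 14
```

This matches the symptom: the zero only appeared when num_images_per_prompt equaled the warmup-step count.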

@imangohari1
Contributor

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?
>
> @imangohari1 This error is while quantization, will check on this. Did not notice the edited comment

@deepak-gowda-narayana
The details are given here: #1345 (review)

I ran the cmd on both the 1.18 and 1.19 drivers and it crashes.
Please run the measure cmd and share the results.


@deepak-gowda-narayana
Author

> AttributeError: 'Conv2d' object has no attribute 'custom_name'
>
> @deepak-gowda-narayana Can you reproduce the AttributeError: 'Conv2d' object has no attribute 'custom_name' error please?
>
> @imangohari1 This error is while quantization, will check on this. Did not notice the edited comment
>
> @deepak-gowda-narayana The details are given here: #1345 (review)
>
> I ran the cmd on both 1.18 and 1.19 driver and it crashes. Please run the measure cmd and share the results.


Yes, I am seeing the same error on my side as well. I will work to resolve this.
