Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline #1345
base: main
Conversation
Considering the number of samples as 1 (the whole batch) to keep the performance measure consistent with the industry standard for easier comparison.
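To make the convention concrete, here is a small sketch of the two ways throughput can be counted (the numbers and names are illustrative, not taken from the pipeline):

```python
# Two throughput conventions for a batched generation run.
batch_size = 4
runtime_s = 20.0

# Convention discussed above: the whole batch counts as one sample.
batches_per_second = 1 / runtime_s

# Alternative convention: every generated image counts individually.
images_per_second = batch_size / runtime_s

# The two differ exactly by the batch size.
assert images_per_second == batch_size * batches_per_second
```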
Please see inline comments and requested changes. Also, make sure at least one CI test for SD3 with batching is added to tests/test_diffusers.py.
optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py (outdated review comments, resolved)
LGTM. Please add a test for batch sizes to tests/test_diffusers.py.
Added the example
Fixed
@dsocek Please provide feedback on the changes
@libinta Request to review the PR and push for merging
@deepak-gowda-narayana,
Result summary of fast_tests.sh (python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py), fast_tests_diffusers.sh (python -m pytest tests/test_diffusers.py), and slow_tests_diffusers.sh:

The throughput in GaudiDDPMPipeline is calculated with the formula. The failing test currently compares throughput against a benchmark value, assuming that higher throughput is better. However, the throughput metric is measured in seconds per sample, where lower values indicate better performance. I am working on a fix to correct the throughput calculation in the pipeline and will update the test accordingly.
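The mismatch described above can be sketched in a few lines (the function names are hypothetical, not the GaudiDDPMPipeline API):

```python
def seconds_per_sample(runtime_s: float, num_samples: int) -> float:
    """Lower is better: average wall time per generated sample."""
    return runtime_s / num_samples

def samples_per_second(runtime_s: float, num_samples: int) -> float:
    """Higher is better: the convention most benchmarks assume."""
    return num_samples / runtime_s

# A test asserting `throughput >= benchmark` is only valid for the
# second convention; applied to the first, faster runs look worse.
rt, n = 60.0, 4
assert samples_per_second(rt, n) == 1 / seconds_per_sample(rt, n)
```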
@deepak-gowda-narayana
Thanks for this PR.
We need to work a bit on this:
- Currently the measure step fails for me with the error AttributeError: 'Conv2d' object has no attribute 'custom_name' (details below) on the 1.18-524 driver/container. Please reproduce this and see what is causing it.
- I've added the measure step directly into the README as it is needed. Please apply the provided patch via git am < 0001*
- The SD3 generation_samples_per_second is zero when num_images_per_prompt is set to the warmup steps (below). We need to understand why this is happening and fix it. Please refer to this PR for previous fixes of similar issues: Ig/Diffusers timing #1277
Thanks.
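One plausible explanation for the zero metric (an assumption to verify against the pipeline code, not a confirmed diagnosis): if the speed metrics exclude warmup batches from the sample count, the measured count collapses to zero whenever the number of generated images does not exceed the warmup steps.

```python
def generation_samples_per_second(total_samples: int,
                                  warmup_samples: int,
                                  runtime_s: float) -> float:
    # Hypothetical accounting: warmup batches are excluded from throughput.
    measured = max(total_samples - warmup_samples, 0)
    return round(measured / runtime_s, 4)

# With 3 images and 3 warmup samples, nothing is left to measure,
# so the reported throughput is 0.0; with 4 images it is nonzero.
assert generation_samples_per_second(3, 3, 68.4306) == 0.0
assert generation_samples_per_second(4, 3, 79.1241) > 0.0
```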
QUANT_CONFIG=quantization/stable-diffusion-3/measure_config.json \
PT_HPU_WEIGHT_SHARING=0 \
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers \
--prompts "Sailing ship painting by Van Gogh" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 28 \
--image_save_dir /tmp/stable_diffusion_3_images \
--scheduler default \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--quant_mode measure
.
.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 602, in <module>
main()
File "/devops/sgohari/tests/codes/pr-reviews/1345/optimum-habana/examples/stable-diffusion/text_to_image_generation.py", line 571, in main
outputs = pipeline(prompt=args.prompts, **kwargs_call)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 553, in __call__
noise_pred = self.transformer_hpu(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 679, in transformer_hpu
return self.capture_replay(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 721, in capture_replay
outputs = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_sd3.py", line 295, in forward
hidden_states = self.pos_embed(hidden_states) # takes care of adding positional embeddings too.
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/embeddings.py", line 208, in forward
latent = self.proj(latent)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
result = forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py", line 677, in forward_measure
output = self.orig_mod(input)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1595, in _call_impl
args_result = hook(self, args)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/torch_overwrites.py", line 68, in _pre_fwd_hook
new_name = module.custom_name
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1732, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Conv2d' object has no attribute 'custom_name'
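A minimal pure-Python model of the failure mode, as we read it from the traceback above (this is an interpretation, not the actual Habana or neural_compressor code): the HPU-graph machinery tags wrapped modules with a custom_name attribute, but the measurement wrapper's forward_measure calls the original, untagged Conv2d, whose attribute lookup then raises.

```python
class Conv2d:
    """Stand-in for torch.nn.Conv2d to model the lookup failure."""
    def __getattr__(self, name):
        # Mimics nn.Module.__getattr__, which raises when the
        # attribute was never registered on the module.
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )

def pre_fwd_hook(module):
    # Stand-in for the hook in torch_overwrites.py: it assumes every
    # module was tagged with `custom_name`, which is not the case for
    # the unwrapped module the measurement path calls.
    return module.custom_name

try:
    pre_fwd_hook(Conv2d())
except AttributeError as exc:
    print(exc)  # 'Conv2d' object has no attribute 'custom_name'

# A defensive hook would fall back when the tag is missing, e.g.:
# getattr(module, "custom_name", type(module).__name__)
```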
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 4 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16
[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:49:03,919 >> 1 prompt(s) received, 4 generation(s) per prompt, 1 sample(s) per batch, 4 total batch(es).
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:19<00:00, 19.78s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:50:23,043 >> Speed metrics: {'generation_runtime': 79.1241, 'generation_samples_per_second': 0.095, 'generation_steps_per_second': 19.008}
11/14/2024 16:50:23 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 3 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16
[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:50:58,236 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:08<00:00, 22.81s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:52:06,668 >> Speed metrics: {'generation_runtime': 68.4306, 'generation_samples_per_second': 0.0, 'generation_steps_per_second': 2.676}
11/14/2024 16:52:06 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images..
python text_to_image_generation.py --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers --prompts "Sailing ship painting by Van Gogh" --num_images_per_prompt 2 --batch_size 1 --image_save_dir /tmp/stable_diffusion_3_images --scheduler default --use_habana --use_hpu_graphs --gaudi_config Habana/stable-diffusion --bf16
.
.
[INFO|pipeline_stable_diffusion_3.py:475] 2024-11-14 16:57:09,565 >> 1 prompt(s) received, 2 generation(s) per prompt, 1 sample(s) per batch, 2 total batch(es).
[WARNING|pipeline_stable_diffusion_3.py:480] 2024-11-14 16:57:09,565 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:55<00:00, 27.50s/it]
[INFO|pipeline_stable_diffusion_3.py:626] 2024-11-14 16:58:04,568 >> Speed metrics: {'generation_runtime': 55.0032, 'generation_samples_per_second': 0.046, 'generation_steps_per_second': 2.315}
11/14/2024 16:58:04 - INFO - __main__ - Saving images in /tmp/stable_diffusion_3_images...
An unnecessary patch is included.
This file is not needed @deepak-gowda-narayana, please exclude it.
Done
@deepak-gowda-narayana
@imangohari1 This error occurs during quantization; I will check on it. I did not notice the edited comment.
@imangohari1 Below are the updated results:
2024-11-15 20:53:04,295 >> 1 prompt(s) received, 3 generation(s) per prompt, 1 sample(s) per batch, 3 total batch(es).
@deepak-gowda-narayana I ran the command on both the 1.18 and 1.19 drivers and it crashes.
Yes I am seeing the same error on my side as well. Will work to resolve this |
Added support for and enabled the following for the SD3 pipeline: batching, HPU graphs, and FP8 quantization.