Description
Describe the bug
I have tested diffusers and ComfyUI with the same parameters and checked the input shapes. The parameters are shown in the reproduction below.
I checked the shapes of the text embeds and the VAE latents; they are identical. Both use the same PyTorch attention backend, and both use the same bfloat16 model.
But the speed is 2.39 it/s in diffusers vs 2.7 it/s in ComfyUI.
I have put a lot of effort into this but cannot find what is influencing the speed.
My diffusers version is 0.36.0.dev0
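To make the it/s numbers directly comparable, timing should exclude warmup (compilation, memory-pool growth) and synchronize the GPU before reading the clock, since CUDA work is queued asynchronously. A minimal sketch of such a measurement helper (the `iterations_per_second` name is illustrative, not from either library):

```python
import time

def iterations_per_second(step_fn, n_steps, warmup=2, sync=None):
    """Time step_fn over n_steps after a warmup phase.

    sync (e.g. torch.cuda.synchronize) is called before and after the timed
    region so queued asynchronous GPU work is not silently excluded.
    """
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed
```

When timing the denoising loop on GPU, pass `sync=torch.cuda.synchronize`; without it, the reported rate can reflect kernel launch speed rather than actual step time.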
Reproduction
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map='cuda'
)

prompt = "女孩"  # "girl"
inputs = {
    "prompt": prompt,
    # "negative_prompt": " ",
    # "generator": torch.manual_seed(42),
    "generator": torch.Generator(device='cuda').manual_seed(1125488487853216),
    "width": 1216,
    "height": 832,
    "true_cfg_scale": 1,
    "num_inference_steps": 20,
    "guidance_scale": 1.0,
    "num_images_per_prompt": 1,
}

with torch.inference_mode():
    output = pipeline(**inputs)
output_image = output.images[0]
output_image.save('output.png')
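Since it is unclear where the extra time goes, one way to narrow it down is to run a step or two under the PyTorch profiler and compare which ops dominate in each framework. A minimal sketch (the `profile_fn` helper is illustrative; `pipeline` and `inputs` refer to the script above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def profile_fn(fn, activities=(ProfilerActivity.CPU,)):
    """Run fn under the PyTorch profiler and return a table of the
    top ops sorted by self CPU time."""
    with profile(activities=list(activities), record_shapes=True) as prof:
        fn()
    return prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)

# Toy example; for this issue, profile the real call with CUDA activity:
# print(profile_fn(lambda: pipeline(**inputs),
#                  activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]))
print(profile_fn(lambda: torch.randn(64, 64) @ torch.randn(64, 64)))
```

Comparing the top CUDA kernels between the two runs should show whether the gap comes from attention, the VAE, the scheduler, or host-side overhead between steps.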
Logs
System Info
- 🤗 Diffusers version: 0.36.0.dev0
- Platform: Linux-5.15.0-160-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.12.3
- PyTorch version (GPU?): 2.8.0+cu128 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.34.0
- Transformers version: 4.57.1
- Accelerate version: 1.11.0
- PEFT version: 0.17.1
- Bitsandbytes version: not installed
- Safetensors version: 0.6.2
- xFormers version: not installed
- Accelerator: 8× NVIDIA A800-SXM4-80GB, 81920 MiB each
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response