
Why might Qwen-Image inference be slower in Diffusers than in ComfyUI? #12645

@hanggun

Description


Describe the bug

I tested Diffusers and ComfyUI with the same parameters and checked the input shapes.
The parameters are:

[Three images showing the parameters]
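The shapes both implementations should agree on can be worked out by hand for the resolution used in the reproduction (width 1216, height 832). The sketch below assumes an 8x VAE spatial downsampling, 16 latent channels and 2x2 patch packing, which is my understanding of what Diffusers' `QwenImagePipeline` does; `packed_latent_shape` is a hypothetical helper, and it simply rejects sizes that are not multiples of 16 rather than resizing them:

```python
# Assumptions (not taken from this issue): 8x VAE downsampling per spatial
# dimension, 16 latent channels, and 2x2 patch packing before the transformer.
from typing import Tuple

VAE_SCALE = 8
PATCH = 2
LATENT_CHANNELS = 16


def packed_latent_shape(
    height: int, width: int, batch: int = 1
) -> Tuple[Tuple[int, int], Tuple[int, int, int]]:
    """Return ((latent_h, latent_w), (batch, tokens, channels)) for a pixel size."""
    if height % (VAE_SCALE * PATCH) or width % (VAE_SCALE * PATCH):
        raise ValueError("height and width must be multiples of 16")
    lat_h, lat_w = height // VAE_SCALE, width // VAE_SCALE
    # Each 2x2 block of latent pixels becomes one token with 4x the channels.
    tokens = (lat_h // PATCH) * (lat_w // PATCH)
    return (lat_h, lat_w), (batch, tokens, LATENT_CHANNELS * PATCH * PATCH)


shape = packed_latent_shape(832, 1216)
print(shape)
```

Under these assumptions the latent is 104 x 152 and the packed sequence is (1, 3952, 64), so the transformer processes 3952 image tokens at every denoising step.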

I checked the shapes of the text embeddings and the VAE, and they are identical. Both use the same PyTorch attention, and both load the same bfloat16 model.
Yet Diffusers runs at 2.39 it/s, versus 2.7 it/s in ComfyUI.
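To put the gap in concrete terms, the two reported throughputs can be converted into per-step latency. This Python sketch is plain arithmetic on the numbers quoted above, using the 20 steps from the reproduction; nothing in it is measured:

```python
# Convert the reported throughputs (iterations per second) into per-step
# latency and total denoising time for the 20-step run in the reproduction.
STEPS = 20
rates = {"diffusers": 2.39, "comfyui": 2.7}  # it/s, as reported in the issue

latency_ms = {name: 1000.0 / r for name, r in rates.items()}
total_s = {name: STEPS / r for name, r in rates.items()}

per_step_gap_ms = latency_ms["diffusers"] - latency_ms["comfyui"]
speedup = rates["comfyui"] / rates["diffusers"]

for name in rates:
    print(f"{name:10s} {latency_ms[name]:6.1f} ms/step  {total_s[name]:5.2f} s for {STEPS} steps")
print(f"gap: {per_step_gap_ms:.1f} ms/step, ComfyUI throughput is {100 * (speedup - 1):.1f}% higher")
```

This works out to roughly 418 ms versus 370 ms per step: a gap of about 48 ms per step, or about 1 s over 20 steps. In other words, ComfyUI's throughput is roughly 13% higher.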

I have put a lot of effort into this but cannot find what is affecting the speed.
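One way to find which part of the run accounts for the difference is to time each stage (text encoding, the denoising steps, VAE decoding) separately in both setups. Below is a minimal sketch. `StageTimer` is a hypothetical helper, not part of Diffusers or ComfyUI. It assumes you pass a device-synchronizing function such as `torch.cuda.synchronize` as `sync`. CUDA kernels run asynchronously, so without synchronization their time would be charged to whichever stage happens to wait on them:

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Optional


class StageTimer:
    """Hypothetical helper that accumulates wall-clock time per named stage."""

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        # `sync` should block until queued device work has finished,
        # e.g. torch.cuda.synchronize; the default does nothing (CPU-only timing).
        self.sync = sync or (lambda: None)
        self.samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        self.sync()  # drain work queued before this stage starts
        start = time.perf_counter()
        try:
            yield
        finally:
            self.sync()  # wait for this stage's own work to finish
            self.samples[name].append(time.perf_counter() - start)

    def summary(self) -> Dict[str, float]:
        # Mean seconds per call for every stage seen so far.
        return {name: sum(t) / len(t) for name, t in self.samples.items()}


# Stand-in demo: time.sleep plays the role of the real GPU work.
sync_calls: List[int] = []
timer = StageTimer(sync=lambda: sync_calls.append(1))
for _ in range(3):
    with timer.stage("denoise_step"):
        time.sleep(0.01)
print(timer.summary())
```

In the real script, create `StageTimer(sync=torch.cuda.synchronize)` and wrap the work to be measured. That could be the whole `pipeline(**inputs)` call, or the transformer call inside a locally patched copy of the pipeline's denoising loop. The extra synchronizations add some overhead of their own. Compare the resulting numbers between the two setups, not against the progress bar.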

My Diffusers version is 0.36.0.dev0.

Reproduction

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU (must be set before importing torch)
import torch
from PIL import Image
from diffusers import QwenImagePipeline

# Load the bfloat16 weights directly onto the GPU
pipeline = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda")

prompt = "女孩"  # "girl"
inputs = {
    "prompt": prompt,
    # "negative_prompt": " ",
    # "generator": torch.manual_seed(42),
    "generator": torch.Generator(device="cuda").manual_seed(1125488487853216),
    "width": 1216,
    "height": 832,
    "true_cfg_scale": 1,  # true CFG disabled: one transformer pass per step
    "num_inference_steps": 20,
    "guidance_scale": 1.0,
    "num_images_per_prompt": 1,
}

with torch.inference_mode():
    output = pipeline(**inputs)

output_image = output.images[0]
output_image.save("output.png")
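The script above times a single call, so one-time startup costs can influence the reported rate. Measuring steady-state throughput after a few warm-up runs makes the comparison less sensitive to them. Below is a minimal sketch, with `steady_state_its` as a hypothetical helper (not a Diffusers API) and a sleeping stand-in for the model so that it runs anywhere:

```python
import statistics
import time
from typing import Callable, List, Optional


def steady_state_its(
    run: Callable[[], object],
    steps: int,
    warmup: int = 2,
    repeats: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Hypothetical helper: median iterations/second over `repeats` timed runs.

    `run` performs one complete generation of `steps` denoising steps. The
    first `warmup` calls are discarded so one-time costs (CUDA context setup,
    kernel selection, allocator growth) do not skew the comparison.
    """
    wait = sync or (lambda: None)
    for _ in range(warmup):
        run()
    wait()
    rates: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        wait()  # make sure all queued device work is included in the timing
        rates.append(steps / (time.perf_counter() - start))
    return statistics.median(rates)


# Stand-in demo: a fake "generation" that sleeps instead of running the model.
fake_calls: List[int] = []


def fake_generation() -> None:
    fake_calls.append(1)
    time.sleep(0.005)


its = steady_state_its(fake_generation, steps=20, warmup=1, repeats=3)
print(f"{its:.1f} it/s")
```

For the real comparison, call `steady_state_its(lambda: pipeline(**inputs), steps=20, sync=torch.cuda.synchronize)` inside `torch.inference_mode()`. This times the whole call, so text encoding and VAE decoding are included. The result will therefore be lower than the progress-bar it/s, which covers only the denoising loop. Apply the same measurement on the ComfyUI side for a like-for-like number.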

Logs

System Info


  • 🤗 Diffusers version: 0.36.0.dev0
  • Platform: Linux-5.15.0-160-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.12.3
  • PyTorch version (GPU?): 2.8.0+cu128 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.34.0
  • Transformers version: 4.57.1
  • Accelerate version: 1.11.0
  • PEFT version: 0.17.1
  • Bitsandbytes version: not installed
  • Safetensors version: 0.6.2
  • xFormers version: not installed
  • Accelerator: 8 x NVIDIA A800-SXM4-80GB, 81920 MiB each
  • Using GPU in script?: Yes, a single GPU (CUDA_VISIBLE_DEVICES=0)
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
