Support for Qwen-Image

Hi, great work on the project!

I noticed the current vision encoder is based on FLUX. The recently released Qwen-Image model appears to perform noticeably better, particularly on text rendering, while using a similar MMDiT-style diffusion transformer architecture.

Are there any plans to adapt the project to use Qwen-Image?

Thanks!