Hi, great work on the project!
I noticed that the current vision encoder is based on FLUX. The recently released Qwen-Image model appears to perform significantly better while using a similar architecture.
Are there any plans to adapt the project to use Qwen-Image?
Thanks!