- [2025/11/25]: 🤗 We release UniWorld-OSP2.0, a VLM-enhanced unified framework for image-to-video generation. The architecture scales FlashI2V to 14B parameters and introduces a novel conditioning mechanism based on a 7B VLM to losslessly inherit its powerful semantic understanding (a conditioning sketch follows this list). UniWorld-OSP2.0 surpasses the video generation model Wan2.1 across six key evaluation metrics on VBench-I2V.
- [2025/10/19]: We release UniWorld-V2, which employs DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing (a reward sketch follows this list). UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.
- [2025/06/03]: 🤗 We release UniWorld-V1, a unified framework for understanding, generation, and editing. All data, models, training code, and evaluation code are open-sourced. Check our report for more details. Welcome to watch 👀 this repository for the latest updates.
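To make the VLM-conditioning idea concrete, here is a minimal, schematic PyTorch sketch: a stand-in encoder fuses reference-image patches and prompt tokens into conditioning states, and a DiT-style block injects them into the video latents via cross-attention. All class names, dimensions, and the cross-attention injection point are illustrative assumptions, not UniWorld-OSP2.0's actual implementation.

```python
import torch
import torch.nn as nn

class VLMConditioner(nn.Module):
    """Schematic stand-in for the 7B VLM: fuses image patches and prompt
    tokens into conditioning states. Dimensions are illustrative only."""
    def __init__(self, vocab_size=32000, dim=512, cond_dim=1024):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.patch_proj = nn.Linear(3 * 16 * 16, dim)  # flattened 16x16 RGB patches
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_cond = nn.Linear(dim, cond_dim)

    def forward(self, image_patches, text_ids):
        img = self.patch_proj(image_patches)               # (B, N_img, dim)
        txt = self.token_emb(text_ids)                     # (B, N_txt, dim)
        hidden = self.encoder(torch.cat([img, txt], dim=1))
        return self.to_cond(hidden)                        # (B, N_img + N_txt, cond_dim)

class CrossAttnDenoiserBlock(nn.Module):
    """One DiT-style denoiser block that injects the VLM states into the
    video latent tokens via cross-attention."""
    def __init__(self, dim=1024, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, cond):
        h = self.n1(x)
        x = x + self.self_attn(h, h, h)[0]
        x = x + self.cross_attn(self.n2(x), cond, cond)[0]  # condition on VLM states
        return x + self.mlp(self.n3(x))

# Toy usage: 256 image patches + 32 prompt tokens conditioning 1024 latent tokens.
vlm = VLMConditioner()
block = CrossAttnDenoiserBlock()
cond = vlm(torch.randn(1, 256, 3 * 16 * 16), torch.randint(0, 32000, (1, 32)))
out = block(torch.randn(1, 1024, 1024), cond)  # (1, 1024, 1024)
```

Passing the VLM's hidden states directly as cross-attention keys/values, rather than distilling them into a smaller text embedding, is what lets the generator inherit the VLM's semantics without an information bottleneck.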
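The "training-free reward model derived from pretrained MLLMs" can be pictured as follows: ask the MLLM a verification question about the edited image and read its probability of answering "yes" as a scalar reward. The prompt template, the yes/no scoring rule, and the HF-style model interface below are our assumptions for illustration, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def implicit_edit_reward(mllm, tokenizer, pixel_values, instruction):
    """Hypothetical sketch: score an edited image by a pretrained MLLM's
    probability of answering "yes" to a verification question.
    Assumes an HF-style multimodal model that accepts `pixel_values` plus
    tokenized text and returns next-token `logits`, and that "yes"/"no"
    each map to a single token in the tokenizer's vocabulary."""
    question = (f"Does this image faithfully apply the edit "
                f"'{instruction}'? Answer yes or no.")
    inputs = tokenizer(question, return_tensors="pt")
    logits = mllm(pixel_values=pixel_values, **inputs).logits[0, -1]
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()  # reward in [0, 1]; no reward training needed
```

Because the signal comes from the MLLM's own likelihoods rather than a learned reward head, no preference data or reward training is required, which is what makes the feedback "implicit".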
| Model | I2V Paradigm | Subject Consistency ↑ | Background Consistency ↑ | Motion Smoothness ↑ | Dynamic Degree ↑ | Aesthetic Quality ↑ | Imaging Quality ↑ | I2V Subject Consistency ↑ | I2V Background Consistency ↑ |
|---|---|---|---|---|---|---|---|---|---|
| SVD-XT-1.0 (1.5B) | Repeating Concat and Adding Noise | 95.52 | 96.61 | 98.09 | 52.36 | 60.15 | 69.80 | 97.52 | 97.63 |
| SVD-XT-1.1 (1.5B) | Repeating Concat and Adding Noise | 95.42 | 96.77 | 98.12 | 43.17 | 60.23 | 70.23 | 97.51 | 97.62 |
| SEINE-512x512 (1.8B) | Inpainting | 95.28 | 97.12 | 97.12 | 27.07 | 64.55 | 71.39 | 97.15 | 96.94 |
| CogVideoX-5B-I2V | Zero-padding Concat and Adding Noise | 94.34 | 96.42 | 98.40 | 33.17 | 61.87 | 70.01 | 97.19 | 96.74 |
| Wan2.1-I2V-14B-720P | Inpainting | 94.86 | 97.07 | 97.90 | 51.38 | 64.75 | 70.44 | 96.95 | 96.44 |
| CogVideoX1.5-5B-I2V | Zero-padding Concat and Adding Noise | 95.04 | 96.52 | 98.47 | 37.48 | 62.68 | 70.99 | 97.78 | 98.73 |
| Wan2.1-I2V-14B-480P | Inpainting | 95.68 | 97.44 | 98.46 | 45.20 | 61.44 | 70.37 | 97.83 | 99.08 |
| UniWorld-OSP2.0 | FlashI2V | 96.21 | 97.71 | 98.47 | 46.10 | 66.55 | 70.57 | 97.99 | 98.94 |
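For readers unfamiliar with the VBench columns: subject consistency is, to our understanding, derived from DINO feature similarity between frames. The sketch below illustrates the idea; the exact weighting and feature extractor in VBench may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def subject_consistency(frame_feats):
    """Simplified VBench-style subject consistency: mean cosine similarity
    of each frame's subject features to (a) the first frame and (b) the
    previous frame. `frame_feats` is a (T, D) tensor of per-frame features
    (e.g., from DINO); the 50/50 averaging here is illustrative."""
    feats = F.normalize(frame_feats, dim=-1)
    sim_first = (feats[1:] @ feats[0]).clamp(min=0)            # vs. first frame
    sim_prev = (feats[1:] * feats[:-1]).sum(-1).clamp(min=0)   # vs. previous frame
    return ((sim_first + sim_prev) / 2).mean().item()

print(subject_consistency(torch.randn(16, 768)))  # toy features for 16 frames
```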
UniWorld-V1 shows excellent performance across 20+ tasks.
- See LICENSE for details. The FLUX weights fall under the FLUX.1 [dev] Non-Commercial License.
@article{li2025uniworldv2,
title={UniWorld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback},
author={Li, Zongjian and Liu, Zheyuan and Zhang, Qihui and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Ye, Yang and Yu, Wangbo and Niu, Yuwei and Yuan, Li},
journal={arXiv preprint arXiv:2510.16888},
year={2025}
}
@article{lin2025uniworld,
title={UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation},
author={Lin, Bin and Li, Zongjian and Cheng, Xinhua and Niu, Yuwei and Ye, Yang and He, Xianyi and Yuan, Shenghai and Yu, Wangbo and Wang, Shaodong and Ge, Yunyang and others},
journal={arXiv preprint arXiv:2506.03147},
year={2025}
}
@article{ye2025imgedit,
title={ImgEdit: A Unified Image Editing Dataset and Benchmark},
author={Ye, Yang and He, Xianyi and Li, Zongjian and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Hou, Bohan and Yuan, Li},
journal={arXiv preprint arXiv:2505.20275},
year={2025}
}
@article{niu2025wise,
title={WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation},
author={Niu, Yuwei and Ning, Munan and Zheng, Mengren and Lin, Bin and Jin, Peng and Liao, Jiaqi and Ning, Kunpeng and Zhu, Bin and Yuan, Li},
journal={arXiv preprint arXiv:2503.07265},
year={2025}
}
@article{yan2025gpt,
title={GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT-4o in Image Generation},
author={Yan, Zhiyuan and Ye, Junyan and Li, Weijia and Huang, Zilong and Yuan, Shenghai and He, Xiangyang and Lin, Kaiqing and He, Jun and He, Conghui and Yuan, Li},
journal={arXiv preprint arXiv:2504.02782},
year={2025}
}
@article{lin2024open,
title={Open-Sora Plan: Open-Source Large Video Generation Model},
author={Lin, Bin and Ge, Yunyang and Cheng, Xinhua and Li, Zongjian and Zhu, Bin and Wang, Shaodong and He, Xianyi and Ye, Yang and Yuan, Shenghai and Chen, Liuhan and others},
journal={arXiv preprint arXiv:2412.00131},
year={2024}
}