UniWorld-Family

📣 News

[2025/11/25]: 🤗 We release UniWorld-OSP2.0, a VLM-enhanced unified framework for image-to-video generation. The architecture scales FlashI2V to 14B parameters and introduces a novel conditioning mechanism, based on a 7B VLM, that losslessly inherits the VLM's semantic understanding. UniWorld-OSP2.0 surpasses the video generation model Wan2.1 on six key evaluation metrics of VBench-I2V.
[2025/10/19]: We release UniWorld-V2, which uses DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing. UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.
[2025/06/03]: 🤗 We release UniWorld-V1, a unified framework for understanding, generation, and editing. All data, models, training code, and evaluation code are open-sourced. Check our report for more details, and watch 👀 this repository for the latest updates.

💡 Hub

😍 Gallery

UniWorld-OSP2.0

Model	I2V Paradigm	Subject Consistency ↑	Background Consistency ↑	Motion Smoothness ↑	Dynamic Degree ↑	Aesthetic Quality ↑	Imaging Quality ↑	I2V Subject Consistency ↑	I2V Background Consistency ↑
SVD-XT-1.0 (1.5B)	Repeating Concat and Adding Noise	95.52	96.61	98.09	52.36	60.15	69.80	97.52	97.63
SVD-XT-1.1 (1.5B)	Repeating Concat and Adding Noise	95.42	96.77	98.12	43.17	60.23	70.23	97.51	97.62
SEINE-512x512 (1.8B)	Inpainting	95.28	97.12	97.12	27.07	64.55	71.39	97.15	96.94
CogVideoX-5B-I2V	Zero-padding Concat and Adding Noise	94.34	96.42	98.40	33.17	61.87	70.01	97.19	96.74
Wan2.1-I2V-14B-720P	Inpainting	94.86	97.07	97.90	51.38	64.75	70.44	96.95	96.44
CogVideoX1.5-5B-I2V	Zero-padding Concat and Adding Noise	95.04	96.52	98.47	37.48	62.68	70.99	97.78	98.73
Wan2.1-I2V-14B-480P	Inpainting	95.68	97.44	98.46	45.20	61.44	70.37	97.83	99.08
UniWorld-OSP2.0	FlashI2V	96.21	97.71	98.47	46.10	66.55	70.57	97.99	98.94
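
The table does not say which six metrics the news item refers to, or which Wan2.1 variant is the baseline. One possible reading, sketched below, counts the metrics on which the OSP2.0 row is strictly higher than both Wan2.1 rows. The scores are copied from the table above; the helper name `metrics_won` is hypothetical and is not part of this repository.

```python
# VBench-I2V scores copied from the table above, in column order.
METRICS = [
    "Subject Consistency",
    "Background Consistency",
    "Motion Smoothness",
    "Dynamic Degree",
    "Aesthetic Quality",
    "Imaging Quality",
    "I2V Subject Consistency",
    "I2V Background Consistency",
]

SCORES = {
    "Wan2.1-I2V-14B-720P": [94.86, 97.07, 97.90, 51.38, 64.75, 70.44, 96.95, 96.44],
    "Wan2.1-I2V-14B-480P": [95.68, 97.44, 98.46, 45.20, 61.44, 70.37, 97.83, 99.08],
    "UniWorld-OSP2.0": [96.21, 97.71, 98.47, 46.10, 66.55, 70.57, 97.99, 98.94],
}


def metrics_won(model, baselines):
    """Return the metrics on which `model` is strictly higher than every baseline."""
    won = []
    for i, metric in enumerate(METRICS):
        if all(SCORES[model][i] > SCORES[b][i] for b in baselines):
            won.append(metric)
    return won


if __name__ == "__main__":
    wan = ["Wan2.1-I2V-14B-720P", "Wan2.1-I2V-14B-480P"]
    for metric in metrics_won("UniWorld-OSP2.0", wan):
        print(metric)
```

Under this reading the count is six: every column except Dynamic Degree (where Wan2.1-I2V-14B-720P is higher) and I2V Background Consistency (where Wan2.1-I2V-14B-480P is higher).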

UniWorld-V2

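Original	Prompt	Nano-banana	GPT-4o	Qwen-Image-Edit	UniWorld-V2 (Ours)
	Case 1: `Move the bird into the red box, delete the current bird, and finally remove the red box`				(✅ instruction executed correctly)
	Case 2: `Change the hand gesture of the girl in the middle, who wears white clothes and a mask, to OK`				(✅ OK gesture)
	Case 3: `Extract the guitar from the image`				(✅ tuning pegs: two on top, three on the bottom)
	Case 4: `Change all the text below to a calligraphy font. Change the “月满中秋” in the middle to “千里团圆”. Also change the moon into a blurry mooncake.`				(✅ blurry mooncake, ✅ calligraphy font)
	Case 5: `Have the character in the image sit in an upscale Western restaurant, holding a knife and fork and eating steak`				(✅ character features, ✅ knife and fork)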

UniWorld-V1

UniWorld-V1 shows excellent performance in 20+ tasks.

🔒 License

See LICENSE for details. The FLUX weights fall under the FLUX.1 [dev] Non-Commercial License.

✏️ Citing

@article{li2025uniworldv2,
  title={Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback},
  author={Li, Zongjian and Liu, Zheyuan and Zhang, Qihui and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Ye, Yang and Yu, Wangbo and Niu, Yuwei and Yuan, Li},
  journal={arXiv preprint arXiv:2510.16888},
  year={2025}
}
@article{lin2025uniworld,
  title={UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation},
  author={Lin, Bin and Li, Zongjian and Cheng, Xinhua and Niu, Yuwei and Ye, Yang and He, Xianyi and Yuan, Shenghai and Yu, Wangbo and Wang, Shaodong and Ge, Yunyang and others},
  journal={arXiv preprint arXiv:2506.03147},
  year={2025}
}
@article{ye2025imgedit,
  title={ImgEdit: A Unified Image Editing Dataset and Benchmark},
  author={Ye, Yang and He, Xianyi and Li, Zongjian and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Hou, Bohan and Yuan, Li},
  journal={arXiv preprint arXiv:2505.20275},
  year={2025}
}
@article{niu2025wise,
  title={Wise: A world knowledge-informed semantic evaluation for text-to-image generation},
  author={Niu, Yuwei and Ning, Munan and Zheng, Mengren and Lin, Bin and Jin, Peng and Liao, Jiaqi and Ning, Kunpeng and Zhu, Bin and Yuan, Li},
  journal={arXiv preprint arXiv:2503.07265},
  year={2025}
}
@article{yan2025gpt,
  title={Gpt-imgeval: A comprehensive benchmark for diagnosing gpt4o in image generation},
  author={Yan, Zhiyuan and Ye, Junyan and Li, Weijia and Huang, Zilong and Yuan, Shenghai and He, Xiangyang and Lin, Kaiqing and He, Jun and He, Conghui and Yuan, Li},
  journal={arXiv preprint arXiv:2504.02782},
  year={2025}
}
@article{lin2024open,
  title={Open-Sora Plan: Open-Source Large Video Generation Model},
  author={Lin, Bin and Ge, Yunyang and Cheng, Xinhua and Li, Zongjian and Zhu, Bin and Wang, Shaodong and He, Xianyi and Ye, Yang and Yuan, Shenghai and Chen, Liuhan and others},
  journal={arXiv preprint arXiv:2412.00131},
  year={2024}
}
