💡 We also have other video generation projects that may interest you ✨.
- Open-Sora Plan: Open-Source Large Video Generation Model (Bin Lin, Yunyang Ge, Xinhua Cheng, et al.)
- ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition (Shenghai Yuan, Jinfa Huang, Xianyi He, et al.)
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators (Shenghai Yuan, Jinfa Huang, Yujun Shi, et al.)
- ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (Shenghai Yuan, Jinfa Huang, Yongqi Xu, et al.)
- ⏳⏳⏳ Evaluating more models and updating the leaderboard. PRs are welcome!
- [2025.06.21] We add the evaluation results for MAGREF-480P; click here and here for details.
- [2025.06.19] 🔥 The preprocessed Cross-Frame Pairs are now available on Hugging Face, eliminating the need for online processing with this code during training. We also provide a demo dataloader here demonstrating how to use OpenS2V-5M during the training phase.
- [2025.05.31] We add the evaluation results for Concat-ID-Wan-AdaLN; click here and here for details.
- [2025.05.28] We add the evaluation results for Phantom-14B; click here and here for details.
- [2025.05.27] 🔥 Our arXiv paper on OpenS2V-Nexus is now available; click here for details.
- [2025.05.26] 🔥 All codes & datasets are out! We also release the testing prompts, reference images, and videos generated by different models in OpenS2V-Eval, and you can click here to see more details.
- New S2V Benchmark. We introduce OpenS2V-Eval for comprehensive evaluation of S2V models and propose three new automatic metrics aligned with human perception.
- New Insights for S2V Model Selection. Our evaluations using OpenS2V-Eval provide crucial insights into the strengths and weaknesses of various subject-to-video generation models.
- Million-Scale S2V Dataset. We create OpenS2V-5M, a dataset with 5.1M high-quality regular data and 0.35M Nexus Data; the latter is expected to address the three core challenges of subject-to-video generation.
- OpenS2V-Eval: including 180 open-domain subject-text pairs, of which 80 are real and 100 are synthetic samples.
- OpenS2V-5M: including 5M open-domain subject-text-video triples, which not only include Regular Data but also incorporate Nexus Data constructed using GPT-Image-1 and cross-video associations.
- ConsisID-Bench: including 150 human-domain subject images and 90 text prompts.
- ConsisID-Preview-Data: including 32K human-domain high-quality subject-text-video triples.
This model (Ours‡) was trained on a subset of OpenS2V-5M, using about 0.3M high-quality samples.
(Showcase videos: singlehuman_3.mp4, singlehuman_6.mp4, singlehuman_16.mp4, singlehuman.mp4, singleface_11.mp4, singleface_2.mp4, singleface_7.mp4, singleface_8.mp4, singleface_17.mp4)
We recommend setting up the environment as follows.
# 0. Clone the repo
git clone --depth=1 https://github.com/PKU-YuanGroup/OpenS2V-Nexus.git
cd OpenS2V-Nexus
# 1. Create conda environment
conda create -n opens2v python=3.12.0
conda activate opens2v
# 2. Install PyTorch and other dependencies
# CUDA 11.8
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu118/torch2.6
# CUDA 12.4
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu124/torch2.6
# 3. Install main dependencies
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
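After installation, a quick sanity check (our own, not part of the official setup) can confirm that the PyTorch build sees the GPU and that flash-attn imports correctly:

```python
# Quick environment sanity check (not part of the official setup):
# verifies the PyTorch build, CUDA visibility, and the flash-attn install.
import torch

print("torch:", torch.__version__)              # expected 2.6.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn failed to import; recheck the flash-attn install step")
```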
cd OpenS2V-Nexus
huggingface-cli download --repo-type model \
BestWishYsh/OpenS2V-Weight \
--local-dir ckpts
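If you prefer to download the weights from Python instead of the CLI, an equivalent call with `huggingface_hub` (same repo and target directory as above) is:

```python
# Python alternative to the huggingface-cli command above;
# downloads BestWishYsh/OpenS2V-Weight into ./ckpts.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BestWishYsh/OpenS2V-Weight",
    repo_type="model",
    local_dir="ckpts",
)
```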
Once ready, the weights will be organized in this format:
📦 OpenS2V-Nexus/
├── 📂 LaMa
├── 📂 face_extractor
├── 📄 aesthetic-model.pth
├── 📄 glint360k_curricular_face_r101_backbone.bin
├── 📄 groundingdino_swint_ogc.pth
├── 📄 sam2.1_hiera_large.pt
└── 📄 yolo_world_v2_l_image_prompt_adapter-719a7afb.pth
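Before running preprocessing or evaluation, a small helper (ours, assuming the `ckpts` target directory used in the download command above) can verify that every expected checkpoint is in place:

```python
# Optional helper: check that all expected weights exist under ./ckpts.
from pathlib import Path

CKPT_DIR = Path("ckpts")
EXPECTED = [
    "LaMa",
    "face_extractor",
    "aesthetic-model.pth",
    "glint360k_curricular_face_r101_backbone.bin",
    "groundingdino_swint_ogc.pth",
    "sam2.1_hiera_large.pt",
    "yolo_world_v2_l_image_prompt_adapter-719a7afb.pth",
]

missing = [name for name in EXPECTED if not (CKPT_DIR / name).exists()]
print("Missing:", ", ".join(missing) if missing else "none; all weights are in place")
```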
We visualize the evaluation results of various subject-to-video generation models across the Open-Domain, Human-Domain, and Single-Object settings.
To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for OpenS2V-Eval evaluation. You can download them on Hugging Face. We also provide detailed explanations of the sampled videos and detailed settings for the models under evaluation here.
See the numeric values at our Leaderboard 🔥🔥🔥
or you can run it locally:
cd leaderboard
python app.py
Please refer to this guide for how to evaluate customized models.
(Showcase videos: 1_output_h264.mp4, 2_output_h264.mp4, 3_output_h264.mp4, 5_output_h264.mp4, 6_output_h264.mp4, 7_output_h264.mp4)
We release a subset of OpenS2V-5M. The dataset is available on Hugging Face, or you can download it with the following command. Some samples can be found on our Project Page.
huggingface-cli download --repo-type dataset \
BestWishYsh/OpenS2V-5M \
--local-dir BestWishYsh/OpenS2V-5M
Please refer to this guide for how to use the OpenS2V-5M dataset.
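For orientation only, a minimal dataloader sketch is shown below. The metadata file name and field names (`metadata.json`, `image`, `caption`, `video`) are placeholders rather than the official OpenS2V-5M schema, so consult the guide above for the actual layout.

```python
# Hypothetical sketch of a subject-text-video dataset wrapper.
# File and key names are placeholders, not the official OpenS2V-5M schema.
import json
from pathlib import Path

from torch.utils.data import Dataset


class S2VTripleDataset(Dataset):
    def __init__(self, root: str, metadata_file: str = "metadata.json"):
        self.root = Path(root)
        with open(self.root / metadata_file, encoding="utf-8") as f:
            self.records = json.load(f)  # assumed: a list of dicts

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        return {
            "subject_image": str(self.root / rec["image"]),  # reference image
            "prompt": rec["caption"],                        # text prompt
            "video": str(self.root / rec["video"]),          # target video
        }
```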
Please refer to this guide for how to process customized videos.
- This project wouldn't be possible without the following open-source repositories: Open-Sora Plan, Video-Dataset-Scripts, YOLO-World, Grounded-SAM-2, improved-aesthetic-predictor, Qwen2.5-VL, vllm, VBench, ChronoMagic-Bench, Phantom, VACE, SkyReels-A2, HunyuanCustom, ConsisID, Concat-ID, Fantasy-ID, EchoVideo.
- The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
- The service is a research preview. Please contact us if you find any potential violations. ([email protected])
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.
@article{yuan2025opens2v,
title={OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation},
author={Yuan, Shenghai and He, Xianyi and Deng, Yufan and Ye, Yang and Huang, Jinfa and Lin, Bin and Luo, Jiebo and Yuan, Li},
journal={arXiv preprint arXiv:2505.20292},
year={2025}
}