💡 We also have other video generation projects that may interest you ✨.
- Open-Sora Plan: Open-Source Large Video Generation Model (Bin Lin, Yunyang Ge, Xinhua Cheng, et al.)
- ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition (Shenghai Yuan, Jinfa Huang, Xianyi He, et al.)
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators (Shenghai Yuan, Jinfa Huang, Yujun Shi, et al.)
- ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (Shenghai Yuan, Jinfa Huang, Yongqi Xu, et al.)
- ⏳⏳⏳ Evaluating more models and updating the leaderboard. PRs are welcome!
- [2025.06.21] We add the evaluation results for MAGREF-480P; click here and here for details.
- [2025.06.19] 🔥 The preprocessed Cross-Frame Pairs are now available on Hugging Face, eliminating the need for online processing with this code during training. We also provide a demo dataloader here demonstrating how to use OpenS2V-5M during the training phase.
- [2025.05.31] We add the evaluation results for Concat-ID-Wan-AdaLN; click here and here for details.
- [2025.05.28] We add the evaluation results for Phantom-14B; click here and here for details.
- [2025.05.27] 🔥 Our arXiv paper on OpenS2V-Nexus is now available; click here for details.
- [2025.05.26] 🔥 All codes & datasets are out! We also release the testing prompts, reference images, and videos generated by different models in OpenS2V-Eval, and you can click here to see more details.
- New S2V Benchmark. We introduce OpenS2V-Eval for comprehensive evaluation of S2V models and propose three new automatic metrics aligned with human perception.
- New Insights for S2V Model Selection. Our evaluations using OpenS2V-Eval provide crucial insights into the strengths and weaknesses of various subject-to-video generation models.
- Million-Scale S2V Dataset. We create OpenS2V-5M, a dataset with 5.1M high-quality regular data and 0.35M Nexus Data; the latter is expected to address the three core challenges of subject-to-video generation.
- OpenS2V-Eval: including 180 open-domain subject-text pairs, of which 80 are real and 100 are synthetic samples.
- OpenS2V-5M: including 5M open-domain subject-text-video triples, which not only include Regular Data but also incorporate Nexus Data constructed using GPT-Image-1 and cross-video associations.
- ConsisID-Bench: including 150 human-domain subject images and 90 text prompts.
- ConsisID-Preview-Data: including 32K human-domain high-quality subject-text-video triples.
This model (Ours‡) was trained on a subset of OpenS2V-5M, using about 0.3M high-quality samples.
(Showcase videos: singlehuman_3.mp4, singlehuman_6.mp4, singlehuman_16.mp4, singlehuman.mp4, singleface_11.mp4, singleface_2.mp4, singleface_7.mp4, singleface_8.mp4, singleface_17.mp4)
We recommend setting up the environment as follows.
# 0. Clone the repo
git clone --depth=1 https://github.com/PKU-YuanGroup/OpenS2V-Nexus.git
cd OpenS2V-Nexus
# 1. Create conda environment
conda create -n opens2v python=3.12.0
conda activate opens2v
# 2. Install PyTorch and other dependencies
# CUDA 11.8
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu118/torch2.6
# CUDA 12.4
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu124/torch2.6
# 3. Install main dependencies
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
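After installation, a quick sanity check (our own, not part of the official setup) can confirm that the PyTorch build sees the GPU and that flash-attn imports correctly:

```python
# Quick environment sanity check (not part of the official setup):
# verifies the PyTorch build, CUDA visibility, and the flash-attn install.
import torch

print("torch:", torch.__version__)              # expected 2.6.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn failed to import; recheck the flash-attn install step")
```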
cd OpenS2V-Nexus
huggingface-cli download --repo-type model \
BestWishYsh/OpenS2V-Weight \
--local-dir ckpts
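If you prefer to download the weights from Python instead of the CLI, an equivalent call with `huggingface_hub` (same repo and target directory as above) is:

```python
# Python alternative to the huggingface-cli command above;
# downloads BestWishYsh/OpenS2V-Weight into ./ckpts.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BestWishYsh/OpenS2V-Weight",
    repo_type="model",
    local_dir="ckpts",
)
```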
Once ready, the weights will be organized in this format:
📦 OpenS2V-Nexus/
├── 📂 LaMa
├── 📂 face_extractor
├── 📄 aesthetic-model.pth
├── 📄 glint360k_curricular_face_r101_backbone.bin
├── 📄 groundingdino_swint_ogc.pth
├── 📄 sam2.1_hiera_large.pt
└── 📄 yolo_world_v2_l_image_prompt_adapter-719a7afb.pth
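Before running preprocessing or evaluation, a small helper (ours, assuming the `ckpts` target directory used in the download command above) can verify that every expected checkpoint is in place:

```python
# Optional helper: check that all expected weights exist under ./ckpts.
from pathlib import Path

CKPT_DIR = Path("ckpts")
EXPECTED = [
    "LaMa",
    "face_extractor",
    "aesthetic-model.pth",
    "glint360k_curricular_face_r101_backbone.bin",
    "groundingdino_swint_ogc.pth",
    "sam2.1_hiera_large.pt",
    "yolo_world_v2_l_image_prompt_adapter-719a7afb.pth",
]

missing = [name for name in EXPECTED if not (CKPT_DIR / name).exists()]
print("Missing:", ", ".join(missing) if missing else "none; all weights are in place")
```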
We visualize the evaluation results of various subject-to-video generation models across the Open-Domain, Human-Domain, and Single-Object settings.
To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for OpenS2V-Eval evaluation. You can download them on Hugging Face. We also provide detailed explanations of the sampled videos and detailed settings for the models under evaluation here.
See the numeric values at our Leaderboard 🔥🔥🔥
or you can run it locally:
cd leaderboard
python app.py
Please refer to this guide for how to evaluate customized models.
(Showcase videos: 1_output_h264.mp4, 2_output_h264.mp4, 3_output_h264.mp4, 5_output_h264.mp4, 6_output_h264.mp4, 7_output_h264.mp4)
We release a subset of OpenS2V-5M. The dataset is available on Hugging Face, or you can download it with the following command. Some samples can be found on our Project Page.
huggingface-cli download --repo-type dataset \
BestWishYsh/OpenS2V-5M \
--local-dir BestWishYsh/OpenS2V-5M
Please refer to this guide for how to use the OpenS2V-5M dataset.
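For orientation only, a minimal dataloader sketch is shown below. The metadata file name and field names (`metadata.json`, `image`, `caption`, `video`) are placeholders rather than the official OpenS2V-5M schema, so consult the guide above for the actual layout.

```python
# Hypothetical sketch of a subject-text-video dataset wrapper.
# File and key names are placeholders, not the official OpenS2V-5M schema.
import json
from pathlib import Path

from torch.utils.data import Dataset


class S2VTripleDataset(Dataset):
    def __init__(self, root: str, metadata_file: str = "metadata.json"):
        self.root = Path(root)
        with open(self.root / metadata_file, encoding="utf-8") as f:
            self.records = json.load(f)  # assumed: a list of dicts

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        return {
            "subject_image": str(self.root / rec["image"]),  # reference image
            "prompt": rec["caption"],                        # text prompt
            "video": str(self.root / rec["video"]),          # target video
        }
```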
Please refer to this guide for how to process customized videos.
- This project wouldn't be possible without the following open-source repositories: Open-Sora Plan, Video-Dataset-Scripts, YOLO-World, Grounded-SAM-2, improved-aesthetic-predictor, Qwen2.5-VL, vllm, VBench, ChronoMagic-Bench, Phantom, VACE, SkyReels-A2, HunyuanCustom, ConsisID, Concat-ID, Fantasy-ID, EchoVideo.
- The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
- The service is a research preview. Please contact us if you find any potential violations. ([email protected])
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.
@article{yuan2025opens2v,
title={OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation},
author={Yuan, Shenghai and He, Xianyi and Deng, Yufan and Ye, Yang and Huang, Jinfa and Lin, Bin and Luo, Jiebo and Yuan, Li},
journal={arXiv preprint arXiv:2505.20292},
year={2025}
}