PKU-YuanGroup/OpenS2V-Nexus

If you like our project, please give us a star ⭐ on GitHub to receive the latest updates.


This repository is the official implementation of OpenS2V-Nexus, consisting of (i) OpenS2V‑Eval, a fine‑grained benchmark, and (ii) OpenS2V‑5M, a million‑scale dataset. Our goal is to establish the infrastructure for Subject-to-Video generation, thereby empowering the community.

💡 We also have other video generation projects that may interest you ✨.

Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin, Yunyang Ge, Xinhua Cheng, et al.

ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Shenghai Yuan, Jinfa Huang, Xianyi He, et al.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Shenghai Yuan, Jinfa Huang, Yujun Shi, et al.

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan, Jinfa Huang, Yongqi Xu, et al.

📣 News

  • ⏳⏳⏳ Evaluating more models and updating the hf_space. PRs are welcome!
  • [2025.06.21] 🏃‍♂️ We add the evaluation results for MAGREF-480P; click here and here for details.
  • [2025.06.19] 🔥 The preprocessed Cross-Frame Pairs are now available on Hugging Face, eliminating the need for online processing with this code during training. We also provide a demo dataloader here demonstrating how to use OpenS2V-5M during the training phase.
  • [2025.05.31] 🏃‍♂️ We add the evaluation results for Concat-ID-Wan-AdaLN; click here and here for details.
  • [2025.05.28] 🏃‍♂️ We add the evaluation results for Phantom-14B; click here and here for details.
  • [2025.05.27] 🔥 Our arXiv paper on OpenS2V-Nexus is now available; click here for details.
  • [2025.05.26] 🔥 All code & datasets are out! We also release the testing prompts, reference images, and videos generated by different models in OpenS2V-Eval; click here to see more details.

✨ Highlights

  1. New S2V Benchmark.
    • We introduce OpenS2V-Eval for comprehensive evaluation of S2V models and propose three new automatic metrics aligned with human perception.
  2. New Insights for S2V Model Selection.
    • Our evaluations using OpenS2V-Eval provide crucial insights into the strengths and weaknesses of various subject-to-video generation models.
  3. Million-Scale S2V Dataset.
    • We create OpenS2V-5M, a dataset of 5.1M high-quality regular samples and 0.35M Nexus Data; the latter is designed to address the three core challenges of subject-to-video generation.

Resources

  • OpenS2V-Eval: including 180 open-domain subject-text pairs, of which 80 are real and 100 are synthetic samples.
  • OpenS2V-5M: including 5M open-domain subject-text-video triples, which not only include Regular Data but also incorporate Nexus Data constructed using GPT-Image-1 and cross-video associations.
  • ConsisID-Bench: including 150 human-domain subject images and 90 text prompts.
  • ConsisID-Preview-Data: including 32K human-domain high-quality subject-text-video triples.

😍 In-house Model Gallery

This model (Ours) was trained on a subset of OpenS2V-5M, using about 0.3M high-quality samples.

singlehuman_3.mp4
singlehuman_6.mp4
singlehuman_16.mp4
singlehuman.mp4
singleface_11.mp4
singleface_2.mp4
singleface_7.mp4
singleface_8.mp4
singleface_17.mp4

βš™οΈ Requirements and Installation

We recommend the following setup.

Base Environment

# 0. Clone the repo
git clone --depth=1 https://github.com/PKU-YuanGroup/OpenS2V-Nexus.git
cd OpenS2V-Nexus

# 1. Create conda environment
conda create -n opens2v python=3.12.0
conda activate opens2v

# 2. Install PyTorch and related dependencies (pick the build matching your CUDA version)
# CUDA 11.8
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu118/torch2.6
# CUDA 12.4
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu124/torch2.6

# 3. Install main dependencies
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
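After installing, a quick sanity check can confirm that the pinned packages resolved correctly. This helper is illustrative and not part of the repo; the names and versions below are taken from the install commands above.

```python
# Check installed package versions against the pins used above.
from importlib import metadata

PINS = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def check_pins(pins):
    """Return {name: (wanted, found)} for packages whose installed
    version does not match the pin; missing packages map found to None."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    bad = check_pins(PINS)
    print("environment OK" if not bad else f"mismatched packages: {bad}")
```

An empty result means the environment matches the recommended pins.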

Base Checkpoints

cd OpenS2V-Nexus

huggingface-cli download --repo-type model \
BestWishYsh/OpenS2V-Weight \
--local-dir ckpts

Once ready, the weights will be organized in this format:

📦 OpenS2V-Nexus/
├── 📂 LaMa
├── 📂 face_extractor
├── 📄 aesthetic-model.pth
├── 📄 glint360k_curricular_face_r101_backbone.bin
├── 📄 groundingdino_swint_ogc.pth
├── 📄 sam2.1_hiera_large.pt
└── 📄 yolo_world_v2_l_image_prompt_adapter-719a7afb.pth
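A small script can verify that the download produced this layout. This is an illustrative helper, not part of the repo; the file list mirrors the tree above, and the `ckpts` default matches the `--local-dir` used in the download command.

```python
# Verify that the checkpoint download produced the expected layout.
from pathlib import Path

EXPECTED = [
    "LaMa",
    "face_extractor",
    "aesthetic-model.pth",
    "glint360k_curricular_face_r101_backbone.bin",
    "groundingdino_swint_ogc.pth",
    "sam2.1_hiera_large.pt",
    "yolo_world_v2_l_image_prompt_adapter-719a7afb.pth",
]


def missing_checkpoints(root="ckpts"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints present" if not missing else f"missing: {missing}")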

πŸ—οΈ Benchmark

OpenS2V-Eval Results

We visualize the evaluation results of various Subject-to-Video generation models across the Open-Domain, Human-Domain, and Single-Object settings.

Get Videos Generated by Different S2V models

Dataset Download

To facilitate future research and ensure full transparency, we release all the videos we sampled and used for OpenS2V-Eval evaluation. You can download them on Hugging Face. We also provide detailed explanations of the sampled videos and the detailed settings for the models under evaluation here.

Leaderboard

See the numeric values on our Leaderboard 🥇🥈🥉,

or run it locally:

cd leaderboard
python app.py

Evaluate Your Own Models

Please refer to this guide for how to evaluate customized models.

🤗 Dataset

Subject-Text-Video Triples in OpenS2V-5M

1_output_h264.mp4
2_output_h264.mp4
3_output_h264.mp4
5_output_h264.mp4
6_output_h264.mp4
7_output_h264.mp4

Get the Data

We release a subset of OpenS2V-5M. The dataset is available on Hugging Face, or you can download it with the following command. Some samples can be found on our Project Page.

huggingface-cli download --repo-type dataset \
BestWishYsh/OpenS2V-5M \
--local-dir BestWishYsh/OpenS2V-5M

Usage of OpenS2V-5M

Please refer to this guide for how to use the OpenS2V-5M dataset.
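The guide above describes the dataset's actual layout. As a minimal sketch of how subject-text-video triples might be consumed during training, assuming a hypothetical jsonl metadata file — the keys `subject_image`, `caption`, and `video` are illustrative, not the dataset's real field names:

```python
# Iterate subject-text-video triples from a jsonl metadata file.
# The schema here is hypothetical; consult the dataset guide for
# the real file layout of OpenS2V-5M.
import json


def iter_triples(metadata_path):
    """Yield (subject_image, caption, video) tuples from a jsonl file,
    skipping blank lines."""
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            yield record["subject_image"], record["caption"], record["video"]
```

A real training loop would pass the yielded paths to an image/video loader and tokenizer rather than using them directly.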

Process Your Own Videos

Please refer to this guide for how to process customized videos.

πŸ‘ Acknowledgement

🔒 License

  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview. Please contact us if you find any potential violations. ([email protected])

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.

@article{yuan2025opens2v,
  title={OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation},
  author={Yuan, Shenghai and He, Xianyi and Deng, Yufan and Ye, Yang and Huang, Jinfa and Lin, Bin and Luo, Jiebo and Yuan, Li},
  journal={arXiv preprint arXiv:2505.20292},
  year={2025}
}

🤝 Contributors