DiffVSR

Revealing an Effective Recipe for Taming Robust
Video Super-Resolution Against Complex Degradations

ICCV 2025

DiffVSR unlocks the power of diffusion models to tackle severe degradation by shifting the focus from complex architectures to a more effective learning strategy.

[Figure: teaser]

📖 For more visual results, visit the project page.


🔥 Update

  • [2025.10] Inference code is released.
  • [2025.01] This repo was created.

🎬 Overview

[Figure: overall structure]

🔧 Dependencies and Installation

Clone and set up the environment:

git clone https://github.com/xh9998/DiffVSR.git
cd DiffVSR

We provide a conda environment file. Create and activate it:

conda env create -f DiffVSR_env.yml
conda activate DiffVSR

Key packages (see DiffVSR_env.yml for the full list): PyTorch 2.0.0 (CUDA 11.7), diffusers 0.30.0, torchvision 0.15.0, einops, opencv, pandas, rotary-embedding-torch, xformers (optional), imageio.

📂 Pretrained Models

Please download the following three items from Hugging Face and place them under ./pretrained_models/:

  • TE-3DVAE.pt
  • DiffVSR_UNet.pt
  • upscaler4x/ folder

Download: DiffVSR on Hugging Face

Directory example:

DiffVSR/
 └─ pretrained_models/
     ├─ TE-3DVAE.pt
     ├─ DiffVSR_UNet.pt
     └─ upscaler4x/
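
Before running inference, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_pretrained` is hypothetical, and it only checks that the three items listed above exist under the given directory.

```python
from pathlib import Path

# Item names come from the list above; the helper itself is a hypothetical sketch.
REQUIRED_FILES = ("TE-3DVAE.pt", "DiffVSR_UNet.pt")
REQUIRED_DIRS = ("upscaler4x",)


def missing_pretrained(root="./pretrained_models"):
    """Return the expected items that are absent under root."""
    root = Path(root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        print("Missing under ./pretrained_models/:", ", ".join(missing))
    else:
        print("All pretrained items found.")
```

Run it from the DiffVSR/ root so that the relative path resolves as in the directory example.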

☕️ Quick Inference

--input_path can be a single video, a frames folder, or a folder of videos.

python inference_tile.py \
  -i ./test_video/video1 \
  -o ./output \
  -txt /path/to/captions.csv

Arguments (main):

  • -i/--input_path: video file, frames folder, or folder of videos
  • -o/--output_path: directory for output mp4
  • -txt/--val_prompt: CSV containing video_name and caption
  • -p/--pretrained_model: UNet checkpoint path
  • -n/--noise_level: noise level (default 50)
  • -g/--guidance_scale: guidance scale (default 5)
  • -s/--inference_steps: denoising steps (default 50)
  • -oimg/--outputimage_path: dump generated PNG frames when provided
  • --use_ffmpeg: use ffmpeg for video encoding (typically smaller files than imageio.mimwrite, with slightly lower visual sharpness)
  • --tile_size / --tile_overlap: adjust tile size and overlap when VRAM is limited. Smaller tiles and overlaps lower peak memory at the cost of more tiling passes.
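
To get a feel for how tile size affects the number of tiling passes, the sketch below counts tiles for a generic sliding-window layout. It is an illustration under stated assumptions, not the tiling logic of inference_tile.py: the functions `tile_starts` and `count_tiles` are hypothetical, the stride is assumed to be `tile_size - overlap`, and the last tile is assumed to sit flush with the frame edge. Whether the script measures tiles in pixels or in latent units is not stated here, so the numbers are only indicative.

```python
def tile_starts(length, tile_size, overlap):
    """Start offsets of tiles covering [0, length) with the given overlap."""
    if tile_size <= overlap:
        raise ValueError("tile_size must be larger than overlap")
    if length <= tile_size:
        return [0]  # a single tile already covers the whole axis
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # final tile ends exactly at the edge
    return starts


def count_tiles(height, width, tile_size, overlap):
    """Number of tiles needed to cover a height x width frame."""
    rows = len(tile_starts(height, tile_size, overlap))
    cols = len(tile_starts(width, tile_size, overlap))
    return rows * cols


if __name__ == "__main__":
    # Hypothetical 720x1280 frame, two hypothetical tile settings.
    print(count_tiles(720, 1280, tile_size=256, overlap=64))  # 28
    print(count_tiles(720, 1280, tile_size=128, overlap=32))  # 104
```

In this model, halving the tile size roughly quadruples the number of tiles per frame, which is the runtime cost paid for the lower peak memory of each pass.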

🧩 CSV Prompt Format

--val_prompt CSV should include one row per video with columns:

  • video_name: base name matching input video
  • caption: positive prompt text

The script combines each caption with its own internal positive prompt string, so the CSV only needs the per-video caption.
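
A CSV in this format can be produced with Python's standard `csv` module. The sketch below is one way to do it and is not part of the repository: the helper `write_captions` is hypothetical, the caption text is a placeholder, and it assumes that "base name" means the file or folder name without its extension.

```python
import csv
from pathlib import Path


def write_captions(csv_path, captions):
    """Write one row per video with the video_name and caption columns.

    captions maps an input path (video file or frames folder) to its caption.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["video_name", "caption"])
        writer.writeheader()
        for video_path, caption in captions.items():
            # Assumption: the base name is the path's final component without extension.
            writer.writerow({"video_name": Path(video_path).stem, "caption": caption})


if __name__ == "__main__":
    # Placeholder caption for the example input used in Quick Inference.
    write_captions("captions.csv", {"./test_video/video1": "a city street at night"})
```

The resulting file can then be passed to the script with -txt captions.csv.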

❤️ Acknowledgement

Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of LaVie/vsr. We also drew inspiration for our inference strategy from Upscale-A-Video. We are grateful for their contributions to the community.

📑 Citation

If you find this repo useful, please consider citing:

@article{li2025diffvsr,
  title={DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations},
  author={Li, Xiaohui and Liu, Yihao and Cao, Shuo and Chen, Ziyan and Zhuang, Shaobin and Chen, Xiangyu and He, Yinan and Wang, Yi and Qiao, Yu},
  journal={arXiv preprint arXiv:2501.10110},
  year={2025}
}
