DiffVSR unlocks the power of diffusion models to tackle severe degradation by shifting the focus from complex architectures to a more effective learning strategy.
📖 For more visual results, visit the project page.
- [2025.10] Inference code is released.
- [2025.01] This repo is created.
Clone and set up the environment:
```bash
git clone https://github.com/xh9998/DiffVSR.git
cd DiffVSR
```

We provide a conda environment file. Create and activate it:

```bash
conda env create -f DiffVSR_env.yml
conda activate DiffVSR
```

Key packages (see `DiffVSR_env.yml` for the full list): PyTorch 2.0.0 (CUDA 11.7), diffusers 0.30.0, torchvision 0.15.0, einops, opencv, pandas, rotary-embedding-torch, xformers (optional), imageio.
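To sanity-check the install, verify that PyTorch sees your GPU (assuming the `DiffVSR` env is active):

```bash
# Should print the torch version (2.0.0, possibly with a +cu117 suffix) and True
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```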
Please download the following three items from Hugging Face and place them under ./pretrained_models/:
- `TE-3DVAE.pt`
- `DiffVSR_UNet.pt`
- `upscaler4x/` folder
Download: DiffVSR on Hugging Face
Directory example:
```
DiffVSR/
└─ pretrained_models/
   ├─ TE-3DVAE.pt
   ├─ DiffVSR_UNet.pt
   └─ upscaler4x/
```
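One way to fetch everything at once is the `huggingface-cli download` command from `huggingface_hub`; the repository ID below is a placeholder, so substitute the DiffVSR repo linked above:

```bash
# Placeholder repo ID -- replace with the actual DiffVSR repository on Hugging Face
huggingface-cli download xh9998/DiffVSR --local-dir ./pretrained_models
```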
`--input_path` can be a single video, a frames folder, or a folder of videos.

```bash
python inference_tile.py \
  -i ./test_video/video1 \
  -o ./output \
  -txt /path/to/captions.csv
```

Arguments (main):

- `-i/--input_path`: video file, frames folder, or folder of videos
- `-o/--output_path`: directory for output mp4
- `-txt/--val_prompt`: CSV containing `video_name` and `caption`
- `-p/--pretrained_model`: UNet checkpoint path
- `-n/--noise_level`: noise level (default 50)
- `-g/--guidance_scale`: guidance scale (default 5)
- `-s/--inference_steps`: denoising steps (default 50)
- `-oimg/--outputimage_path`: dump generated PNG frames when provided
- `--use_ffmpeg`: use ffmpeg for video encoding (typically smaller files than imageio.mimwrite, with slightly lower visual sharpness)
- `--tile_size/--tile_overlap`: adjust tile size and overlap when VRAM is limited. Smaller tiles and overlaps lower peak memory at the cost of more tiling passes (see the example below).
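For GPUs with limited VRAM, a run might look like the following; the tile values here are illustrative assumptions, not tuned defaults:

```bash
# Smaller tiles/overlap reduce peak memory but add tiling passes (values are examples)
python inference_tile.py \
  -i ./test_video/video1 \
  -o ./output \
  -txt /path/to/captions.csv \
  --tile_size 256 --tile_overlap 32
```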
The `--val_prompt` CSV should include one row per video with columns:

- `video_name`: base name matching the input video
- `caption`: positive prompt text

The script concatenates each caption with an internal positive prompt string.
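A minimal example of the expected CSV (the video name and caption below are placeholders):

```csv
video_name,caption
video1,a city street at night with neon signs
```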
Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of LaVie/vsr. We also drew inspiration for our inference strategy from Upscale-A-Video. We are grateful for their contributions to the community.
If you find this repo useful, please consider citing:
```bibtex
@article{li2025diffvsr,
  title={DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations},
  author={Li, Xiaohui and Liu, Yihao and Cao, Shuo and Chen, Ziyan and Zhuang, Shaobin and Chen, Xiangyu and He, Yinan and Wang, Yi and Qiao, Yu},
  journal={arXiv preprint arXiv:2501.10110},
  year={2025}
}
```

