DiffVSR unlocks the power of diffusion models to tackle severe degradation by shifting the focus from complex architectures to a more effective learning strategy.
📖 For more visual results, visit the project page.
- [2025.10] Inference code is released.
- [2025.01] This repo is created.
Clone and set up the environment:
```bash
git clone https://github.com/xh9998/DiffVSR.git
cd DiffVSR
```

We provide a conda environment file. Create and activate it:

```bash
conda env create -f DiffVSR_env.yml
conda activate DiffVSR
```

Key packages (see `DiffVSR_env.yml` for the full list): PyTorch 2.0.0 (CUDA 11.7), diffusers 0.30.0, torchvision 0.15.0, einops, opencv, pandas, rotary-embedding-torch, xformers (optional), imageio.
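To sanity-check the install, verify that PyTorch sees your GPU (assuming the `DiffVSR` env is active):

```bash
# Should print the torch version (2.0.0, possibly with a +cu117 suffix) and True
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```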
Please download the following three items from Hugging Face and place them under ./pretrained_models/:
- `TE-3DVAE.pt`
- `DiffVSR_UNet.pt`
- `upscaler4x/` folder
Download: DiffVSR on Hugging Face
Directory example:
```
DiffVSR/
└─ pretrained_models/
   ├─ TE-3DVAE.pt
   ├─ DiffVSR_UNet.pt
   └─ upscaler4x/
```
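One way to fetch everything at once is the `huggingface-cli download` command from `huggingface_hub`; the repository ID below is a placeholder, so substitute the DiffVSR repo linked above:

```bash
# Placeholder repo ID -- replace with the actual DiffVSR repository on Hugging Face
huggingface-cli download xh9998/DiffVSR --local-dir ./pretrained_models
```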
`--input_path` can be a single video, a frames folder, or a folder of videos.

```bash
python inference_tile.py \
  -i ./test_video/video1 \
  -o ./output \
  -txt /path/to/captions.csv
```

Arguments (main):

- `-i/--input_path`: video file, frames folder, or folder of videos
- `-o/--output_path`: directory for output mp4
- `-txt/--val_prompt`: CSV containing `video_name` and `caption`
- `-p/--pretrained_model`: UNet checkpoint path
- `-n/--noise_level`: noise level (default 50)
- `-g/--guidance_scale`: guidance scale (default 5)
- `-s/--inference_steps`: denoising steps (default 50)
- `-oimg/--outputimage_path`: dump generated PNG frames when provided
- `--use_ffmpeg`: use ffmpeg for video encoding (typically smaller files than imageio.mimwrite, with slightly lower visual sharpness)
- `--tile_size/--tile_overlap`: adjust tile size and overlap when VRAM is limited. Smaller tiles and overlaps lower peak memory at the cost of more tiling passes (see the example below).
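For GPUs with limited VRAM, a run might look like the following; the tile values here are illustrative assumptions, not tuned defaults:

```bash
# Smaller tiles/overlap reduce peak memory but add tiling passes (values are examples)
python inference_tile.py \
  -i ./test_video/video1 \
  -o ./output \
  -txt /path/to/captions.csv \
  --tile_size 256 --tile_overlap 32
```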
The `--val_prompt` CSV should include one row per video with columns:

- `video_name`: base name matching the input video
- `caption`: positive prompt text

The script concatenates each caption with an internal positive prompt string.
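A minimal example of the expected CSV (the video name and caption below are placeholders):

```csv
video_name,caption
video1,a city street at night with neon signs
```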
Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of LaVie/vsr. We also drew inspiration for our inference strategy from Upscale-A-Video. We are grateful for their contributions to the community.
If you find this repo useful, please consider citing:
```bibtex
@article{li2025diffvsr,
  title={DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations},
  author={Li, Xiaohui and Liu, Yihao and Cao, Shuo and Chen, Ziyan and Zhuang, Shaobin and Chen, Xiangyu and He, Yinan and Wang, Yi and Qiao, Yu},
  journal={arXiv preprint arXiv:2501.10110},
  year={2025}
}
```

