(CVPR 2026) STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
[Project Page] [Paper] [Supp]
Junyang Chen, Jiangxin Dong, Long Sun, Yixin Yang, Jinshan Pan
IMAG Lab, Nanjing University of Science and Technology
If STCDiT is helpful for you, please help star the GitHub Repo. Thanks!
Welcome to visit our website (an information-service platform dedicated to low-level vision) for low-level vision: https://lowlevelcv.com/
😊 You may also want to check our relevant works:
- FaithDiff (CVPR 2025) Paper | Code
  Unleashing diffusion priors with feature alignment and joint VAE–LDM optimization for faithful SR.
- CODSR (CVPR 2026) Paper | Code
  A one-step diffusion SR framework enabling region-discriminative activation of generative priors and precise semantic grounding.
- ✅ April 16, 2026. Release STCDiT testing code for NTIRE 2026 UGC VSR.
- ✅ April 15, 2026. Release enhanced results of STCDiT on VideoLQ and SportsLQ.
- ✅ April 15, 2026. Release SportsLQ. It includes 20 sports event videos at 720p resolution.
- ✅ April 15, 2026. Release testing code and pre-trained model.
- ✅ November 24, 2025. Create the repository.
- Release the training code. Note that STCDiT-tiny can be trained on 4×24 GB GPUs with the same training settings as in the paper.
- Release the Gradio Demo and ComfyUI Integration.
- Release the testing code and pre-trained model. Note that STCDiT-tiny can run inference on a single 24 GB GPU.
Demo videos:
VideoLQ_024_video.mp4 | VideoLQ_031_video.mp4 | 013_LQ.mp4
Sports_001_video.mp4 | Sports_011_video.mp4 | Sports_010_video.mp4
```shell
# Environment for STCDiT
conda create -n STCDiT python=3.10.19 -y
conda activate STCDiT
pip install -r ./requirements_for_STCDiT.txt

# Environment for Qwen2.5-VL captioning
conda create -n Qwen python=3.10.19 -y
conda activate Qwen
pip install -r ./requirements_for_Qwen.txt
```
Note: If FlashAttention installation fails, download a prebuilt .whl file matching your Python, CUDA, and PyTorch versions and install it via pip.
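If you need to pick a wheel manually, the sketch below only constructs and prints the release URL; the version and build tags (`2.6.3`, `cu123`, `torch2.4`, `cp310`) are illustrative assumptions — choose the asset matching your setup from the flash-attention GitHub releases page.

```shell
# Illustrative values only: match these tags to your Python/CUDA/PyTorch builds.
FA_VER=2.6.3
WHL="flash_attn-${FA_VER}+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
echo "https://github.com/Dao-AILab/flash-attention/releases/download/v${FA_VER}/${WHL}"
# After downloading the printed file:
#   pip install ./${WHL}
```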
- STCDiT and STCDiT-Tiny
- Wan2.1-i2v-14B
- Wan2.1-t2v-1.3B
- Qwen2.5-VL-7B-Instruct
- Put them in the `./model_checkpoints` folder. For download instructions, refer to download.sh.
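download.sh is the authoritative reference; as a rough sketch of what the downloads look like with `huggingface-cli` (the exact Wan2.1 repo variant, e.g. 480P vs. 720P, is an assumption, and the STCDiT/STCDiT-Tiny weight location comes from download.sh — the loop below only prints the commands rather than running them):

```shell
# Print (not run) download commands for the public base models.
# Assumption: the Hub repo IDs below; STCDiT weights follow download.sh.
for repo in Wan-AI/Wan2.1-I2V-14B-720P Wan-AI/Wan2.1-T2V-1.3B Qwen/Qwen2.5-VL-7B-Instruct; do
  echo "huggingface-cli download ${repo} --local-dir ./model_checkpoints/${repo##*/}"
done
```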
- SportsLQ: Modelscope
- Enhanced results of STCDiT on VideoLQ and SportsLQ: Modelscope.
- For download instructions, refer to download.sh.
Note: Please modify line 3 in `./Inference/test_STCDiT_large.py` and `./Inference/test_STCDiT_tiny.py` to your local directory path.
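To see exactly which path to change, you can print line 3 of both scripts first (a minimal sketch; whatever variable sits on that line is defined by the scripts themselves):

```shell
# Show line 3 of each test script so you know what to replace.
for f in ./Inference/test_STCDiT_large.py ./Inference/test_STCDiT_tiny.py; do
  if [ -f "$f" ]; then
    printf '%s:3: ' "$f"
    sed -n '3p' "$f"
  fi
done
```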
```shell
# Step 1: Generate Captions with Qwen2.5-VL
conda activate Qwen
bash ./Qwen2.5-VL/inference.sh

# Step 2: Run Video Super-Resolution with STCDiT
conda activate STCDiT

# STCDiT-Large with the Wan2.1-I2V-14B base model; if you observe frequent texture flickering, set `cfg_scale=1`.
bash ./Inference/test_STCDiT_large.sh

# STCDiT-Tiny with the Wan2.1-T2V-1.3B base model (a single 24 GB GPU is sufficient)
bash ./Inference/test_STCDiT_tiny.sh
```

```
@inproceedings{chen_STCDiT,
  title={STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution},
  author={Chen, Junyang and Dong, Jiangxin and Sun, Long and Yang, Yixin and Pan, Jinshan},
  booktitle={CVPR},
  year={2026}
}
```
If you have any questions, please feel free to reach out to me at jychen9811@gmail.com.
Our project is based on DiffSynth-Studio and Wan 2.1. Thanks for their awesome work.