(CVPR 2026) STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
[Project Page] [Paper] [Supp]
Junyang Chen, Jiangxin Dong, Long Sun, Yixin Yang, Jinshan Pan
IMAG Lab, Nanjing University of Science and Technology
If STCDiT is helpful for you, please help star the GitHub Repo. Thanks!
Welcome to visit our website (an information-service platform dedicated to low-level vision) for low-level vision: https://lowlevelcv.com/
😊 You may also want to check our relevant works:
- FaithDiff (CVPR 2025) Paper | Code
  Unleashing diffusion priors with feature alignment and joint VAE–LDM optimization for faithful SR.
- CODSR (CVPR 2026) Paper | Code
  A one-step diffusion SR framework enabling region-discriminative activation of generative priors and precise semantic grounding.
- ✅ April 16, 2026. Release STCDiT testing code for NTIRE 2026 UGC VSR.
- ✅ April 15, 2026. Release enhanced results of STCDiT on VideoLQ and SportsLQ.
- ✅ April 15, 2026. Release SportsLQ. It includes 20 sports event videos at 720p resolution.
- ✅ April 15, 2026. Release testing code and pre-trained model.
- ✅ November 24, 2025. Create the repository.
- Release the training code. Note that STCDiT-tiny can be trained on 4×24 GB GPUs with the same training settings as in the paper.
- Release the Gradio Demo and ComfyUI Integration.
- Release the testing code and pre-trained model. Note that STCDiT-tiny can run inference on a single 24 GB GPU.
Demo videos:
VideoLQ_024_video.mp4 | VideoLQ_031_video.mp4 | 013_LQ.mp4
Sports_001_video.mp4 | Sports_011_video.mp4 | Sports_010_video.mp4
```shell
# Environment for STCDiT
conda create -n STCDiT python=3.10.19 -y
conda activate STCDiT
pip install -r ./requirements_for_STCDiT.txt

# Environment for Qwen2.5-VL captioning
conda create -n Qwen python=3.10.19 -y
conda activate Qwen
pip install -r ./requirements_for_Qwen.txt
```
Note: If FlashAttention installation fails, download a prebuilt .whl file matching your Python, CUDA, and PyTorch versions and install it via pip.
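If you need to pick a wheel manually, the sketch below only constructs and prints the release URL; the version and build tags (`2.6.3`, `cu123`, `torch2.4`, `cp310`) are illustrative assumptions — choose the asset matching your setup from the flash-attention GitHub releases page.

```shell
# Illustrative values only: match these tags to your Python/CUDA/PyTorch builds.
FA_VER=2.6.3
WHL="flash_attn-${FA_VER}+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
echo "https://github.com/Dao-AILab/flash-attention/releases/download/v${FA_VER}/${WHL}"
# After downloading the printed file:
#   pip install ./${WHL}
```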
- STCDiT and STCDiT-Tiny
- Wan2.1-i2v-14B
- Wan2.1-t2v-1.3B
- Qwen2.5-VL-7B-Instruct
- Put them in the `./model_checkpoints` folder. For download instructions, refer to download.sh.
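download.sh is the authoritative reference; as a rough sketch of what the downloads look like with `huggingface-cli` (the exact Wan2.1 repo variant, e.g. 480P vs. 720P, is an assumption, and the STCDiT/STCDiT-Tiny weight location comes from download.sh — the loop below only prints the commands rather than running them):

```shell
# Print (not run) download commands for the public base models.
# Assumption: the Hub repo IDs below; STCDiT weights follow download.sh.
for repo in Wan-AI/Wan2.1-I2V-14B-720P Wan-AI/Wan2.1-T2V-1.3B Qwen/Qwen2.5-VL-7B-Instruct; do
  echo "huggingface-cli download ${repo} --local-dir ./model_checkpoints/${repo##*/}"
done
```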
- SportsLQ: Modelscope
- Enhanced results of STCDiT on VideoLQ and SportsLQ: Modelscope.
- For download instructions, refer to download.sh.
Note: Please modify line 3 in `./Inference/test_STCDiT_large.py` and `./Inference/test_STCDiT_tiny.py` to your local directory path.
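To see exactly which path to change, you can print line 3 of both scripts first (a minimal sketch; whatever variable sits on that line is defined by the scripts themselves):

```shell
# Show line 3 of each test script so you know what to replace.
for f in ./Inference/test_STCDiT_large.py ./Inference/test_STCDiT_tiny.py; do
  if [ -f "$f" ]; then
    printf '%s:3: ' "$f"
    sed -n '3p' "$f"
  fi
done
```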
```shell
# Step 1: Generate Captions with Qwen2.5-VL
conda activate Qwen
bash ./Qwen2.5-VL/inference.sh

# Step 2: Run Video Super-Resolution with STCDiT
conda activate STCDiT

# STCDiT-Large with the Wan2.1-I2V-14B base model; if you observe frequent texture flickering, set `cfg_scale=1`.
bash ./Inference/test_STCDiT_large.sh

# STCDiT-Tiny with the Wan2.1-T2V-1.3B base model (a single 24 GB GPU is sufficient)
bash ./Inference/test_STCDiT_tiny.sh
```

```
@inproceedings{chen_STCDiT,
  title={STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution},
  author={Chen, Junyang and Dong, Jiangxin and Sun, Long and Yang, Yixin and Pan, Jinshan},
  booktitle={CVPR},
  year={2026}
}
```
If you have any questions, please feel free to reach out to me at jychen9811@gmail.com.
Our project is based on DiffSynth-Studio and Wan 2.1. Thanks for their awesome work.