
(CVPR 2026) STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution


[Project Page]   [Paper]   [Supp]

Junyang Chen, Jiangxin Dong, Long Sun, Yixin Yang, Jinshan Pan
IMAG Lab, Nanjing University of Science and Technology

If STCDiT is helpful for you, please help by starring the GitHub repo. Thanks!

You are also welcome to visit our low-level vision website (an information-service platform dedicated to low-level vision): https://lowlevelcv.com/


😊 You may also want to check our relevant works:

  1. FaithDiff (CVPR 2025) Paper | Code
    Unleashing diffusion priors with feature alignment and joint VAE–LDM optimization for faithful SR.

  2. CODSR (CVPR 2026) Paper | Code
    A one-step diffusion SR framework enabling region-discriminative activation of generative priors and precise semantic grounding.

🚩 New Features/Updates

  • ✅ April 16, 2026. Release STCDiT testing code for NTIRE 2026 UGC VSR.
  • ✅ April 15, 2026. Release enhanced results of STCDiT on VideoLQ and SportsLQ.
  • ✅ April 15, 2026. Release SportsLQ. It includes 20 sports event videos at 720p resolution.
  • ✅ April 15, 2026. Release testing code and pre-trained model.
  • ✅ November 24, 2025. Create the repository.

To do

  • Release the training code. Note that STCDiT-tiny can be trained on 4×24 GB GPUs with the same training settings as in the paper.
  • Release the Gradio Demo and ComfyUI Integration.
  • Release the testing code and pre-trained model. Note that STCDiT-tiny can run inference on a single 24 GB GPU.

📷 Real-World Enhancement Results

Demo videos (embedded on the repository page): VideoLQ_024_video.mp4, VideoLQ_031_video.mp4, 013_LQ.mp4, Sports_001_video.mp4, Sports_011_video.mp4, Sports_010_video.mp4

🚀 How to evaluate

Environment

conda create -n STCDiT python=3.10.19 -y
conda activate STCDiT
pip install -r ./requirements_for_STCDiT.txt

conda create -n Qwen python=3.10.19 -y
conda activate Qwen
pip install -r ./requirements_for_Qwen.txt

Note: If FlashAttention fails to build from source, download a prebuilt .whl that matches your Python, PyTorch, and CUDA versions and install it with pip.
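To confirm the install succeeded before launching inference, a minimal check (independent of the repo's code) is:

```python
# Minimal sketch: check whether FlashAttention is importable in the
# currently active conda environment.
import importlib.util


def has_flash_attn() -> bool:
    """True if the flash_attn package is installed and importable."""
    return importlib.util.find_spec("flash_attn") is not None


print(has_flash_attn())
```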

Download Dependent Models

Val Dataset

  • SportsLQ: ModelScope
  • Enhanced results of STCDiT on VideoLQ and SportsLQ: ModelScope
  • For download instructions, refer to download.sh.

Inference Script

Note: Please modify line 3 in ./Inference/test_STCDiT_large.py and ./Inference/test_STCDiT_tiny.py to your local directory path.

# Step 1: Generate Captions with Qwen2.5-VL
conda activate Qwen
bash ./Qwen2.5-VL/inference.sh

# Step 2: Run Video Super-Resolution with STCDiT
conda activate STCDiT

# STCDiT-Large with the Wan2.1-I2V-14B base model. If you observe frequent texture flickering, set `cfg_scale=1`.
bash ./Inference/test_STCDiT_large.sh

# STCDiT-Tiny with the Wan2.1-T2V-1.3B base model (a single 24 GB GPU is sufficient)
bash ./Inference/test_STCDiT_tiny.sh
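The two steps above can also be driven from a single script. This is only a sketch: it assumes the repo's shell scripts sit at the paths shown, and uses `conda run` so each step executes in its own environment without manual activation.

```python
# Sketch of a driver for the two-step pipeline: caption with Qwen2.5-VL,
# then super-resolve with STCDiT. Paths and env names match the README's
# commands; adjust if your checkout differs.
import subprocess


def pipeline_commands(model_size="large"):
    """Build the two commands: captioning (Qwen env), then VSR (STCDiT env)."""
    return [
        ["conda", "run", "-n", "Qwen", "bash", "./Qwen2.5-VL/inference.sh"],
        ["conda", "run", "-n", "STCDiT", "bash",
         f"./Inference/test_STCDiT_{model_size}.sh"],
    ]


def run_pipeline(model_size="large"):
    """Execute both steps, stopping on the first failure."""
    for cmd in pipeline_commands(model_size):
        subprocess.run(cmd, check=True)
```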

BibTeX

@inproceedings{chen_STCDiT,
  title={STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution},
  author={Chen, Junyang and Dong, Jiangxin and Sun, Long and Yang, Yixin and Pan, Jinshan},
  booktitle={CVPR},
  year={2026}
}

Contact

If you have any questions, please feel free to reach out to me at jychen9811@gmail.com.


Acknowledgments

Our project is built on DiffSynth-Studio and Wan 2.1. Thanks for their awesome work.

About

[CVPR 2026] STCDiT for Real-World Video Enhancement and AIGC Enhancement. It achieves temporally stable and structurally faithful restoration even under complex motions.
