REPA-G - Official implementation of "Test-Time Conditioning with Representation-Aligned Visual Features"

Nicolas Sereyjol-Garros1 · Ellington Kirby1 · Victor Letzelter1,2 · Victor Besnier1 · Nermin Samet1

1 Valeo.ai, Paris, France   2 LTCI, Télécom Paris, Institut Polytechnique de Paris, France  

📃 Paper

Overview

While representation alignment with self-supervised models has been shown to improve diffusion model training, its potential for enhancing inference-time conditioning remains largely unexplored. We introduce Representation-Aligned Guidance (REPA-G), a framework that leverages these aligned representations, with their rich semantic properties, to enable test-time conditioning on visual features during generation. By optimizing a similarity objective (the potential) at inference, we steer the denoising process toward a conditioned representation extracted from a pre-trained feature extractor. Our method provides versatile control at multiple scales, ranging from fine-grained texture matching via single patches to broad semantic guidance using global image feature tokens. We further extend this to multi-concept composition, allowing for the faithful combination of distinct concepts. REPA-G operates entirely at inference time, offering a flexible and precise alternative to often ambiguous text prompts or coarse class labels. We theoretically justify how this guidance enables sampling from the potential-induced tilted distribution. Quantitative results on ImageNet and COCO demonstrate that our approach achieves high-quality, diverse generations.
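To build intuition for the tilted-distribution view, here is a minimal numpy toy (not the paper's implementation): we sample from p(x) · exp(λ · sim(f(x), f*)) with Langevin dynamics, where p is a standard 2-D Gaussian, the "feature extractor" f is the identity, and sim is a negative squared distance to a hypothetical conditioning feature f*. All of these choices are illustrative assumptions; they make the tilted target a Gaussian whose mean is pulled toward f*.

```python
import numpy as np

# Toy sketch of potential-guided sampling: p(x) * exp(lam * sim(f(x), f_star))
# with f = identity and sim(a, b) = -||a - b||^2 / 2 (illustrative assumptions).
# The tilted target is then a Gaussian with mean lam * f_star / (1 + lam).
rng = np.random.default_rng(0)
lam = 1.0
f_star = np.array([2.0, 2.0])  # hypothetical conditioning feature

def score(x):
    # grad log [ N(x; 0, I) * exp(-lam * ||x - f_star||^2 / 2) ]
    return -x + lam * (f_star - x)

step = 0.01
x = rng.standard_normal((5000, 2))  # 5000 parallel Langevin chains
for _ in range(2000):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

print(x.mean(axis=0))  # approaches lam * f_star / (1 + lam) = [1, 1]
```

The same principle is what the guidance applies inside the denoising loop: the gradient of the similarity potential is added to the model's score, biasing samples toward the conditioning feature.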

📚 Citation

If you find our work useful, please consider citing:

@misc{sereyjol2026repag,
      title={Test-Time Conditioning with Representation-Aligned Visual Features}, 
      author={Nicolas Sereyjol-Garros and Ellington Kirby and Victor Letzelter and Victor Besnier and Nermin Samet},
      year={2026},
      eprint={2602.03753},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.03753}, 
}

Getting Started

1. Environment Setup

To install dependencies, run:

pip install -r requirements.txt

2. Download the pretrained models

To download the REPA-E, REPA, and unaligned SiT checkpoints together, run the script

bash scripts/download/download_sit.sh

3. Demo

Run the demo with

streamlit run app/home.py

A toy example is also provided in toy_example/toy_example.ipynb.

(Optional) Download additional visual backbones for evaluation

To evaluate alignment with anchors using additional image backbones, download the required checkpoints and put them in ckpts:

  • MoCo v3: this link; place it as ./ckpts/mocov3_vitb.pth
  • I-JEPA: this link; place it as ./ckpts/ijepa_vith.pth
  • MAE: this link; place it as ./ckpts/mae_vitl.pth

or run the script

bash scripts/download/download_image_backbone.sh
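As a convenience, a quick shell check (assuming the checkpoint filenames listed above) that the downloads ended up in ckpts:

```shell
# Report which evaluation backbones are present (filenames are the ones
# listed above; adjust if your checkpoints are named differently).
for f in mocov3_vitb.pth ijepa_vith.pth mae_vitl.pth; do
  if [ -f "ckpts/$f" ]; then echo "ok: $f"; else echo "missing: $f"; fi
done
```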

4. Prepare ImageNet

Download and extract the training split of the ImageNet-1K dataset. Once it's ready, run the following command to preprocess the dataset:

python preprocessing.py --imagenet-path /PATH/TO/IMAGENET_TRAIN

Replace /PATH/TO/IMAGENET_TRAIN with the actual path to the extracted training images.

5. Evaluate

Download the ImageNet reference file with

bash scripts/download/download_ref_in.sh

Example scripts for generation and evaluation (average feature conditioning) are provided in scripts/eval. Set the --data-dir argument to the correct ImageNet path, then run, for example,

bash scripts/eval/eval_imagenet_repae.sh

Acknowledgement

This codebase is largely built upon prior open-source work; we sincerely thank the authors for making their work publicly available.
