REPA-G - Official implementation of "Test-Time Conditioning with Representation-Aligned Visual Features"
Nicolas Sereyjol-Garros1 · Ellington Kirby1 · Victor Letzelter1,2 · Victor Besnier1 · Nermin Samet1
1 Valeo.ai, Paris, France 2 LTCI, Télécom Paris, Institut Polytechnique de Paris, France
While representation alignment with self-supervised models has been shown to improve diffusion model training, its potential for enhancing inference-time conditioning remains largely unexplored. We introduce Representation-Aligned Guidance (REPA-G), a framework that leverages these aligned representations, with their rich semantic properties, to enable test-time conditioning on features during generation. By optimizing a similarity objective (the potential) at inference, we steer the denoising process toward a conditioning representation extracted from a pre-trained feature extractor. Our method provides versatile control at multiple scales, ranging from fine-grained texture matching via single patches to broad semantic guidance using global image feature tokens. We further extend this to multi-concept composition, allowing distinct concepts to be combined faithfully. REPA-G operates entirely at inference time, offering a flexible and precise alternative to often-ambiguous text prompts or coarse class labels. We theoretically justify how this guidance enables sampling from the potential-induced tilted distribution. Quantitative results on ImageNet and COCO demonstrate that our approach achieves high-quality, diverse generations.
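The potential-guided sampling described above can be illustrated with a minimal, self-contained 1-D toy sketch (this is an illustration of the general idea, not the repository's implementation): a Langevin-style sampler whose update adds the gradient of a log-potential, here the negative squared distance between a stand-in feature extractor `phi` and a target feature. All names here (`phi`, `log_potential`, `guided_step`) and the scalar setup are assumptions made for illustration.

```python
import math
import random

def phi(x):
    # Stand-in "feature extractor": any smooth map works for this toy sketch.
    return math.tanh(x)

def log_potential(x, target, tau=0.1):
    # Similarity potential: negative squared feature distance over a temperature.
    return -(phi(x) - target) ** 2 / tau

def grad(f, x, eps=1e-5):
    # Central finite-difference gradient; sufficient for a 1-D toy.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def guided_step(x, score, target, step=0.05, guidance=1.0):
    # Langevin-style update on the tilted distribution: base score plus
    # the gradient of the log-potential, plus Gaussian noise.
    g = grad(lambda v: log_potential(v, target), x)
    return x + step * (score(x) + guidance * g) + random.gauss(0.0, math.sqrt(2 * step))

score = lambda x: -x            # score of a standard normal "base model"
target = phi(0.5)               # condition on the feature of the point x = 0.5

random.seed(0)
x, samples = 3.0, []
for i in range(1000):
    x = guided_step(x, score, target)
    if i >= 500:                # keep samples after burn-in
        samples.append(x)

mean = sum(samples) / len(samples)
print(f"{mean:.2f}")            # samples concentrate near the conditioned region
```

Even starting far away (x = 3.0), the chain drifts toward points whose "feature" matches the target, which is the tilted distribution the guidance induces.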
If you find our work useful, please consider citing:

```bibtex
@misc{sereyjol2026repag,
  title={Test-Time Conditioning with Representation-Aligned Visual Features},
  author={Nicolas Sereyjol-Garros and Ellington Kirby and Victor Letzelter and Victor Besnier and Nermin Samet},
  year={2026},
  eprint={2602.03753},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.03753},
}
```

To install dependencies, please run:
```bash
pip install -r requirements.txt
```

To download the REPA-E, REPA, and non-aligned SiT checkpoints together, run the script:

```bash
bash scripts/download/download_sit.sh
```

Run the demo with:

```bash
streamlit run app/home.py
```

A toy example is also provided in toy_example/toy_example.ipynb.
To evaluate alignment with anchors using an additional image backbone, download the required image backbones and put them in ckpts:

- MoCo v3: this link, placed as ./ckpts/mocov3_vitb.pth
- I-JEPA: this link, placed as ./ckpts/ijepa_vith.pth
- MAE: this link, placed as ./ckpts/mae_vitl.pth
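After fetching the checkpoints (manually or via the script below), a quick sanity check that the files sit at the paths listed above can be sketched as follows; the path list comes from this README, so adjust it if you store checkpoints elsewhere:

```python
from pathlib import Path

# Expected backbone checkpoint locations, as listed in this README.
EXPECTED = [
    Path("ckpts/mocov3_vitb.pth"),
    Path("ckpts/ijepa_vith.pth"),
    Path("ckpts/mae_vitl.pth"),
]

# Report any checkpoint that is not present on disk.
missing = [p for p in EXPECTED if not p.is_file()]
if missing:
    print("missing checkpoints:", ", ".join(str(p) for p in missing))
else:
    print("all image backbones found")
```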
or run the script:

```bash
bash scripts/download/download_image_backbone.sh
```

Download and extract the training split of the ImageNet-1K dataset. Once it is ready, run the following command to preprocess the dataset:
```bash
python preprocessing.py --imagenet-path /PATH/TO/IMAGENET_TRAIN
```

Replace /PATH/TO/IMAGENET_TRAIN with the actual path to the extracted training images.
Download the reference file for ImageNet with:

```bash
bash scripts/download/download_ref_in.sh
```

Example scripts for generation and evaluation (average feature conditioning) are provided in scripts/eval. Change the --data-dir argument to the correct path to ImageNet and run, for example:

```bash
bash scripts/eval/eval_imagenet_repae.sh
```

This codebase is largely built upon:
We sincerely thank the authors for making their work publicly available.

