This code repository supports the training and evaluation of the AnyThermal model introduced in the paper "AnyThermal: Towards Learning Universal Representations for Thermal Perception," accepted at ICRA 2026. Please feel free to open an issue if you need help with the checkpoints or the codebase; we will try to resolve issues as soon as possible.
- Setting up the environment
- Downloading and postprocessing of datasets
- (Optional) Downloading pre-trained checkpoints
- Training backbone (AnyThermal)
- Task training and eval
  - VPR
  - Segmentation
  - Depth
docker pull parvmaheshwari/py310_cu123:latest
cd <PROJECT_ROOT>
pip install -r requirements.txt
requirements.txt targets Python 3.10 with PyTorch 2.1.2 / CUDA 12.1. The depth baseline needs
one extra package, pytorch3d, which has to be built against the installed PyTorch and is therefore
installed in a separate step — see Monocular Thermal Depth Estimation.
VPR (pretraining.vpr, benchmark.benchmark_vpr) imports the SALAD aggregator from a submodule. After cloning, pull at least the SALAD submodule:
git submodule update --init baselines/VPR/salad
The other submodules (STHN, ImageBind, MCNet, BridgeMultiSpectralDepth, fieldscale) are only needed if you run those specific baselines; init them on demand.
- Change the dataset paths in the /ocean/projects/cis220039p/pmaheshw/code/multi-modal/MultiLoc/custom_datasets/dataset_path.yaml file to your local paths for the root folders of each dataset.
- Follow the instructions in the respective dataset folders to download and postprocess the datasets.
Download anythermal_checkpoints.zip from HuggingFace. The zip's top-level directory is pretrained_checkpoints/, so extracting from the repo root would produce a nested pretrained_checkpoints/pretrained_checkpoints/ tree that the code paths in custom_models/str_to_cls.py don't expect. Extract to a scratch dir and move the contents in instead:
cd <PROJECT_ROOT>
unzip /path/to/anythermal_checkpoints.zip -d /tmp/anythermal_ckpt
mv /tmp/anythermal_ckpt/pretrained_checkpoints ./
rm -rf /tmp/anythermal_ckpt
The result is a flat pretrained_checkpoints/{backbone,segmentation,vpr,depth}/... under the repo root.
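The expected layout can be checked after extraction with a few lines of Python. This is a minimal sketch, not part of the repository: the function name `check_checkpoint_layout` is hypothetical, and it only looks at the four top-level subdirectories named above, not at the individual checkpoint files.

```python
from pathlib import Path

# Subdirectories named in the layout described above.
EXPECTED_SUBDIRS = ("backbone", "segmentation", "vpr", "depth")


def check_checkpoint_layout(project_root):
    """Return a list of problems found; an empty list means the layout looks right."""
    ckpt_root = Path(project_root) / "pretrained_checkpoints"
    if not ckpt_root.is_dir():
        return [f"missing directory: {ckpt_root}"]

    problems = []
    # A nested pretrained_checkpoints/pretrained_checkpoints/ tree is the
    # failure mode described above.
    nested = ckpt_root / "pretrained_checkpoints"
    if nested.is_dir():
        problems.append(f"nested tree found: {nested}")
    for name in EXPECTED_SUBDIRS:
        if not (ckpt_root / name).is_dir():
            problems.append(f"missing subdirectory: {ckpt_root / name}")
    return problems
```

Calling `check_checkpoint_layout(".")` from the repo root should return an empty list once the checkpoints are in place.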
cd <PROJECT_ROOT>
python3 -m pretraining.distill --dataset boson sthereo freiburg vivid tartanrgbt --loss_file loss_configs/loss_config_global_contrastive_final.yaml --wandb_name anythermal_backbone_distillaltion
This saves the checkpoint to <PROJECT_ROOT>/checkpoints/<concatenated_sorted_string_of_dataset_names>.
cd <PROJECT_ROOT>
python3 -m pretraining.vpr --backbone_path pretrained_checkpoints/backbone/AnyThermal_full/model20.pth
cd <PROJECT_ROOT>
# For Zero-shot MS2 (urban) Evaluation
bash benchmark/benchmark_bash/vpr/ms2.sh
# For Zero-shot CART (aerial) Evaluation
bash benchmark/benchmark_bash/vpr/cart.sh
# For Zero-shot OdomBeyondVision (indoor) Evaluation
bash benchmark/benchmark_bash/vpr/obv.sh
All the evaluation outputs are saved in benchmark/qualitative_ouptuts
cd <PROJECT_ROOT>
# For training segmentation on MF-Net
python3 -m pretraining.segmentation --dataset mfnet --backbone_ckpt pretrained_checkpoints/backbone/AnyThermal_full/model20.pth --augment
# For training segmentation on CART (random data split as provided by the dataset)
python3 -m pretraining.segmentation --dataset cart_random --backbone_ckpt pretrained_checkpoints/backbone/AnyThermal_full/model20.pth --augment --thermal_segmentation_augmentation brightness_contrast gamma hflip --epochs 125
cd <PROJECT_ROOT>
# For evaluating segmentation on MF-Net
bash benchmark/benchmark_bash/segmentation/mfnet.sh
# For evaluating segmentation on CART
bash benchmark/benchmark_bash/segmentation/cart.sh
All the evaluation outputs are saved in benchmark/qualitative_ouptuts
We provide a monocular thermal depth estimator that puts a MiDaS depth head on top of the frozen
AnyThermal DINOv2 backbone, trained and evaluated on the
MS2 dataset. The depth training /
evaluation code lives in the BridgeMultiSpectralDepth
submodule.
cd <PROJECT_ROOT>
git submodule update --init baselines/depth/BridgeMultiSpectralDepth
In addition to requirements.txt, the depth baseline needs pytorch3d. It compiles against the
installed PyTorch, so install it after the base dependencies, with build isolation disabled:
export CUDA_HOME=/usr/local/cuda-12.2 # a CUDA toolkit matching your torch (cu12.x)
export TORCH_CUDA_ARCH_LIST="8.6" # your GPU's compute capability (optional; speeds up the build)
pip install --no-build-isolation git+https://github.com/facebookresearch/pytorch3d.git
Download and extract the pretrained checkpoints (see Downloading pretrained checkpoints). Depth uses the frozen backbone and ships trained depth checkpoints:
pretrained_checkpoints/
├── backbone/AnyThermal_full/model20.pth # frozen backbone (used for training)
└── depth/
├── Midas_anythermal/ckpt_epoch=28_step=145000.ckpt # MiDaS + AnyThermal backbone
├── Midas_dinov2/ckpt_epoch=28_step=145000.ckpt # MiDaS + vanilla DINOv2 backbone
└── Midas_small/ckpt_epoch=26_step=135000.ckpt # MiDaS-small baseline
The AnyThermal depth config already points at this repo-root pretrained_checkpoints/ directory
(via a ../../../ relative path), so no copying or symlinking into the submodule is needed. Pass the
same ../../../pretrained_checkpoints/... prefix for --ckpt_path when evaluating (see below).
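To see why the `../../../` prefix works, the sketch below resolves a path given relative to the submodule directory back to a repo-root path. It is illustrative only: `resolve_from_submodule` is a hypothetical helper, and it assumes the depth scripts are launched from inside baselines/depth/BridgeMultiSpectralDepth, as in the commands in this section.

```python
import posixpath

# Directory the depth scripts are run from, relative to the repo root.
SUBMODULE_DIR = "baselines/depth/BridgeMultiSpectralDepth"


def resolve_from_submodule(relative_path, project_root="<PROJECT_ROOT>"):
    """Resolve a path written relative to the submodule directory."""
    joined = posixpath.join(project_root, SUBMODULE_DIR, relative_path)
    # normpath collapses each ".." with the directory component before it.
    return posixpath.normpath(joined)


ckpt = "../../../pretrained_checkpoints/depth/Midas_anythermal/ckpt_epoch=28_step=145000.ckpt"
print(resolve_from_submodule(ckpt))
# prints <PROJECT_ROOT>/pretrained_checkpoints/depth/Midas_anythermal/ckpt_epoch=28_step=145000.ckpt
```

The three `..` components cancel the three directory levels of the submodule path, so the result lands in the repo-root pretrained_checkpoints/ directory.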
Request access and download MS2 from the
official page. For monocular
thermal depth you need the sync_data (thermal stereo images + calibration), proj_depth
(projected / filtered GT depth), and the official *_list.txt split files, arranged as:
<MS2_ROOT>/
├── sync_data/<seq>/thr/{img_left,img_right}/*.png (+ calib.npy per <seq>)
├── proj_depth/<seq>/thr/{depth,depth_filtered,intensity}/*.png
├── train_list.txt val_list.txt
└── test_day_list.txt test_night_list.txt test_rainy_list.txt
Point the config at your copy by editing dataset.MS2.dataset_dir in
baselines/depth/BridgeMultiSpectralDepth/configs/Base/Base_Sup_Mono_Depth.yaml.
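Before training, it can help to confirm that your MS2 copy matches the layout above. The following is a rough sketch under that assumption; `check_ms2_layout` is a hypothetical helper that is not part of the repository. It checks only directories and split files, and it skips calib.npy and the image contents.

```python
from pathlib import Path

# Names taken from the directory tree shown above.
SPLIT_FILES = (
    "train_list.txt", "val_list.txt",
    "test_day_list.txt", "test_night_list.txt", "test_rainy_list.txt",
)
THERMAL_IMAGE_DIRS = ("img_left", "img_right")
DEPTH_DIRS = ("depth", "depth_filtered", "intensity")


def check_ms2_layout(ms2_root):
    """Return a list of problems found; an empty list means the layout matches."""
    root = Path(ms2_root)
    problems = [
        f"missing split file: {name}"
        for name in SPLIT_FILES
        if not (root / name).is_file()
    ]

    sync_root = root / "sync_data"
    seqs = sorted(p for p in sync_root.iterdir() if p.is_dir()) if sync_root.is_dir() else []
    if not seqs:
        problems.append("no sequences found under sync_data/")

    for seq in seqs:
        # Each sequence needs thermal stereo images and projected depth maps.
        for sub in THERMAL_IMAGE_DIRS:
            if not (seq / "thr" / sub).is_dir():
                problems.append(f"missing sync_data/{seq.name}/thr/{sub}")
        for sub in DEPTH_DIRS:
            if not (root / "proj_depth" / seq.name / "thr" / sub).is_dir():
                problems.append(f"missing proj_depth/{seq.name}/thr/{sub}")
    return problems
```

Run it as `check_ms2_layout("<MS2_ROOT>")` with the same path you set in dataset.MS2.dataset_dir.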
cd baselines/depth/BridgeMultiSpectralDepth
CUDA_VISIBLE_DEVICES=0 python train.py \
--config configs/AnyThermal/Midas_anythermal.yaml \
--num_gpus 1 --exp_name Midas_anythermal
This freezes the AnyThermal backbone (pretrained_checkpoints/backbone/AnyThermal_full/model20.pth)
and trains the MiDaS depth head on MS2 thermal images. Checkpoints are written to
checkpoints/<exp_name>/.
Evaluate a checkpoint on each MS2 test condition (test_day, test_night, test_rain). You can
use the released Midas_anythermal checkpoint directly, without retraining:
cd baselines/depth/BridgeMultiSpectralDepth
for ENV in test_day test_night test_rain; do
CUDA_VISIBLE_DEVICES=0 python test_monodepth.py \
--config configs/AnyThermal/Midas_anythermal.yaml \
--ckpt_path ../../../pretrained_checkpoints/depth/Midas_anythermal/ckpt_epoch=28_step=145000.ckpt \
--modality thr --test_env ${ENV} \
--save_dir ./results/Midas_anythermal/${ENV}
done
This prints standard depth metrics (Abs Rel, Sq Rel, RMSE, δ accuracies) per condition and saves
depth visualizations under --save_dir.
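For reference, the metrics named above follow the usual monocular depth definitions. The sketch below computes them in plain Python over paired depth values. It is not the evaluation code in the BridgeMultiSpectralDepth submodule, which may apply its own masking or depth caps; `depth_metrics` is a hypothetical name.

```python
import math


def depth_metrics(pred, gt):
    """Compute Abs Rel, Sq Rel, RMSE and delta accuracies over valid depth pairs."""
    # Keep only pairs with positive ground truth and prediction (assumed validity rule).
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid depth pairs")

    abs_rel = sum(abs(g - p) / g for p, g in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for p, g in pairs) / n)

    # Delta accuracy: fraction of pairs whose ratio max(p/g, g/p) is below 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {
        f"delta<1.25^{k}": sum(r < 1.25 ** k for r in ratios) / n
        for k in (1, 2, 3)
    }
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, **deltas}
```

Lower is better for Abs Rel, Sq Rel, and RMSE; higher is better for the δ accuracies.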
If you found this repo helpful, please give us a star and consider citing our work:
@misc{maheshwari2026anythermallearninguniversalrepresentations,
title={AnyThermal: Towards Learning Universal Representations for Thermal Perception},
author={Parv Maheshwari and Jay Karhade and Yogesh Chawla and Isaiah Adu and Florian Heisen and Andrew Porco and Andrew Jong and Yifei Liu and Santosh Pitla and Sebastian Scherer and Wenshan Wang},
year={2026},
eprint={2602.06203},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.06203},
}