
InsCMPR: Efficient Cross-Modal Place Recognition via Instance-Aware Hybrid Mamba-Transformer

This repository contains the implementation of our ICRA 2025 paper: InsCMPR

  • We propose a novel instance-aware modality alignment strategy for the CMPR task. By leveraging a pre-trained vision foundation model, our method aligns multi-modal data at both the pixel and instance levels, effectively mitigating domain shifts and generating superior global descriptors.
  • We introduce a novel dual-branch hybrid Mamba-Transformer network for the CMPR task, capable of efficiently processing multi-modal data aligned at different levels in parallel, enhancing the robustness and accuracy of CMPR.
  • Extensive experimental results on the KITTI, NCLT, and HAOMO datasets show that our proposed method achieves state-of-the-art performance while running in real time at about 30 Hz.


Citation

If you use our code, please star our repo and cite our paper.

@inproceedings{jiao2025icra,
	title={{InsCMPR: Efficient Cross-Modal Place Recognition via Instance-Aware Hybrid Mamba-Transformer}},
	author={Shuaifeng Jiao and Zhuoqun Su and Lun Luo and Hongshan Yu and Zongtan Zhou and Huimin Lu and Xieyuanli Chen},
	booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
	year={2025}
}

Table of Contents

1. Publication

2. Dependencies

3. How to Use

4. License

Dependencies

This code was tested on Ubuntu 20.04 with Python 3.7, PyTorch 1.13.1, and CUDA 11.8, using the following packages.

Main packages:

pytorch, torchvision, numpy, scipy, matplotlib, opencv-python

Misc packages: pillow, h5py, scikit-image, scikit-learn, faiss_gpu, matplotlib, tqdm, causal-conv1d, mamba-ssm, timm, tensorboardX, einops, transformers

You can install the packages above directly, or use the provided requirements.txt file via pip install -r requirements.txt. However, PyTorch and torchvision should be installed according to your specific CUDA version directly from https://pytorch.org; these lines have been commented out in requirements.txt. After installing the above packages, activate the environment, then run the commands in the "How to Use" section below.

If you encounter compilation errors when installing mamba-ssm and causal-conv1d, you can try installing them from prebuilt wheels instead:

pip install causal_conv1d-1.0.2+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
pip install mamba_ssm-1.0.1+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

The .whl files for causal_conv1d and mamba_ssm can be downloaded from the release pages of the causal-conv1d and mamba-ssm repositories.
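After installation, a quick sanity check like the following (a hypothetical helper script, not part of this repo) can confirm that the core packages import cleanly before you start training:

```python
import importlib

def check_imports(packages):
    """Return a dict mapping each package name to True if it imports, else False."""
    status = {}
    for pkg in packages:
        try:
            importlib.import_module(pkg)
            status[pkg] = True
        except ImportError:
            status[pkg] = False
    return status

if __name__ == "__main__":
    # Note: opencv-python imports as `cv2`; mamba-ssm imports as `mamba_ssm`.
    for pkg, ok in check_imports(["torch", "torchvision", "numpy", "scipy",
                                  "cv2", "mamba_ssm", "causal_conv1d"]).items():
        print(f"{pkg}: {'OK' if ok else 'MISSING'}")
```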

How to Use

Datasets preparation

KITTI dataset

  • Step 1: LiDAR-to-Image Projection
  1. Build
cd ./tools/PointInterpolation
mkdir build
cd build
cmake ..
make
  2. Run
./pointInterKitti
  • Step 2: Depth Completion
python depth_completion_kitti.py
  • Step 3: Generating Instance Masks
cd tools/MobileSAM/MobileSAMv2
bash ./experiments/mobilesamv2.sh 
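Step 1 projects LiDAR points into the camera image plane. The C++ tool above performs this for KITTI; conceptually, the operation looks roughly like the following sketch (a simplified pinhole model with hypothetical calibration inputs, not the repo's actual implementation):

```python
import numpy as np

def project_lidar_to_image(points, T_cam_velo, P, img_w, img_h):
    """Project Nx3 LiDAR points into the image plane, returning (u, v) pixel
    coordinates and depths for points in front of the camera and inside the image."""
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])          # homogeneous LiDAR points
    cam = (T_cam_velo @ pts_h.T).T                        # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]                              # drop points behind the camera
    uvw = (P @ np.hstack([cam[:, :3], np.ones((cam.shape[0], 1))]).T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                         # perspective divide
    keep = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
            (uv[:, 1] >= 0) & (uv[:, 1] < img_h))         # keep in-view pixels only
    return uv[keep], cam[keep, 2]
```

For KITTI, `T_cam_velo` and `P` would come from the sequence calibration files; the resulting sparse depth image is what Step 2 densifies.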

Please use the recommended KITTI data structure as follows:

data
    ├── poses
    │   ├── 00.txt
    │   ├── 01.txt
    │   └── ...
    └── sequences
        ├── 00
        │   ├──depth
        │   ├──image
        │   ├──depthSAM
        │   ├──imageSAM
        │   ├──velodyne
        │   ├──depth.txt
        │   └──image.txt
        │
        ├── 01
        │   ├──depth
        │   ├──image
        │   ├──depthSAM
        │   ├──imageSAM
        │   ├──velodyne
        │   ├──depth.txt
        │   └──image.txt
        │
        ├── 02
        └── ...
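Step 2 densifies the sparse projected depth maps. The repo's depth_completion_kitti.py likely uses a more sophisticated method; a minimal sketch of the idea (naive nearest-neighbour dilation, purely illustrative) is:

```python
import numpy as np

def fill_sparse_depth(depth, iters=5):
    """Naively densify a sparse depth map: on each pass, empty pixels (value 0)
    take the maximum of their 4-neighbours, growing valid regions outward."""
    out = depth.astype(float).copy()
    for _ in range(iters):
        padded = np.pad(out, 1)
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
                          padded[1:-1, :-2], padded[1:-1, 2:]])  # left, right
        out = np.where(out > 0, out, neigh.max(axis=0))
    return out
```

Each iteration propagates valid depths one pixel further, so a handful of passes fills the gaps between neighbouring LiDAR scan lines.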

NCLT dataset

The NCLT dataset contains data from a Velodyne HDL-32E LiDAR, a Ladybug3 camera, and an RTK-GPS. The Ladybug3 camera system captures omnidirectional images from five viewpoints. We process each viewpoint individually using modality alignment before stitching the five images together.

  • Step 1: Image Undistortion
python nclt_cam_undistored.py
  • Step 2: Image Cropping
python nclt_image_crop.py
  • Step 3: Image Stitching
python nclt_cat.py
  • Step 4: LiDAR-to-Image Projection and Depth Completion
python nclt_depth_crop.py
  • Step 5: Depth Stitching
python nclt_cat.py
  • Step 6: Generating Instance Masks for RGB and Depth Images
cd tools/MobileSAM/MobileSAMv2
bash ./experiments/mobilesamv2.sh 

Please use the recommended NCLT data structure as follows:
data
    ├── ground_truth
    │   ├── groundtruth_2012-01-08.csv
    │   ├── groundtruth_2012-01-15.csv
    │   └── ...
    ├── 2012-01-08
    │   ├── image
    │   │    ├── Cam
    │   │    ├── Cam_SAM
    │   │    └── Cam.txt
    │   └── lidar
    │        ├── depth
    │        ├── depth_SAM
    │        └── depth.txt
    │
    ├── 2012-02-05
    │   ├── image
    │   │    ├── Cam
    │   │    ├── Cam_SAM
    │   │    └── Cam.txt
    │   └── lidar
    │        ├── depth
    │        ├── depth_SAM
    │        └── depth.txt
    └── ...
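Steps 3 and 5 stitch the five per-view images (and depth maps) into a single panorama. A minimal sketch of horizontal stitching (assuming equal-height views after cropping; nclt_cat.py may handle alignment differently) is:

```python
import numpy as np

def stitch_views(views):
    """Horizontally concatenate a list of HxW (or HxWxC) arrays, cropping every
    view to the smallest common height so the widths line up row by row."""
    h = min(v.shape[0] for v in views)
    return np.concatenate([v[:h] for v in views], axis=1)
```

The same routine works for both the RGB and the depth branches, which is consistent with nclt_cat.py being reused in Steps 3 and 5.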

Training

You can start the training process with

python -m torch.distributed.launch --nproc_per_node=4 --use_env train_kitti.py

or

python -m torch.distributed.launch --nproc_per_node=4 --use_env train_nclt.py
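Both commands spawn one process per GPU and rely on `torch.distributed.launch --use_env` exporting `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` into the environment. Inside each training script, the distributed setup presumably looks roughly like this generic sketch (not the repo's actual code; on newer PyTorch versions, `torchrun` replaces the deprecated launcher):

```python
import os
import torch
import torch.distributed as dist

def setup_ddp():
    """Initialize the default process group from the environment variables that
    `torch.distributed.launch --use_env` (or `torchrun`) exports."""
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)   # pin this process to its GPU
    dist.init_process_group(backend=backend, init_method="env://")
    return local_rank
```

The model is then typically wrapped with `torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])` so gradients are synchronized across the four processes.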

Evaluation

You can start the evaluation with

python ./evaluation/evaluate_kitti.py

or

python ./evaluation/evaluate_nclt.py
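Place-recognition evaluation typically reports Recall@N: a query counts as a hit if any of its top-N nearest database descriptors is a true match. A minimal sketch (brute-force L2 search; the repo's scripts presumably use faiss and dataset-specific ground truth) is:

```python
import numpy as np

def recall_at_n(query_desc, db_desc, gt_matches, n=1):
    """Fraction of queries whose n nearest database descriptors (by L2 distance)
    contain at least one ground-truth match. gt_matches[i] is the set of
    database indices considered correct for query i."""
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=-1)
    topn = np.argsort(dists, axis=1)[:, :n]        # indices of n nearest neighbours
    hits = [bool(set(topn[i]) & gt_matches[i]) for i in range(len(query_desc))]
    return float(np.mean(hits))
```

For the dataset sizes here, swapping the pairwise-distance matrix for a faiss index (faiss_gpu is in the dependency list) gives the same result much faster.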

License

This project is free software made available under the MIT License. For details see the LICENSE file.
