Official source code repository for:
SERF: Spatiotemporal Environment and Robot Feature Map for Long-Horizon Mobile Manipulation
This repository provides the mapping code for SERF, covering:
Note: this repository does not include the VLA component of SERF.
Run these commands from the repository root.
git submodule update --init --recursive
mamba create -n serf-mapping python=3.10 -y
conda activate serf-mapping
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
Download the pre-generated datasets from suk063/SERF. The full dataset can be large, so we recommend downloading only the files you need.
For example, download the robot dataset and one task-0021 training episode
into data/:
hf download suk063/SERF \
--repo-type dataset \
--include "mapping_dataset/robot/robot_data.hdf5" \
--include "mapping_dataset/task-0021/train/episode_00212800.hdf5" \
--local-dir data
To generate the dataset yourself, see DATASET_GENERATION.md.
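The same selective download can be scripted from Python. The sketch below is a minimal, non-authoritative example that assumes the `huggingface_hub` package is installed. `episode_patterns` and `download` are hypothetical helper names, not part of this repository. `snapshot_download` with `allow_patterns` plays the role of the `--include` flags in the command above.

```python
from pathlib import PurePosixPath


def episode_patterns(task: str, episode: str, split: str = "train") -> list:
    """Build include patterns for the robot dataset plus one episode of one task.

    Hypothetical helper: the path layout is copied from the download example above.
    """
    task_dir = PurePosixPath("mapping_dataset") / f"task-{task}" / split
    return [
        "mapping_dataset/robot/robot_data.hdf5",
        str(task_dir / f"episode_{episode}.hdf5"),
    ]


def download(patterns: list, local_dir: str = "data") -> str:
    """Download only the files matching `patterns` into `local_dir`."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="suk063/SERF",
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

Calling `download(episode_patterns("0021", "00212800"))` should request the same two files as the command above.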
Download DINOv3 weights from facebookresearch/dinov3:
--backbone_weights: dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth
--dinotxt_weights: dinov3_vitl16_dinotxt_vision_head_and_text_encoder-a442d8f5.pth
Save the weights locally and pass their paths to the command below.
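Large checkpoint downloads occasionally get truncated, so a quick integrity check before feature extraction can save time. The sketch below assumes that the eight-character suffix in each filename (for example `8aa4cbdd`) follows the PyTorch Hub convention of being the leading hex digits of the file's SHA-256 digest. This is an assumption, not something stated by this repository, and both function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def expected_prefix(filename: str):
    """Return the hex suffix before '.pth' (e.g. '8aa4cbdd'), or None if absent."""
    match = re.search(r"-([0-9a-f]{8,})\.pth$", filename)
    return match.group(1) if match else None


def checkpoint_looks_intact(path) -> bool:
    """Compare the file's SHA-256 digest against the hash suffix in its name.

    Assumes the suffix is a SHA-256 prefix (PyTorch Hub convention).
    """
    path = Path(path)
    prefix = expected_prefix(path.name)
    if prefix is None:
        raise ValueError(f"no hash suffix found in {path.name}")
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(prefix)
```

If the check returns `False`, the file may be incomplete, or the suffix may not be a SHA-256 prefix after all, so treat the result as a hint rather than proof.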
Add DINOv3 features to every HDF5 file under data/mapping_dataset:
python dataset/generate_dino_embedding.py \
--input_path data/mapping_dataset \
--backbone_weights /path/to/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth \
--dinotxt_weights /path/to/dinov3_vitl16_dinotxt_vision_head_and_text_encoder-a442d8f5.pth
After extracting DINO features, train the environment and robot feature maps:
python mapping/train_neural_points_env_and_robot.py \
--config mapping/config/config_env_and_robot.yaml
The command above jointly trains the environment and robot maps. To train them
separately, use mapping/train_neural_points_env.py or
mapping/train_neural_points_robot.py. Hyperparameters are in mapping/config/.
Pre-trained map models are available under map_models/ in
suk063/SERF. For
task-0021, download the model to data/map_models/task-0021:
hf download suk063/SERF \
--repo-type dataset \
--include "map_models/task-0021/**" \
--local-dir data
We provide one uncompressed expert demonstration per task, generated by replaying BEHAVIOR-1K demonstrations.
For example, download the replayed task-0021 demonstration:
hf download suk063/SERF \
--repo-type dataset \
--include "demonstration_replay/task-0021/episode_00212800.hdf5" \
--local-dir data
See DATASET_GENERATION.md to replay demonstrations yourself.
Use the replayed expert demonstration to update the environment map:
python tracking/neural_point_tracking.py --task 0021
This offline tracker calls CoTracker at every step, so it is slower than the online SERF VLA tracker, but it is useful when offline tracking quality matters more than speed. For faster inference, refer to the SERF VLA tracking implementation.
Tracked environment maps are available under
exported_neural_points/.
For example, download the task-0021 tracked map:
hf download suk063/SERF \
--repo-type dataset \
--include "exported_neural_points/task-0021/**" \
--local-dir data
To include the robot map overlay, download the matching BEHAVIOR-1K demonstration parquet file:
hf download behavior-1k/2025-challenge-demos \
--repo-type dataset \
--include "data/task-0021/episode_00212800.parquet" \
--local-dir data/behavior-1k/2025-challenge-demos
Then visualize the feature map with the robot overlay:
python visualization/visualize_feature_map.py \
--tracking_hdf5 data/exported_neural_points/task-0021/train/episode_00212800.hdf5 \
--include_robot_model
Controls: T/R jump +100/-100 steps; F/D step +1/-1 frame.
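The controls above amount to a small key-to-offset table. The sketch below illustrates that mapping only; it is not the viewer's actual code. `KEY_DELTAS` and `next_step` are hypothetical names, and clamping at the first and last step is an assumption about how the viewer treats out-of-range jumps.

```python
# Step offsets taken from the controls listed above.
KEY_DELTAS = {"T": 100, "R": -100, "F": 1, "D": -1}


def next_step(current: int, key: str, num_steps: int) -> int:
    """Return the step index after a key press, clamped to [0, num_steps - 1].

    Clamping is an assumption; the real viewer may wrap around or ignore the key.
    """
    delta = KEY_DELTAS.get(key.upper(), 0)  # unknown keys leave the step unchanged
    return max(0, min(num_steps - 1, current + delta))
```

For a 250-step episode under these assumptions, pressing T at step 200 would land on step 249 rather than 300.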
If you find this repository useful, please consider citing our paper:
@article{kim2026serf,
title = {SERF: Spatiotemporal Environment and Robot Feature Map for Long-Horizon Mobile Manipulation},
author = {Kim, Sunghwan and Pak, Byeonghyun and Long, Kehan and Tian, Yulun and Atanasov, Nikolay},
journal = {arXiv preprint arXiv:2606.12956},
year = {2026}
}
This project is released under the license provided in LICENSE.
We thank the BEHAVIOR-1K, DINOv3, CoTracker3, and SAM2 teams for making their resources, models, and code publicly available.