We present Y-MAP-Net, a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images. Y-MAP-Net simultaneously predicts depth, surface normals, human pose, semantic segmentation, and generates multi-label captions in a single forward pass. To achieve this, we adopt a multi-teacher, single-student training paradigm, where task-specific foundation models supervise the learning of the network, allowing it to distill their capabilities into a unified real-time inference architecture. Y-MAP-Net exhibits strong generalization, architectural simplicity, and computational efficiency, making it well-suited for resource-constrained robotic platforms. By providing rich 3D, semantic, and contextual scene understanding from low-cost RGB cameras, Y-MAP-Net supports key robotic capabilities such as object manipulation and human–robot interaction.
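The multi-teacher, single-student paradigm described above can be sketched as a weighted sum of per-task distillation losses, where each teacher's prediction serves as the student's regression target. This is a minimal illustration only, not the paper's actual training code; the task names, loss choice (L1), and weights are assumptions:

```python
def multi_teacher_loss(student_out, teacher_out, weights):
    """Weighted sum of per-task L1 distillation losses.

    student_out / teacher_out: dicts mapping task name -> flat list of floats
    weights: dict mapping task name -> float task weight
    """
    total = 0.0
    for task, w in weights.items():
        s, t = student_out[task], teacher_out[task]
        # Mean absolute error between student prediction and teacher target
        total += w * sum(abs(a - b) for a, b in zip(s, t)) / len(s)
    return total

# Hypothetical example: the student matches the depth teacher exactly
# but disagrees with the normals teacher on one value.
student = {"depth": [0.5, 0.5], "normals": [0.0, 1.0]}
teacher = {"depth": [0.5, 0.5], "normals": [1.0, 1.0]}
loss = multi_teacher_loss(student, teacher, {"depth": 1.0, "normals": 0.5})
print(loss)  # -> 0.25
```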
One-click deployment in Google Colab:
- Real-time inference from webcam, video files, image folders or screen capture
- Multi-task outputs: 2D pose (17 COCO joints), depth, surface normals, segmentation, and text token embeddings
- Multiple backends: TensorFlow, TFLite, JAX and ONNX
- Interactive web UI via Gradio
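Conceptually, a single forward pass yields all of the outputs listed above at once. The sketch below uses a zero-filled stand-in for the real model just to illustrate the multi-task output structure; the actual API, output names, and tensor shapes in the repository may differ:

```python
import numpy as np

def fake_ymapnet_predict(rgb):
    """Stand-in for the real model: one forward pass, many outputs.

    The real network maps an RGB frame to all tasks simultaneously;
    here we return zero-filled arrays with plausible shapes.
    """
    h, w, _ = rgb.shape
    return {
        "pose2d":  np.zeros((17, 2)),            # 17 COCO joints (x, y)
        "depth":   np.zeros((h, w)),             # per-pixel depth
        "normals": np.zeros((h, w, 3)),          # per-pixel surface normals
        "segment": np.zeros((h, w), dtype=int),  # semantic class ids
        "tokens":  np.zeros((8, 512)),           # text token embeddings
    }

frame = np.zeros((128, 128, 3), dtype=np.uint8)
out = fake_ymapnet_predict(frame)
assert out["pose2d"].shape == (17, 2)
```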
YouTube supplementary video of Y-MAP-Net
```bash
# Automated setup (creates a virtual environment and installs dependencies)
scripts/setup.sh
source venv/bin/activate
```

Or using Docker:

```bash
docker/build_and_deploy.sh
docker run ymapnet-container
docker attach ymapnet-container
cd workspace
scripts/setup.sh
source venv/bin/activate
scripts/downloadPretrained.sh
./runYMAPNet.sh
```

This starts real-time pose estimation from your default webcam.
To perform vehicle counting (supplementary example):

```bash
wget http://ammar.gr/datasets/car.mp4
./runYMAPNet.sh --from car.mp4 --fast --monitor Vehicle 100 128 right --monitor Vehicle 190 128 left
```

The "left" and "right" windows will contain the detection results graph.
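One way such monitoring can work internally is a simple line-crossing counter: track each detection's x coordinate over time and count when it crosses a vertical line in a given direction. This is an illustrative sketch only; the repository's actual `--monitor` implementation may differ:

```python
def count_crossings(xs, line_x, direction):
    """Count how many times a tracked x-coordinate crosses line_x.

    xs: sequence of x positions of one tracked object over time
    direction: "right" counts left-to-right crossings, "left" the opposite
    """
    count = 0
    for prev, cur in zip(xs, xs[1:]):
        if direction == "right" and prev < line_x <= cur:
            count += 1
        elif direction == "left" and prev > line_x >= cur:
            count += 1
    return count

# A vehicle moving rightward past x=100 is counted exactly once
track = [80, 95, 105, 120]
print(count_crossings(track, 100, "right"))  # -> 1
```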
```bash
# Webcam (default)
./runYMAPNet.sh

# Video file
./runYMAPNet.sh --from /path/to/video.mp4

# Image directory / sequence
./runYMAPNet.sh --from /path/to/images/

# Screen capture
./runYMAPNet.sh --from screen

# Specific video device
./runYMAPNet.sh --from /dev/video0
```

| Flag | Description |
|---|---|
| `--size W H` | Set input resolution (e.g. `--size 640 480`) |
| `--cpu` | Force CPU-only inference (slower) |
| `--fast` | Disable depth refinement, person ID, and skeleton resolution for speed |
| `--save` | Save output frames to disk |
| `--headless` | Run without any display window |
| `--illustrate` | Enable enhanced visualization overlay |
| `--collab` | Headless mode with save + illustrate (useful for Colab/remote) |
| `--profiling` | Enable performance profiling |
```bash
python3 gradioServer.py
# Open http://localhost:7860 in your browser
```

- Python 3.x
- TensorFlow 2.16.1+ (with CUDA 12.3+ and cuDNN 8.9.6+ for GPU support)
- Keras 3+
- NumPy, OpenCV
- See `requirements.txt` for the full list

Install all dependencies:

```bash
pip install -r requirements.txt
# or run the setup script:
scripts/setup.sh
```

| Format | Model | Size | Download | Engine |
|---|---|---|---|---|
| Keras (ICRA26) | Full | 2.1GB | GDrive Link2 | --engine tf |
| Keras (dev) | Full | 1.8GB | Link | --engine tf |
| TFLite FP32 | Lite | ~268 MB | Link | --engine tflite |
| TFLite FP16 | Lite | ~210 MB | Link | --engine tflite |
| ONNX FP32 | Lite | ~268 MB | Link | --engine onnx |
| ONNX FP16 | Lite | ~209 MB | Link | --engine onnx |
| JAX (npz) | Lite | ~268 MB | Link | --engine jax |
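A hedged sketch of how an `--engine` flag could dispatch to the different runtimes in the table. Imports are deferred inside each loader so only the chosen backend needs to be installed; the model filenames and the loader structure are assumptions, not the repository's actual code:

```python
def make_loader(engine):
    """Return a zero-argument callable that loads the requested backend.

    Imports are lazy so unused runtimes need not be installed.
    """
    def load_tf():
        import tensorflow as tf
        return tf.keras.models.load_model("ymapnet.keras")  # hypothetical path

    def load_tflite():
        import tensorflow as tf
        interp = tf.lite.Interpreter(model_path="ymapnet.tflite")
        interp.allocate_tensors()
        return interp

    def load_onnx():
        import onnxruntime as ort
        return ort.InferenceSession("ymapnet.onnx")

    def load_jax():
        import numpy as np
        return dict(np.load("ymapnet.npz"))  # weight arrays for a JAX model

    loaders = {"tf": load_tf, "tflite": load_tflite,
               "onnx": load_onnx, "jax": load_jax}
    try:
        return loaders[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine}")

# Selecting a loader imports nothing yet; the heavy import happens on call
loader = make_loader("onnx")
```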
To use a different engine, invoke it as follows:

```bash
./runYMAPNet.sh --engine onnx
```

To evaluate the model against COCO17, run the following commands from the root directory of the project:
```bash
wget http://ammar.gr/ymapnet/archive/ymapnet_coco_validation_dataset.zip
unzip ymapnet_coco_validation_dataset.zip
python3 evaluateYMAPNet.py
```
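COCO keypoint evaluation is based on Object Keypoint Similarity (OKS), which scores a predicted skeleton against ground truth with a per-keypoint Gaussian falloff. Below is a pure-Python sketch of the metric for a single person; the formula follows the COCO definition, while the scale and `k` values in the example are illustrative only:

```python
import math

def oks(pred, gt, vis, s, k):
    """Object Keypoint Similarity between predicted and ground-truth joints.

    pred, gt: lists of (x, y) keypoints; vis: visibility flags (>0 = labeled);
    s: object scale (sqrt of area); k: per-keypoint falloff constants.
    """
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, vis, k):
        if v > 0:  # only labeled keypoints contribute
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            num += math.exp(-d2 / (2 * s * s * ki * ki))
            den += 1
    return num / den if den else 0.0

# Perfect predictions give OKS = 1.0
gt = [(10.0, 10.0), (20.0, 20.0)]
print(oks(gt, gt, [2, 2], s=50.0, k=[0.079, 0.079]))  # -> 1.0
```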
If you find our work useful or use it in your projects, please cite:
```bibtex
@inproceedings{qammaz2026ymapnet,
  author    = {Qammaz, Ammar and Vasilikopoulos, Nikos and Oikonomidis, Iason and Argyros, Antonis A},
  title     = {Y-MAP-Net: Learning from Foundation Models for Real-Time, Multi-Task Scene Perception},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA 2026), (to appear)},
  year      = {2026},
  month     = {June},
  projects  = {MAGICIAN}
}
```
FORTH License — see LICENSE for details.


