A3C Labyrinth Navigator

PyTorch implementation of Asynchronous Advantage Actor-Critic (A3C) for 3D maze navigation, based on Mnih et al. 2016.

(Dashboard screenshot)

Overview

This project trains an agent to navigate randomly generated 3D mazes using only visual input (84×84 RGB). The implementation follows the Labyrinth task from Section 5.4 of the paper:

  • Apples (red 3D objects): +1 reward when collected
  • Portal (purple ring): +10 reward, respawns agent, regenerates apples
  • Episode: Terminates after 60 seconds

The agent uses an LSTM to maintain memory across time steps, which is preserved across portal jumps but reset on episode termination.
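
As a rough illustration of this memory policy, the worker loop might look like the sketch below. The function name, preprocessing, and the model's forward signature are assumptions for illustration, not the repo's actual worker.py API.

import torch

def rollout_with_memory(env, model, num_steps, device="cpu"):
    # Illustrative loop for the LSTM state policy described above: the hidden
    # state survives portal jumps but is zeroed when the 60-second episode ends.
    hx = torch.zeros(1, 256, device=device)
    cx = torch.zeros(1, 256, device=device)
    obs, _ = env.reset()
    for _ in range(num_steps):
        x = torch.as_tensor(obs, dtype=torch.float32, device=device)
        x = x.permute(2, 0, 1).unsqueeze(0) / 255.0   # HWC uint8 frame -> 1x3x84x84 float
        logits, value, (hx, cx) = model(x, (hx, cx))
        action = torch.distributions.Categorical(logits=logits).sample().item()
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            # episode timer ran out: reset the environment and the memory
            hx, cx = torch.zeros_like(hx), torch.zeros_like(cx)
            obs, _ = env.reset()
        # a portal jump happens inside env.step() and does not end the episode,
        # so hx and cx deliberately persist across it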

Architecture

Layer                     Output
Conv2D (8×8, stride 4)    16 channels
Conv2D (4×4, stride 2)    32 channels
Linear                    256 units
LSTMCell                  256 units
Actor head                action_space logits
Critic head               1 (state value)

Total parameters: ~1.2M (~5 MB checkpoint on disk)
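
For reference, a minimal PyTorch sketch consistent with this table and the 84×84 RGB input (layer names and details are assumptions; the real network lives in model.py as A3C_Labyrinth_Net):

import torch
import torch.nn as nn
import torch.nn.functional as F

class A3CLabyrinthNetSketch(nn.Module):
    def __init__(self, action_space: int):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=8, stride=4)   # 84x84 -> 20x20
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)  # 20x20 -> 9x9
        self.fc = nn.Linear(32 * 9 * 9, 256)
        self.lstm = nn.LSTMCell(256, 256)
        self.actor = nn.Linear(256, action_space)                 # policy logits
        self.critic = nn.Linear(256, 1)                           # state-value estimate

    def forward(self, x, hidden):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        hx, cx = self.lstm(x, hidden)
        return self.actor(hx), self.critic(hx), (hx, cx)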

Installation

conda create -n chiefmazi python=3.10
conda activate chiefmazi
pip install torch torchvision gymnasium pyglet miniworld gradio pandas

Usage

Training with Dashboard

python app.py

Opens a Gradio dashboard at http://localhost:7860 showing:

  • Live agent view with visible apples and portal
  • Training statistics (frames, episodes, FPS)
  • Loss values (policy, value, entropy)
  • Action probability distribution
  • Score history plots

Headless Training

python train.py

Prints progress to console without UI overhead.

Evaluate Trained Model

import torch
from model import A3C_Labyrinth_Net

# Load the final checkpoint on CPU for evaluation
model = A3C_Labyrinth_Net(action_space=3)
model.load_state_dict(torch.load("checkpoints/a3c_final.pt", map_location="cpu"))
model.eval()
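
If the forward pass takes an observation plus the (hx, cx) LSTM state and returns policy logits, a value estimate, and the updated state (an assumed signature, mirroring the Architecture sketch above), a single greedy evaluation step would look like:

obs = torch.zeros(1, 3, 84, 84)                        # placeholder 84×84 RGB frame
hx, cx = torch.zeros(1, 256), torch.zeros(1, 256)      # fresh LSTM state

with torch.no_grad():
    logits, value, (hx, cx) = model(obs, (hx, cx))
    action = logits.argmax(dim=-1).item()              # greedy action for evaluation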

Project Structure

├── app.py            # Gradio training dashboard
├── train.py          # Headless training script
├── worker.py         # A3C worker process
├── model.py          # CNN-LSTM network
├── env_wrapper.py    # Labyrinth task wrapper
├── shared_optim.py   # Shared RMSprop optimizer
└── meshes/
    ├── apple.obj     # 3D apple model
    └── portal.obj    # 3D portal ring model

Environment

The LabyrinthWrapper extends MiniWorld's Maze environment with:

Object    Appearance     Reward    Effect
Apple     Red 3D mesh    +1        Disappears on collection
Portal    Purple ring    +10       Respawns agent, regenerates apples

Custom OBJ meshes are auto-copied to MiniWorld's mesh directory on first run.
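
In outline, the wrapper's step() applies the reward shaping from the table above. The sketch below is illustrative only: the helper methods stand in for MiniWorld-specific collision and spawning logic, and none of the names are taken from env_wrapper.py.

import gymnasium as gym

class LabyrinthRewardSketch(gym.Wrapper):
    APPLE_REWARD = 1.0
    PORTAL_REWARD = 10.0

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        for apple in list(self._apples()):             # placeholder: current apple entities
            if self._agent_touched(apple):             # placeholder: collision test
                reward += self.APPLE_REWARD
                self._remove(apple)                    # apple disappears on collection
        if self._agent_touched(self._portal()):
            reward += self.PORTAL_REWARD
            self._respawn_agent()                      # portal jump: agent respawns...
            self._regenerate_apples()                  # ...and the apples come back
        return obs, reward, terminated, truncated, info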

Hyperparameters

Parameter            Value
Workers              16
Learning rate        1e-4
Discount (γ)         0.99
Entropy weight (β)   0.01
n-step               5
Gradient clip        40.0
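
These values typically enter the n-step actor-critic update along the lines of the sketch below (standard A3C bookkeeping; worker.py may organize it differently):

import torch

GAMMA, BETA, N_STEPS, GRAD_CLIP = 0.99, 0.01, 5, 40.0

def a3c_losses(rewards, values, log_probs, entropies, bootstrap_value):
    # rewards/values/log_probs/entropies are collected over up to N_STEPS env steps;
    # bootstrap_value is V(s_{t+n}), or 0 if the episode terminated.
    R = bootstrap_value
    policy_loss = torch.zeros(1)
    value_loss = torch.zeros(1)
    for i in reversed(range(len(rewards))):
        R = rewards[i] + GAMMA * R                     # n-step discounted return
        advantage = R - values[i]
        value_loss = value_loss + 0.5 * advantage.pow(2)
        # advantage is detached so the policy gradient does not flow into the critic;
        # the entropy bonus (weight BETA) discourages premature policy collapse
        policy_loss = policy_loss - log_probs[i] * advantage.detach() - BETA * entropies[i]
    return policy_loss, value_loss

# after (policy_loss + value_loss).backward(), gradients are clipped to GRAD_CLIP
# with torch.nn.utils.clip_grad_norm_ before the shared update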

Implementation Details

  1. Visible collectibles: Apples and portal are rendered as 3D objects using custom OBJ meshes, giving the agent visual cues for navigation.

  2. Shared memory: Global model and optimizer statistics are shared across processes using share_memory().

  3. LSTM state management: Hidden state is detached at each update, reset on episode end, but preserved across portal jumps.

  4. Asynchronous updates: Workers compute gradients locally and apply them to the shared global model (Hogwild-style); see the sketch after this list.

  5. macOS compatibility: Uses the spawn multiprocessing start method so that OpenGL rendering works correctly in worker processes.
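
A compact sketch of that shared-update step (function and variable names are illustrative, not the repo's actual API): after loss.backward() on the worker's local copy, the gradients are handed to the shared global model and the shared optimizer takes a lock-free step.

import torch

def hogwild_update(local_model, shared_model, shared_optimizer, max_norm=40.0):
    torch.nn.utils.clip_grad_norm_(local_model.parameters(), max_norm)
    for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
        shared_p._grad = local_p.grad       # shared params now point at the worker's gradients
    shared_optimizer.step()                 # lock-free (Hogwild-style) update of the global model
    # the worker then re-syncs before its next rollout:
    # local_model.load_state_dict(shared_model.state_dict())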

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Gymnasium
  • MiniWorld
  • Gradio

Leave am for Chief Mazi 🙈🔥😂
