This project trains and evaluates multiple object detection models that identify vehicles, pedestrians, and cyclists in urban environments.
We use the TensorFlow Object Detection API for model training and AWS SageMaker for deployment.
The dataset is real-world and imbalanced, making small and rare object detection particularly challenging.
Three pretrained architectures were fine-tuned for 2,000 steps each:
| Model | Backbone | Architecture | Feature Pyramid |
|---|---|---|---|
| EfficientDet-D1 | EfficientNet | One-stage | BiFPN |
| SSD MobileNet V2 FPNLite | MobileNet V2 | One-stage SSD | Lightweight FPN |
| SSD ResNet50 V1 FPN | ResNet-50 | One-stage SSD | Standard FPN |
| Metric | EfficientDet-D1 | SSD MobileNet V2 FPNLite | SSD ResNet50 V1 FPN |
|---|---|---|---|
| mAP@[0.5:0.95] | 0.0865 | 0.1015 | 0.0459 |
| mAP@.50 | 0.2212 | 0.2137 | 0.1030 |
| mAP@.75 | 0.0522 | 0.0881 | 0.0373 |
| mAP (Small) | 0.0381 | 0.0387 | 0.0134 |
| mAP (Medium) | 0.3172 | 0.3630 | 0.1595 |
| mAP (Large) | 0.3939 | 0.3389 | 0.2141 |
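The comparison in the table above can be summarized programmatically. A small sketch (metric values copied from the table; the function name is illustrative) that picks the best model per metric:

```python
# mAP results copied from the evaluation table above.
results = {
    "EfficientDet-D1":          {"mAP@[0.5:0.95]": 0.0865, "mAP@.50": 0.2212, "mAP@.75": 0.0522,
                                 "mAP (Small)": 0.0381, "mAP (Medium)": 0.3172, "mAP (Large)": 0.3939},
    "SSD MobileNet V2 FPNLite": {"mAP@[0.5:0.95]": 0.1015, "mAP@.50": 0.2137, "mAP@.75": 0.0881,
                                 "mAP (Small)": 0.0387, "mAP (Medium)": 0.3630, "mAP (Large)": 0.3389},
    "SSD ResNet50 V1 FPN":      {"mAP@[0.5:0.95]": 0.0459, "mAP@.50": 0.1030, "mAP@.75": 0.0373,
                                 "mAP (Small)": 0.0134, "mAP (Medium)": 0.1595, "mAP (Large)": 0.2141},
}

def best_per_metric(results):
    """Return {metric: model with the highest score on that metric}."""
    metrics = next(iter(results.values())).keys()
    return {m: max(results, key=lambda model: results[model][m]) for m in metrics}

print(best_per_metric(results))
```

This makes the trade-off explicit: no single model wins every metric, which motivates the per-category conclusions below.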
- **Best Overall:** SSD MobileNet V2 FPNLite achieved the highest mAP@[0.5:0.95] and excelled at detecting small and medium objects, which are critical in urban scenes.
- **Best for Large Objects:** EfficientDet-D1 excelled at large-object detection and provided balanced, stable training.
- **Least Effective Under Constraints:** SSD ResNet50 V1 FPN underperformed on most metrics, likely because its deeper architecture requires more training steps.
Across all models, validation loss remained higher than training loss — a common sign of class imbalance in datasets (e.g., more cars than cyclists).
- SSD MobileNet V2 FPNLite: Fastest training speed (4.49 steps/sec), low classification loss, but slightly higher localization loss.
- EfficientDet-D1: Most stable loss trends and balanced learning.
- SSD ResNet50 V1 FPN: High initial losses with gradual improvement, but plateaued early.
The best-performing model (SSD MobileNet V2 FPNLite) was deployed using:
- AWS SageMaker for inference hosting.
- `2_deploy_model.ipynb` for endpoint creation and video generation.
- Output videos include bounding boxes for:
- 🚗 Vehicles
- 🚶 Pedestrians
- 🚴 Cyclists
*Sample detection outputs: EfficientDet-D1 · SSD MobileNet V2 FPNLite · SSD ResNet50 V1 FPN*
- LiDAR Bird’s-Eye-View Detection Project
Covers LiDAR range image processing, point-cloud visualization, BEV map creation, model-based object detection, and performance evaluation.
Implements:
- Range image → point-cloud conversion & visualization
- BEV map intensity & height layers
- Complex YOLO & second-model integration
- 3D bounding box extraction
- Precision/recall evaluation via IoU analysis
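The BEV intensity and height layers listed above can be sketched in NumPy. This is a minimal illustration, not the project's actual implementation; the grid extent, resolution, and max-pooling reduction are assumptions:

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0, 50), y_range=(-25, 25), res=0.5):
    """Discretize a lidar point cloud (N x 4: x, y, z, intensity) into
    BEV height and intensity maps (one value per grid cell)."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    height_map = np.zeros((h, w))
    intensity_map = np.zeros((h, w))

    # Keep only points inside the BEV area.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Map metric coordinates to integer grid cells.
    xi = ((pts[:, 0] - x_range[0]) / res).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / res).astype(int)

    # Per cell, keep the maximum height and maximum intensity.
    for x, y, z, inten in zip(xi, yi, pts[:, 2], pts[:, 3]):
        height_map[x, y] = max(height_map[x, y], z)
        intensity_map[x, y] = max(intensity_map[x, y], inten)
    return height_map, intensity_map
```

Stacking these layers (plus, e.g., a density layer) produces the pseudo-image that a BEV detector such as Complex YOLO consumes.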
This project implements a multi-object tracking pipeline combining Lidar and Camera measurements, using an Extended Kalman Filter (EKF) for sensor fusion.
The work follows the given rubric, covering object tracking, track management, data association, sensor fusion, and performance evaluation.
| Rubric Section | Key Implementations | Status |
|---|---|---|
| Tracking | EKF with constant velocity motion model (`filter.py`), tuned F and Q matrices, RMSE ≤ 0.35 for lidar-only scenario | ✅ |
| Track Management | Automatic track initialization, scoring, state management (tentative, confirmed), deletion of stale tracks | ✅ |
| Data Association | Nearest neighbor with association matrix, chi-square gating (`association.py`) | ✅ |
| Sensor Fusion | Camera measurement model h(x), Jacobian H, visibility check, fusion in tracking loop (`measurements.py`) | ✅ |
| Evaluation | RMSE computed for at least 3 confirmed tracks, 2 tracks from 0s–200s without loss, mean RMSE < 0.25 | ✅ |
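The constant-velocity prediction and chi-square gating from the rubric can be sketched as follows. This is a hedged illustration: the 2D state layout [px, py, vx, vy] and the process-noise form are assumptions, not the exact contents of `filter.py` (the project uses its own state dimension and tuning):

```python
import numpy as np
from scipy.stats import chi2

def predict(x, P, dt, q):
    """Constant-velocity EKF predict step for a state [px, py, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # One common discretized white-noise-acceleration process noise.
    q1, q2, q3 = q * dt, q * dt**2 / 2, q * dt**3 / 3
    Q = np.array([[q3, 0, q2, 0],
                  [0, q3, 0, q2],
                  [q2, 0, q1, 0],
                  [0, q2, 0, q1]])
    return F @ x, F @ P @ F.T + Q

def gate(gamma, S, dof=2, p=0.995):
    """Chi-square gating: accept a measurement when the squared Mahalanobis
    distance of the residual `gamma` falls below the chi-square threshold."""
    d2 = float(gamma.T @ np.linalg.inv(S) @ gamma)
    return d2 < chi2.ppf(p, df=dof)
```

Gating keeps the association matrix sparse: implausible track/measurement pairs are rejected before nearest-neighbor assignment.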
*Visualizations: Lidar-Only EKF Tracking · Track Management · Multi-Target Data Association · Sensor Fusion (Lidar + Camera)*
- RMSE (Mean): < 0.25 for two long-duration tracks (0s–200s)
- Confirmed Tracks: ≥ 3
- Track Loss: None for main sequences
- Precision & Recall: High due to accurate gating and association
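The nonlinear camera measurement model h(x) and its Jacobian H from the rubric can be illustrated with a generic pinhole projection. The focal length and principal point below are made-up values, and `measurements.py` uses the project's actual camera calibration; this only shows why a linear Kalman update is insufficient for the camera channel:

```python
import numpy as np

F_CAM, C_I, C_J = 640.0, 320.0, 240.0  # illustrative focal length / principal point

def h(x):
    """Project a 3D position [px, py, pz] into image coordinates.
    Nonlinear in px, hence the *extended* Kalman filter."""
    px, py, pz = x[:3]
    return np.array([C_I - F_CAM * py / px,
                     C_J - F_CAM * pz / px])

def jacobian_h(x):
    """Analytic Jacobian of h with respect to [px, py, pz]."""
    px, py, pz = x[:3]
    return np.array([[F_CAM * py / px**2, -F_CAM / px, 0.0],
                     [F_CAM * pz / px**2, 0.0, -F_CAM / px]])
```

A quick finite-difference check of the Jacobian against h is a cheap way to catch sign errors before wiring the model into the fusion loop.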
This project implements 3D scan matching for vehicle localization in simulation, using:
- Iterative Closest Point (ICP)
- Normal Distributions Transform (NDT)
The goal is to accurately localize a moving car in the simulator using only lidar data after initialization.
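A minimal point-to-point ICP iteration (nearest-neighbor association followed by an SVD/Kabsch rigid alignment) can be sketched in NumPy. This is an illustrative sketch under simplifying assumptions, not the project's simulator implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=20):
    """Align `source` (N x 3) to `target` (M x 3) with point-to-point ICP.
    Returns the accumulated rotation R and translation t (x -> R x + t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Associate each source point with its nearest target point.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Best rigid transform for these pairs via the Kabsch/SVD method.
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
        R = Vt.T @ D @ U.T
        t = mu_t - R @ mu_s
        # 3. Apply the increment and accumulate the total transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

NDT replaces step 1–2 with a voxelized Gaussian representation of the target cloud and gradient-based optimization, which tends to be more robust to sparse correspondences.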
| Rubric Section | Key Implementations | Status |
|---|---|---|
| Localization | Continuous localization within ≤ 1.2m error over ≥ 170m drive; works at medium speed (~3 taps on up arrow) | ✅ |
| 3D Scan Matching | ICP and NDT implemented using lidar data only after initial pose; ground truth only used for initialization | ✅ |
*Visualizations: ICP Localization · NDT Localization*
*Demonstration of the decision-making framework in action.*
This project implements and evaluates PID controllers for vehicle steering and throttle control using three methods:
- Standard PID
- Twiddle optimization for parameter tuning
- Adaptive Control (including returned adaptive control)
The controllers are tested in simulation, and performance is analyzed via multiple plots.
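Twiddle, listed above as the tuning method, is a simple coordinate search over the gain vector. A hedged sketch follows; `objective` is a placeholder for the simulator's cost (e.g., accumulated cross-track error), which the real project evaluates by running the car:

```python
def twiddle(objective, params, deltas, tol=1e-3):
    """Coordinate-search ("Twiddle") tuning: perturb each parameter up and
    down, keep whichever direction lowers the objective, and grow/shrink
    the step sizes until they fall below `tol`."""
    best = objective(params)
    while sum(deltas) > tol:
        for i in range(len(params)):
            params[i] += deltas[i]
            err = objective(params)
            if err < best:
                best = err
                deltas[i] *= 1.1            # success: widen the search
            else:
                params[i] -= 2 * deltas[i]  # probe the other direction
                err = objective(params)
                if err < best:
                    best = err
                    deltas[i] *= 1.1
                else:
                    params[i] += deltas[i]  # revert, narrow the search
                    deltas[i] *= 0.9
    return params, best
```

Because each evaluation means a full simulator run, keeping the initial `deltas` large and the tolerance loose keeps tuning time manageable.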
- All code is written in C++ and runs without errors.
- Key files: `main.cpp`, `pid_controller.cpp`, and `pid_controller.h`.
- `run_main_pid.sh` compiles and runs the code.
- The PID controller class includes proportional, integral, and derivative terms with proper update functions.
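The project's controller is implemented in C++; the update law it follows can be sketched in a few lines of Python. The negative sign convention and the gain values in the test are illustrative assumptions, not the project's tuned parameters:

```python
class PID:
    """Minimal PID controller mirroring a typical update/total-error split."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.prev_error = 0.0
        self.integral = 0.0

    def update(self, error, dt):
        """Accumulate the error terms for one control cycle and
        return the resulting control output."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.total_error(error, derivative)

    def total_error(self, error, derivative):
        """Combine P, I, and D contributions into the control command."""
        return -(self.kp * error + self.ki * self.integral + self.kd * derivative)
```

The same structure serves both steering (error = cross-track error) and throttle (error = speed difference), just with different gains.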
| Controller Type | Steering Plot | Throttle Plot |
|---|---|---|
| Standard PID | | |
| Twiddle Optimized PID | | |
| Adaptive Control | | |
| Returned Adaptive Ctrl | | |





