Machine Learning Engineer | High-Performance Systems | C++/CUDA | Computer Vision | ML Infrastructure
I build real-time, performance-critical systems at the intersection of computer vision, GPU acceleration, robotics simulation, and ML infrastructure.
My background spans AI/ML, optimization, deep learning, GPU programming, and systems-level engineering, with a Master’s degree in AI & Machine Learning from Columbia University.
I'm currently focused on:
- Modern C++ (C++17/20) for vision, tracking, and real-time pipelines
- CUDA kernel optimization and GPU acceleration
- Inference runtime systems (batching, streaming, KV-cache simulation)
- Orchestrated ML pipelines (Airflow/Prefect, MLflow, Ray)
- Distributed GPU workloads on Kubernetes
- Simulation and autonomous robotics
- C++17/20, Python
- Linux, CMake, gdb, perf
- Multithreading, lock-free queues
- Real-time computer vision (OpenCV, tracking)
- CUDA, Nsight Systems / Nsight Compute
- TensorRT, ONNX Runtime
- Triton Inference Server
- GPU memory optimization, fused kernels
- PyTorch, TensorFlow
- FastAPI / gRPC
- Continuous batching, streaming inference
- Quantization (bitsandbytes), ONNX export
- Model deployment & benchmarking
- MLflow, Airflow / Prefect
- Docker, Kubernetes, Ray
- AWS / GCP
- Feature engineering & data validation
A high-performance, multi-threaded pipeline for real-time object tracking across multiple video streams.
Demonstrates C++ systems design, threading, frame pipelines, low-latency processing, and modular CV architecture.
Tech: C++17, OpenCV, pthreads, lock-free queues, CMake, Linux
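The lock-free queues used between capture and tracking threads could be as simple as a single-producer/single-consumer ring buffer. A minimal C++17 sketch of that idea (names and capacity handling are illustrative, not taken from the project):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer lock-free ring buffer.
// One slot is kept empty so a full queue can be told apart from an
// empty one without extra state.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    bool push(const T& item) {  // producer thread only
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // queue full; caller may drop the frame
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {  // consumer thread only
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;  // queue empty
        T item = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};  // next write index
    std::atomic<std::size_t> tail_{0};  // next read index
};
```

In a frame pipeline, each camera thread pushes decoded frames and a tracker thread pops them; a `false` return on a full queue lets the producer drop frames rather than block, which keeps end-to-end latency bounded.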
Hand-optimized fused CUDA kernel implementing RGB→Gray, Gaussian blur, and Sobel edge detection in a single pass.
Includes CPU vs GPU benchmarks and profiling analysis.
Tech: CUDA, Nsight Systems/Compute, memory coalescing, shared memory
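The payoff of fusing these stages is that the intermediate grayscale (and blurred) images never round-trip through global memory. A simplified CPU reference of the fusion idea, combining just the RGB→gray and Sobel stages in one pass (the blur stage and the shared-memory tiling from the CUDA version are omitted; all names here are illustrative):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Fused RGB->gray + Sobel on the CPU: grayscale values are computed
// on the fly for each 3x3 neighborhood, so no intermediate gray image
// is ever written to memory -- the same idea the CUDA kernel applies
// with shared-memory tiles instead of recomputation.
inline float grayAt(const std::vector<uint8_t>& rgb, int w, int x, int y) {
    const int i = (y * w + x) * 3;
    return 0.299f * rgb[i] + 0.587f * rgb[i + 1] + 0.114f * rgb[i + 2];
}

std::vector<float> fusedSobel(const std::vector<uint8_t>& rgb, int w, int h) {
    std::vector<float> out(static_cast<std::size_t>(w) * h, 0.0f);
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            float g[3][3];
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    g[dy + 1][dx + 1] = grayAt(rgb, w, x + dx, y + dy);
            // Sobel gradients from the freshly computed gray neighborhood.
            const float gx = (g[0][2] + 2 * g[1][2] + g[2][2]) -
                             (g[0][0] + 2 * g[1][0] + g[2][0]);
            const float gy = (g[2][0] + 2 * g[2][1] + g[2][2]) -
                             (g[0][0] + 2 * g[0][1] + g[0][2]);
            out[y * w + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

On a GPU the recomputation above would be replaced by loading a tile of gray values into shared memory once per block, which is where the coalescing and shared-memory work in the project comes in.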
Custom inference engine with simplified continuous batching, token streaming, and latency profiling.
Builds intuition for LLM runtime engines (vLLM, SGLang, TensorRT-LLM).
Tech: FastAPI, PyTorch, queuing, GPU inference, streaming APIs
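The core of continuous batching is that finished sequences leave the batch at every decode step and waiting requests immediately take their slots, rather than the whole batch draining before new work is admitted. A toy C++ sketch of that scheduling loop (the project itself is Python/FastAPI; all names here are illustrative):

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy model of continuous batching: at every decode step, finished
// sequences leave the batch and queued requests immediately fill the
// freed slots, instead of waiting for the whole batch to drain.
struct Request {
    std::string id;
    int tokensLeft;  // decode steps remaining for this sequence
};

class ContinuousBatcher {
public:
    explicit ContinuousBatcher(std::size_t maxBatch) : maxBatch_(maxBatch) {}

    void submit(Request r) { waiting_.push_back(std::move(r)); }

    // One decode step; returns ids of sequences that finished this step.
    std::vector<std::string> step() {
        // Admit waiting requests into any free batch slots.
        while (active_.size() < maxBatch_ && !waiting_.empty()) {
            active_.push_back(waiting_.front());
            waiting_.pop_front();
        }
        std::vector<std::string> finished;
        for (auto it = active_.begin(); it != active_.end();) {
            if (--it->tokensLeft <= 0) {  // "generate" one token
                finished.push_back(it->id);
                it = active_.erase(it);   // slot freed for the next step
            } else {
                ++it;
            }
        }
        return finished;
    }

    std::size_t activeCount() const { return active_.size(); }

private:
    std::size_t maxBatch_;
    std::deque<Request> waiting_;
    std::vector<Request> active_;
};
```

Production engines like vLLM layer paged KV-cache management and preemption on top of this same admit-on-free-slot loop.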
End-to-end ML workflow with orchestration, metrics tracking, model registry, CI/CD, and Dockerized deployment.
Tech: Airflow/Prefect, MLflow, S3, Docker, GitHub Actions
Lightweight GPU workload scheduler distributing inference jobs across a GPU-enabled Kubernetes cluster.
Includes autoscaling, monitoring, and queue-based scheduling.
Tech: Kubernetes, Triton, Prometheus, Grafana, Ray, Python
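The queue-based scheduling here boils down to a dispatch loop that places each queued job on the least-loaded GPU. A stripped-down C++ stand-in for that policy (the real project does this via Kubernetes/Ray; the types and cost model below are purely illustrative):

```cpp
#include <queue>
#include <vector>

// Toy queue-based scheduler: inference jobs are dispatched to the GPU
// with the lightest current load (a greedy least-loaded policy).
struct Job {
    int id;
    int cost;  // abstract load units (e.g., expected GPU-seconds)
};

class GpuScheduler {
public:
    explicit GpuScheduler(int numGpus) : load_(numGpus, 0) {}

    void enqueue(Job j) { queue_.push(j); }

    // Drain the queue, always placing the next job on the least-loaded
    // GPU. Returns the GPU index chosen for each job, in dispatch order.
    std::vector<int> dispatchAll() {
        std::vector<int> placements;
        while (!queue_.empty()) {
            Job j = queue_.front();
            queue_.pop();
            int best = 0;
            for (int g = 1; g < static_cast<int>(load_.size()); ++g)
                if (load_[g] < load_[best]) best = g;
            load_[best] += j.cost;
            placements.push_back(best);
        }
        return placements;
    }

    const std::vector<int>& load() const { return load_; }

private:
    std::queue<Job> queue_;
    std::vector<int> load_;
};
```

Autoscaling then reduces to watching the queue depth and aggregate load and adjusting the number of GPU workers, which is what the Prometheus/Grafana metrics feed.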
- Building deeper expertise in:
  - C++20 concurrency
  - CUDA warp-level optimization
  - Fault-tolerant LLM runtime systems
  - Robotics simulation and GPU-accelerated perception
- Publishing a series of GPU + C++ systems projects over the next 8 weeks
- Preparing for ML Systems & Robotics Engineering roles
- LinkedIn: https://www.linkedin.com/in/drew-kalasky/
- Open to opportunities in: ML Systems, GPU optimization, robotics simulation, inference infrastructure, applied ML engineering
- Real-Time Multi-Camera Tracker (C++17)