Machine Learning Engineer | High-Performance Systems | C++/CUDA | Computer Vision | ML Infrastructure
I build real-time, performance-critical systems at the intersection of computer vision, GPU acceleration, robotics simulation, and ML infrastructure.
My background spans AI/ML, optimization, deep learning, GPU programming, and systems-level engineering, with a Master’s degree in AI & Machine Learning from Columbia University.
I'm currently focused on:
- Modern C++ (C++17/20) for vision, tracking, and real-time pipelines
- CUDA kernel optimization and GPU acceleration
- Inference runtime systems (batching, streaming, KV-cache simulation)
- Orchestrated ML pipelines (Airflow/Prefect, MLflow, Ray)
- Distributed GPU workloads on Kubernetes
- Simulation and autonomous robotics
- C++17/20, Python
- Linux, CMake, gdb, perf
- Multithreading, lock-free queues
- Real-time computer vision (OpenCV, tracking)
- CUDA, Nsight Systems / Nsight Compute
- TensorRT, ONNX Runtime
- Triton Inference Server
- GPU memory optimization, fused kernels
- PyTorch, TensorFlow
- FastAPI / gRPC
- Continuous batching, streaming inference
- Quantization (bitsandbytes), ONNX export
- Model deployment & benchmarking
- MLflow, Airflow / Prefect
- Docker, Kubernetes, Ray
- AWS / GCP
- Feature engineering & data validation
A high-performance, multi-threaded pipeline for real-time object tracking across multiple video streams.
Demonstrates C++ systems design, threading, frame pipelines, low-latency processing, and modular CV architecture.
Tech: C++17, OpenCV, pthreads, lock-free queues, CMake, Linux
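The lock-free queues used between capture and tracking threads could be as simple as a single-producer/single-consumer ring buffer. A minimal C++17 sketch of that idea (names and capacity handling are illustrative, not taken from the project):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer lock-free ring buffer.
// One slot is kept empty so a full queue can be told apart from an
// empty one without extra state.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    bool push(const T& item) {  // producer thread only
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // queue full; caller may drop the frame
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {  // consumer thread only
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;  // queue empty
        T item = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};  // next write index
    std::atomic<std::size_t> tail_{0};  // next read index
};
```

In a frame pipeline, each camera thread pushes decoded frames and a tracker thread pops them; a `false` return on a full queue lets the producer drop frames rather than block, which keeps end-to-end latency bounded.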
Hand-optimized fused CUDA kernel implementing RGB→Gray, Gaussian blur, and Sobel edge detection in a single pass.
Includes CPU vs GPU benchmarks and profiling analysis.
Tech: CUDA, Nsight Systems/Compute, memory coalescing, shared memory
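The payoff of fusing these stages is that the intermediate grayscale (and blurred) images never round-trip through global memory. A simplified CPU reference of the fusion idea, combining just the RGB→gray and Sobel stages in one pass (the blur stage and the shared-memory tiling from the CUDA version are omitted; all names here are illustrative):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Fused RGB->gray + Sobel on the CPU: grayscale values are computed
// on the fly for each 3x3 neighborhood, so no intermediate gray image
// is ever written to memory -- the same idea the CUDA kernel applies
// with shared-memory tiles instead of recomputation.
inline float grayAt(const std::vector<uint8_t>& rgb, int w, int x, int y) {
    const int i = (y * w + x) * 3;
    return 0.299f * rgb[i] + 0.587f * rgb[i + 1] + 0.114f * rgb[i + 2];
}

std::vector<float> fusedSobel(const std::vector<uint8_t>& rgb, int w, int h) {
    std::vector<float> out(static_cast<std::size_t>(w) * h, 0.0f);
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            float g[3][3];
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    g[dy + 1][dx + 1] = grayAt(rgb, w, x + dx, y + dy);
            // Sobel gradients from the freshly computed gray neighborhood.
            const float gx = (g[0][2] + 2 * g[1][2] + g[2][2]) -
                             (g[0][0] + 2 * g[1][0] + g[2][0]);
            const float gy = (g[2][0] + 2 * g[2][1] + g[2][2]) -
                             (g[0][0] + 2 * g[0][1] + g[0][2]);
            out[y * w + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

On a GPU the recomputation above would be replaced by loading a tile of gray values into shared memory once per block, which is where the coalescing and shared-memory work in the project comes in.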
Custom inference engine with simplified continuous batching, token streaming, and latency profiling.
Builds intuition for LLM runtime engines (vLLM, SGLang, TensorRT-LLM).
Tech: FastAPI, PyTorch, queuing, GPU inference, streaming APIs
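The core of continuous batching is that finished sequences leave the batch at every decode step and waiting requests immediately take their slots, rather than the whole batch draining before new work is admitted. A toy C++ sketch of that scheduling loop (the project itself is Python/FastAPI; all names here are illustrative):

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy model of continuous batching: at every decode step, finished
// sequences leave the batch and queued requests immediately fill the
// freed slots, instead of waiting for the whole batch to drain.
struct Request {
    std::string id;
    int tokensLeft;  // decode steps remaining for this sequence
};

class ContinuousBatcher {
public:
    explicit ContinuousBatcher(std::size_t maxBatch) : maxBatch_(maxBatch) {}

    void submit(Request r) { waiting_.push_back(std::move(r)); }

    // One decode step; returns ids of sequences that finished this step.
    std::vector<std::string> step() {
        // Admit waiting requests into any free batch slots.
        while (active_.size() < maxBatch_ && !waiting_.empty()) {
            active_.push_back(waiting_.front());
            waiting_.pop_front();
        }
        std::vector<std::string> finished;
        for (auto it = active_.begin(); it != active_.end();) {
            if (--it->tokensLeft <= 0) {  // "generate" one token
                finished.push_back(it->id);
                it = active_.erase(it);   // slot freed for the next step
            } else {
                ++it;
            }
        }
        return finished;
    }

    std::size_t activeCount() const { return active_.size(); }

private:
    std::size_t maxBatch_;
    std::deque<Request> waiting_;
    std::vector<Request> active_;
};
```

Production engines like vLLM layer paged KV-cache management and preemption on top of this same admit-on-free-slot loop.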
End-to-end ML workflow with orchestration, metrics tracking, model registry, CI/CD, and Dockerized deployment.
Tech: Airflow/Prefect, MLflow, S3, Docker, GitHub Actions
Lightweight GPU workload scheduler distributing inference jobs across a GPU-enabled Kubernetes cluster.
Includes autoscaling, monitoring, and queue-based scheduling.
Tech: Kubernetes, Triton, Prometheus, Grafana, Ray, Python
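The queue-based scheduling here boils down to a dispatch loop that places each queued job on the least-loaded GPU. A stripped-down C++ stand-in for that policy (the real project does this via Kubernetes/Ray; the types and cost model below are purely illustrative):

```cpp
#include <queue>
#include <vector>

// Toy queue-based scheduler: inference jobs are dispatched to the GPU
// with the lightest current load (a greedy least-loaded policy).
struct Job {
    int id;
    int cost;  // abstract load units (e.g., expected GPU-seconds)
};

class GpuScheduler {
public:
    explicit GpuScheduler(int numGpus) : load_(numGpus, 0) {}

    void enqueue(Job j) { queue_.push(j); }

    // Drain the queue, always placing the next job on the least-loaded
    // GPU. Returns the GPU index chosen for each job, in dispatch order.
    std::vector<int> dispatchAll() {
        std::vector<int> placements;
        while (!queue_.empty()) {
            Job j = queue_.front();
            queue_.pop();
            int best = 0;
            for (int g = 1; g < static_cast<int>(load_.size()); ++g)
                if (load_[g] < load_[best]) best = g;
            load_[best] += j.cost;
            placements.push_back(best);
        }
        return placements;
    }

    const std::vector<int>& load() const { return load_; }

private:
    std::queue<Job> queue_;
    std::vector<int> load_;
};
```

Autoscaling then reduces to watching the queue depth and aggregate load and adjusting the number of GPU workers, which is what the Prometheus/Grafana metrics feed.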
- Building deeper expertise in:
  - C++20 concurrency
  - CUDA warp-level optimization
  - Fault-tolerant LLM runtime systems
  - Robotics simulation and GPU-accelerated perception
- Publishing a series of GPU + C++ systems projects over the next 8 weeks
- Preparing for ML Systems & Robotics Engineering roles
- LinkedIn: https://www.linkedin.com/in/drew-kalasky/
- Open to opportunities in: ML Systems, GPU optimization, robotics simulation, inference infrastructure, applied ML engineering
- Real-Time Multi-Camera Tracker (C++17)