ark2224/README.md

Hi, I’m Drew 👋

Machine Learning Engineer | High-Performance Systems | C++/CUDA | Computer Vision | ML Infrastructure

I build real-time, performance-critical systems at the intersection of computer vision, GPU acceleration, robotics simulation, and ML infrastructure.
My background spans AI/ML, optimization, deep learning, GPU programming, and systems-level engineering, with a Master’s degree in AI & Machine Learning from Columbia University.

I'm currently focused on:

  • Modern C++ (C++17/20) for vision, tracking, and real-time pipelines
  • CUDA kernel optimization and GPU acceleration
  • Inference runtime systems (batching, streaming, KV-cache simulation)
  • Orchestrated ML pipelines (Airflow/Prefect, MLflow, Ray)
  • Distributed GPU workloads on Kubernetes
  • Simulation and autonomous robotics

Tech Stack

Systems & Performance

  • C++17/20, Python
  • Linux, CMake, gdb, perf
  • Multithreading, lock-free queues
  • Real-time computer vision (OpenCV, tracking)

GPU & HPC

  • CUDA, Nsight Systems / Nsight Compute
  • TensorRT, ONNX Runtime
  • Triton Inference Server
  • GPU memory optimization, fused kernels

ML & Inference Systems

  • PyTorch, TensorFlow
  • FastAPI / gRPC
  • Continuous batching, streaming inference
  • Quantization (bitsandbytes), ONNX export
  • Model deployment & benchmarking

MLOps & Cloud

  • MLflow, Airflow / Prefect
  • Docker, Kubernetes, Ray
  • AWS / GCP
  • Feature engineering & data validation

Selected Projects

Real-Time Multi-Camera Tracking Pipeline (C++17, OpenCV, Linux)

A high-performance, multi-threaded pipeline for real-time object tracking across multiple video streams.
Demonstrates C++ systems design, threading, frame pipelines, low-latency processing, and modular CV architecture.

Tech: C++17, OpenCV, pthreads, lock-free queues, CMake, Linux


CUDA Fused Vision Kernel (CUDA, Nsight)

Hand-optimized fused CUDA kernel implementing RGB→Gray, Gaussian blur, and Sobel edge detection in a single pass.
Includes CPU vs GPU benchmarks and profiling analysis.

Tech: CUDA, Nsight Systems/Compute, memory coalescing, shared memory


LLM Inference Server (Python, FastAPI, GPU)

Custom inference engine with simplified continuous batching, token streaming, and latency profiling.
Built to develop intuition for how LLM runtime engines (vLLM, SGLang, TensorRT-LLM) schedule and serve requests.

Tech: FastAPI, PyTorch, queuing, GPU inference, streaming APIs
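
To illustrate the continuous-batching idea named above, here is a minimal sketch. It is not the project's code: `Request`, `fake_decode_step`, and `continuous_batching` are hypothetical names, and the model forward pass is replaced by a stand-in that emits one placeholder token per active request. The point it shows is that finished requests leave the batch after every decode step and waiting requests take their slots right away.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request: an id plus how many tokens it should generate."""
    req_id: str
    tokens_needed: int
    tokens: list = field(default_factory=list)


def fake_decode_step(batch):
    """Stand-in for one batched forward pass: one placeholder token per request."""
    return [f"{r.req_id}-t{len(r.tokens)}" for r in batch]


def continuous_batching(requests, max_batch_size):
    """Yield (step, req_id, token), refilling freed batch slots every step."""
    waiting = deque(requests)
    active = []
    step = 0
    while waiting or active:
        # Admit waiting requests into any free slots before each decode step.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())
        # One decode step for the whole batch; stream each token as it appears.
        for req, tok in zip(active, fake_decode_step(active)):
            req.tokens.append(tok)
            yield step, req.req_id, tok
        # Retire finished requests so their slots are free for the next step.
        active = [r for r in active if len(r.tokens) < r.tokens_needed]
        step += 1


events = list(continuous_batching(
    [Request("a", 3), Request("b", 1), Request("c", 2)], max_batch_size=2))
for step, req_id, token in events:
    print(step, req_id, token)
```

With a batch size of 2, request "c" joins at step 1 as soon as "b" finishes, so all three requests complete in 3 decode steps. Static batching of the same workload (run "a" and "b" to completion, then "c") would take 5.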


MLOps Training Pipeline (Airflow/Prefect, MLflow, Docker)

End-to-end ML workflow with orchestration, metrics tracking, model registry, CI/CD, and Dockerized deployment.

Tech: Airflow/Prefect, MLflow, S3, Docker, GitHub Actions


GPU Job Scheduler (Kubernetes, Triton, Prometheus/Grafana)

Lightweight GPU workload scheduler distributing inference jobs across a GPU-enabled Kubernetes cluster.
Includes autoscaling, monitoring, and queue-based scheduling.

Tech: Kubernetes, Triton, Prometheus, Grafana, Ray, Python


What I’m Working On Now

  • Building deeper expertise in:
    • C++20 concurrency
    • CUDA warp-level optimization
    • LLM fault-tolerant runtime systems
    • Robotics simulation and GPU-accelerated perception
  • Publishing a series of GPU + C++ systems projects over the next 8 weeks
  • Preparing for ML Systems & Robotics Engineering roles

Let’s Connect!


Pinned Repositories

  1. Real-Time Multi-Camera Tracker (C++17)

Pinned

  1. Real-Time-Multi-Camera-Object-Tracking-Pipeline (Public, C++)
  2. SLAM_CUDA (Public, CUDA)