On-Device AI / Edge Inference Engineer
I make large AI models run fast on mobile hardware. I work across the full stack — from tensor-level optimizations in GGML/C++, through JNI bindings, to production Android apps. Currently building on-device inference infrastructure at RunAnywhere (YC W26).
ToolNeuron — Production offline AI ecosystem for Android. 500+ commits. Native C++ inference via llama.cpp with custom JNI bindings. Plugin sandboxing with hardware-backed encryption (Android KeyStore). GGUF model management, runtime model switching, offline TTS (Sherpa-ONNX), and OTA updates. 2K+ Play Store installs.
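The JNI layer is thin in principle: load the model natively, hand an opaque handle back to Kotlin. A minimal sketch of that shape, with hypothetical package and function names (`com.example.llm.NativeBridge`, `loadModel`), not ToolNeuron's actual API; note the llama.cpp C API shifts between versions (these calls are from the 2023-era API, since renamed):

```cpp
// Minimal JNI bridge to llama.cpp (sketch; names are hypothetical, not ToolNeuron's).
#include <jni.h>
#include "llama.h"

extern "C" JNIEXPORT jlong JNICALL
Java_com_example_llm_NativeBridge_loadModel(JNIEnv *env, jobject /*thiz*/, jstring jpath) {
    const char *path = env->GetStringUTFChars(jpath, nullptr);
    llama_model_params params = llama_model_default_params();
    params.n_gpu_layers = 0;                    // CPU-only path, typical on Android
    llama_model *model = llama_load_model_from_file(path, params);
    env->ReleaseStringUTFChars(jpath, path);
    return reinterpret_cast<jlong>(model);      // opaque handle back to Kotlin/Java
}

extern "C" JNIEXPORT void JNICALL
Java_com_example_llm_NativeBridge_freeModel(JNIEnv *, jobject, jlong handle) {
    llama_free_model(reinterpret_cast<llama_model *>(handle));
}
```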
Ai-Systems-New — The native C/C++ inference engine powering ToolNeuron. Direct GGML integration, custom tensor operations optimized for mobile SoCs.
ForgeAI — A toolkit for SafeTensors and GGUF model operations: inspection, conversion, and manipulation.
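Inspection is tractable because GGUF opens with a fixed little-endian header: magic `GGUF`, a version, then tensor and metadata-KV counts (64-bit since GGUF v2; v1 used 32-bit counts). A minimal header dump under those assumptions, not ForgeAI's actual code:

```cpp
// Read the fixed GGUF v2+ header fields (sketch; little-endian host assumed).
#include <cstdint>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }
    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }

    uint32_t magic = 0, version = 0;
    uint64_t n_tensors = 0, n_kv = 0;
    bool ok = std::fread(&magic,     sizeof magic,     1, f) == 1
           && std::fread(&version,   sizeof version,   1, f) == 1
           && std::fread(&n_tensors, sizeof n_tensors, 1, f) == 1
           && std::fread(&n_kv,      sizeof n_kv,      1, f) == 1;
    std::fclose(f);

    if (!ok || magic != 0x46554747u) {           // "GGUF" read as little-endian u32
        std::fprintf(stderr, "not a GGUF file\n");
        return 1;
    }
    std::printf("gguf v%u: %llu tensors, %llu metadata KVs\n",
                version, (unsigned long long)n_tensors, (unsigned long long)n_kv);
    return 0;
}
```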
N1 — Experimental self-rewriting neural architecture using local error signals. No backpropagation. Runtime weight mutation based on surprise signals.
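To make "local error signals" concrete: the general shape of such schemes is that each layer adapts from quantities it can compute itself, with no global backward pass. A generic surprise-modulated Hebbian-style sketch, explicitly not N1's actual rule:

```cpp
// Generic sketch of a surprise-gated local update for one layer.
// NOT N1's actual rule -- just the family: no backprop, each layer
// learns from signals available locally.
#include <vector>

struct LocalLayer {
    int n_in, n_out;
    std::vector<float> w;          // n_out x n_in weights, row-major
    std::vector<float> pred;       // layer's running prediction of its own output

    void update(const std::vector<float> &in, const std::vector<float> &out, float lr) {
        for (int o = 0; o < n_out; ++o) {
            float surprise = out[o] - pred[o];            // local error: actual vs expected
            pred[o] += 0.1f * surprise;                   // slowly track expectations
            for (int i = 0; i < n_in; ++i)
                w[o * n_in + i] += lr * surprise * in[i]; // Hebbian term, surprise-gated
        }
    }
};
```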
Inference on constrained hardware — GGML internals, compute graph construction for new model architectures, ML op scheduling across CPU/GPU/NPU, quantization scheme behavior on real devices (Q4_K_M, Q5_K_S, Q8_0).
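Concretely, graph construction in GGML means declaring tensors in a context, composing ops lazily, then expanding and executing the graph. A sketch using the classic single-context C API (newer GGML versions move allocation and execution behind backend/scheduler APIs):

```cpp
// Build and run a tiny GGML graph: y = relu(W x). Classic single-context API.
#include "ggml.h"

int main() {
    ggml_init_params ip = { /*mem_size=*/16 * 1024 * 1024, /*mem_buffer=*/nullptr, /*no_alloc=*/false };
    ggml_context *ctx = ggml_init(ip);

    ggml_tensor *W = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 32); // 64 in, 32 out
    ggml_tensor *x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 64);
    ggml_tensor *y = ggml_relu(ctx, ggml_mul_mat(ctx, W, x));        // ops recorded, not run

    ggml_cgraph *gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);                  // topo-sort back from the output node
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/4);

    ggml_free(ctx);
    return 0;
}
```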
Mobile SoC architectures — Qualcomm Hexagon DSP (HVX vector extensions), Adreno GPU compute pipelines (Vulkan, timeline semaphores), QNN SDK for NPU graph compilation, ARM CPU architecture differences across Android devices.
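Timeline semaphores are worth the callout: they let a compute pipeline chain dispatches against a monotonically increasing counter instead of juggling binary semaphores per submit. The core Vulkan 1.2 setup, as a sketch that assumes `device` is already created with the timelineSemaphore feature enabled:

```cpp
// Create a Vulkan 1.2 timeline semaphore and block until it reaches a value.
// Sketch: assumes `device` is a valid VkDevice with timelineSemaphore enabled.
#include <vulkan/vulkan.h>

VkSemaphore make_timeline(VkDevice device) {
    VkSemaphoreTypeCreateInfo type_info{};
    type_info.sType         = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO;
    type_info.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE;
    type_info.initialValue  = 0;

    VkSemaphoreCreateInfo create_info{};
    create_info.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;
    create_info.pNext = &type_info;

    VkSemaphore sem = VK_NULL_HANDLE;
    vkCreateSemaphore(device, &create_info, nullptr, &sem);
    return sem;
}

// Host-side wait until the GPU signals `value` (e.g. "dispatch N finished").
void wait_for(VkDevice device, VkSemaphore sem, uint64_t value) {
    VkSemaphoreWaitInfo wait_info{};
    wait_info.sType          = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO;
    wait_info.semaphoreCount = 1;
    wait_info.pSemaphores    = &sem;
    wait_info.pValues        = &value;
    vkWaitSemaphores(device, &wait_info, UINT64_MAX);
}
```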
Production Android — NDK/JNI, Jetpack Compose, plugin SDK design, secure IPC, encrypted inference pipelines, AOSP-level optimizations.
- Building cross-platform on-device inference infrastructure at RunAnywhere (YC W26)
- Deepening formal math foundations (linear algebra, quantization theory)
- Maintaining and shipping ToolNeuron updates
siddheshsonar2377@gmail.com · LinkedIn
Open to full-time roles in edge AI, on-device inference, and mobile AI infrastructure.