
Siddhesh Sonar

On-Device AI / Edge Inference Engineer

I make large AI models run fast on mobile hardware. I work across the full stack — from tensor-level optimizations in GGML/C++, through JNI bindings, to production Android apps. Currently building on-device inference infrastructure at RunAnywhere (YC W26).


What I've Built

ToolNeuron — Production offline AI ecosystem for Android. 500+ commits. Native C++ inference via llama.cpp with custom JNI bindings. Plugin sandboxing with hardware-backed encryption (Android KeyStore). GGUF model management, runtime model switching, offline TTS (Sherpa-ONNX), and OTA updates. 2K+ Play Store installs.

Ai-Systems-New — The native C/C++ inference engine powering ToolNeuron. Direct GGML integration, with custom tensor operations optimized for mobile SoCs.

ForgeAI — Toolkit for SafeTensors and GGUF model operations: inspection, conversion, and manipulation.

N1 — Experimental self-rewriting neural architecture using local error signals. No backpropagation. Runtime weight mutation based on surprise signals.


What I Know (Deeply, Not Surface Level)

Inference on constrained hardware — GGML internals, compute graph construction for new model architectures, ML op scheduling across CPU/GPU/NPU, quantization scheme behavior on real devices (Q4_K_M, Q5_K_S, Q8_0).

Mobile SoC architectures — Qualcomm Hexagon DSP (HVX vector extensions), Adreno GPU compute pipelines (Vulkan, timeline semaphores), QNN SDK for NPU graph compilation, ARM CPU architecture differences across Android devices.

Production Android — NDK/JNI, Jetpack Compose, plugin SDK design, secure IPC, encrypted inference pipelines, AOSP-level optimizations.


Currently

  • Building cross-platform on-device inference infrastructure at RunAnywhere (YC W26)
  • Deepening formal math foundations (linear algebra, quantization theory)
  • Maintaining and shipping ToolNeuron updates

Contact

siddheshsonar2377@gmail.com · LinkedIn

Open to full-time roles in edge AI, on-device inference, and mobile AI infrastructure.

Pinned

  1. ToolNeuron

    Complete offline AI ecosystem for Android: Chat (GGUF/LLMs), Images (Stable Diffusion 1.5), Voice (TTS/STT), and Knowledge (RAG Data-Packs), zero subscriptions, no data harvesting. Open-source priv…

    Kotlin · 253 stars · 20 forks

  2. Ai-Systems-New

    On-device AI SDK powering ToolNeuron — LLM chat & tool calling (llama.cpp), Stable Diffusion image generation (QNN/MNN), image processing (upscale, segment, inpaint, depth, style), and TTS. Native …

    C++ · 6 stars

  3. structured-prompt-builder

    A lightweight, browser‑first tool for designing well‑structured AI prompts with a clean UI, live previews, a local Prompt Library, and optional Gemini‑powered prompt optimization.

    HTML · 312 stars · 35 forks

  4. Web-Plugin

    Kotlin · 1 star

  5. HRM

    Enhanced Hierarchical Reasoning Model (HRM): an AI system extending HRM with scalable memory, JSON metadata, and tag-based retrieval for layered, context-aware reasoning and persistent knowledge int…

    Kotlin · 11 stars · 1 fork

  6. Canves

    Advanced UI design canvas for Android

    Java · 3 stars · 1 fork