Run large language models like Qwen and LLaMA locally on Android for offline, private, real-time question answering and chat - powered by ONNX Runtime.
Updated Jan 7, 2026 - Kotlin
Run LLMs and SLMs on your hardware & browser.
A Cloud-to-Edge MLOps pipeline for offline industrial diagnostics. Fine-tunes Phi-3-mini (3.8B) on Cloud GPUs via QLoRA, quantizes to INT4, and deploys as a CPU-optimized ONNX microservice for Vaisala sensor logs.
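The INT4 quantization step in pipelines like the one above trades a little accuracy for a 4–8x smaller model. A minimal, library-free sketch of symmetric INT4 weight quantization (illustrative only; the function names and values are hypothetical, not taken from the repo):

```python
def quantize_int4(weights):
    # Symmetric quantization: map floats to integers in [-8, 7]
    # using a single per-tensor scale derived from the max magnitude.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for inference.
    return [v * scale for v in q]

w = [0.5, -1.2, 0.07, 3.5]
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
# Each reconstructed value lies within half a scale step of the original.
```

Real toolchains (e.g. ONNX Runtime's quantization utilities) use per-channel scales and calibration data, but the rounding-to-a-small-integer-grid idea is the same.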
Real-time semantic audio codec achieving 300 bps bandwidth via generative-AI reconstruction.
Edge-AI-powered data-to-text system that analyzes global fishery trends (capture, aquaculture, stocks) and generates automated status reports offline using an LSTM and TensorFlow Lite.
A comprehensive toolkit that streamlines offline LLM inference across a range of models and libraries.
Offline CrowdAware system for Raspberry Pi 4B and Heltec LoRa V3 using Raspberry Pi Camera Module 3 and MLX90640 Thermal Camera.
Multimodal offline counterfeit-detection system (text + image + tables).
GPT-OSS B20 local execution: a lightweight environment for running the model with Python 3.12 and CUDA acceleration. Run GPT-OSS B20 entirely offline, accelerate text generation on the GPU, and enable fast, secure inference on consumer hardware.
Inclusive hand gesture recognition system for assistive human–computer interaction, based on classical machine learning and MediaPipe Hands.