Rust KV-cache compression for LLM inference. Implements TurboQuant (Zandieh et al., ICLR 2026) plus PQO, our variant that drops QJL, adds a fused CUDA kernel, and shrinks the cache to ~20% of its FP16 size (49% total VRAM at a 32K context). Integrates with mistral.rs.
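The core idea behind this kind of cache compression is scalar quantization of the K/V tensors with a small per-block scale. The sketch below is an illustrative 4-bit quantizer in Rust, not the actual TurboQuant/PQO algorithm (whose codebooks and fused kernel are more involved); codes are kept unpacked in `u8` for clarity, whereas a real implementation packs two codes per byte to reach the ~4-bit footprint.

```rust
/// Quantize a block of f32 values to 4-bit codes plus one f32 scale.
/// Sketch of per-block scalar quantization; not the PQO codebook scheme.
fn quantize_block(block: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = block.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let codes = block
        .iter()
        .map(|&v| {
            // Map values to signed codes in [-7, 7], stored with a +8 bias.
            let q = (v / scale).round().clamp(-7.0, 7.0) as i8;
            (q + 8) as u8
        })
        .collect();
    (codes, scale)
}

/// Reconstruct approximate f32 values from codes and scale.
fn dequantize_block(codes: &[u8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| (c as i8 - 8) as f32 * scale).collect()
}

fn main() {
    let kv = vec![0.12f32, -0.5, 0.33, 0.9, -0.91, 0.0, 0.44, -0.2];
    let (codes, scale) = quantize_block(&kv);
    let recon = dequantize_block(&codes, scale);
    let max_err = kv
        .iter()
        .zip(&recon)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // Round-to-nearest bounds the error by half a quantization step.
    assert!(max_err <= scale / 2.0 + 1e-6);
    println!("scale = {scale:.4}, max abs error = {max_err:.4}");
}
```

Packed, this stores 4 bits per value plus one f32 scale per block, i.e. ~25% of FP16 before further tricks; the ~20% figure in the README implies additional savings beyond plain 4-bit scalar quantization.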
AI-powered document organiser. Drop in a batch of PDFs, DOCX files, or ebooks: it extracts the document text, identifies the title, author, and year using a local or remote LLM, then moves the files into folders and/or keeps the extracted text.