TurboMem 🧠⚡

Extreme AI compression meets perfect memory — locally, for free.

An open experiment combining two bleeding-edge ideas:

  1. TurboQuant/PolarQuant — Google Research's optimal vector quantization for AI models (paper, ICLR 2026)
  2. MemPalace — The highest-scoring AI memory system ever benchmarked (96.6% LongMemEval, repo)

The goal: run powerful AI models on cheap hardware with perfect long-term memory. No cloud. No subscription. Your machine, your data.

The Vision

Right now, local AI has two problems:

  • Models are too big — A good model needs 16-48GB VRAM. Most people don't have that.
  • AI forgets everything — Every session starts from zero. Months of context, gone.

TurboQuant solves the first. MemPalace solves the second. Nobody has combined them yet.

What's Here

```
turbomem/
├── README.md              ← You are here
├── research/
│   ├── turboquant.md      ← Paper analysis + implementation notes
│   └── mempalace.md       ← Architecture analysis + integration plan
├── src/
│   ├── polarquant/        ← PolarQuant implementation (experimental)
│   │   ├── __init__.py
│   │   ├── rotation.py    ← Random rotation matrices
│   │   ├── quantizer.py   ← Optimal scalar quantization
│   │   └── polar.py       ← Full PolarQuant pipeline
│   ├── qjl/               ← QJL 1-bit error correction
│   │   ├── __init__.py
│   │   └── qjl.py         ← Johnson-Lindenstrauss transform
│   └── integration/       ← MemPalace + TurboQuant glue
│       └── __init__.py
├── benchmarks/            ← Reproducible benchmark scripts
│   └── README.md
├── examples/              ← Quick-start examples
│   └── basic_usage.py
├── requirements.txt
└── LICENSE
```
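The core idea behind `src/polarquant/` (a random orthogonal rotation to spread vector mass evenly across coordinates, followed by uniform scalar quantization) can be sketched in a few lines. This is a minimal illustration assuming only NumPy; the function names are hypothetical and do not reflect the actual module API:

```python
# Sketch of a rotation + scalar-quantization pipeline in the spirit of
# PolarQuant. Illustrative only, not the src/polarquant API.
import numpy as np

def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Draw a random orthogonal matrix via QR decomposition of a Gaussian."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Sign fix so the result is uniformly distributed over the orthogonal group.
    return q * np.sign(np.diag(r))

def quantize(x: np.ndarray, bits: int = 4):
    """Uniform scalar quantization to `bits` bits per coordinate."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float64) * scale + lo

# Rotate, quantize to 4 bits, then rotate back; the orthogonal rotation
# preserves norms, so per-coordinate rounding error stays small overall.
dim = 64
v = np.random.default_rng(1).standard_normal(dim)
R = random_rotation(dim)
codes, lo, scale = quantize(R @ v)
v_hat = R.T @ dequantize(codes, lo, scale)
```

The rotation matters because scalar quantizers waste bits on outlier coordinates; after a random rotation, coordinates are roughly Gaussian and equally scaled, which is the regime where uniform quantization behaves well.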

Status

| Component | Status | Notes |
|---|---|---|
| PolarQuant (random rotation + quantization) | 🟡 Experimental | Based on the paper, not Google's impl |
| QJL (1-bit error correction) | 🟡 Experimental | Core algorithm implemented |
| MemPalace integration | 🔴 Planned | Waiting for stable MCP interface |
| Ollama/llama.cpp bridge | 🔴 Planned | KV-cache compression hook |
| Benchmarks | 🔴 Planned | Need LongMemEval + perplexity tests |

Can You Help?

This is an open challenge. We're looking for people who can:

  • Implement — Turn the TurboQuant paper into working code
  • Benchmark — Test compression vs accuracy on real models
  • Integrate — Wire MemPalace's ChromaDB into the quantized pipeline
  • Break it — Find edge cases, prove us wrong, make it better

Priority Challenges

  1. 🔥 PolarQuant rotation matrix — Implement the random orthogonal rotation that makes vectors quantization-friendly. The paper describes it, but optimal rotation selection is non-trivial.

  2. 🔥 QJL estimator — The 1-bit error correction needs a careful estimator that balances high-precision queries with low-precision data. Math-heavy, high impact.

  3. 🔥 KV-cache hook — Where do you intercept llama.cpp's KV-cache to inject TurboQuant compression? This is the integration challenge.

  4. MemPalace MCP bridge — Connect MemPalace's memory retrieval to a quantized local model so the model remembers everything while running on minimal hardware.
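As a starting point for challenge 2, the QJL idea can be sketched as follows: keys are stored as sign bits after a Gaussian Johnson-Lindenstrauss projection (plus one scalar for the norm), while queries stay full precision, and the inner product is recovered with the standard identity E[(s·q)·sign(s·k)] = √(2/π)·⟨q,k⟩/‖k‖. This is a hedged sketch assuming NumPy, not the repo's `src/qjl/` implementation; names are illustrative:

```python
# Sketch of a 1-bit JL (QJL-style) inner-product estimator.
# Keys are compressed to sign bits; queries remain full precision.
import numpy as np

def qjl_encode(k: np.ndarray, S: np.ndarray):
    """Store only the sign bits of the projected key, plus its norm."""
    return np.sign(S @ k), np.linalg.norm(k)

def qjl_inner(q: np.ndarray, k_bits: np.ndarray, k_norm: float,
              S: np.ndarray) -> float:
    """Unbiased estimate of <q, k> from sign-bit keys.

    Uses E[(s.q) * sign(s.k)] = sqrt(2/pi) * <q, k> / ||k||
    for Gaussian rows s, so we rescale by sqrt(pi/2) * ||k||.
    """
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) * k_norm * ((S @ q) @ k_bits) / m)

rng = np.random.default_rng(0)
d, m = 64, 4096                      # m projections -> O(1/sqrt(m)) error
S = rng.standard_normal((m, d))      # shared Gaussian JL projection
q = rng.standard_normal(d)           # full-precision query
k = rng.standard_normal(d)           # key to be compressed to m bits + 1 float
k_bits, k_norm = qjl_encode(k, S)
est = qjl_inner(q, k_bits, k_norm, S)
true = float(q @ k)
```

The estimator's variance shrinks like 1/m, which is the trade-off the "careful estimator" challenge above is about: enough projections for accuracy, few enough bits for real compression.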

Quick Start

```bash
git clone https://github.com/Snakkaz/turbomem.git
cd turbomem
pip install -r requirements.txt
python examples/basic_usage.py
```

Research References

Why "TurboMem"?

TurboQuant compression + MemPalace memory = TurboMem. Fast models that never forget.

License

MIT — use it, fork it, improve it.


Built by Petersen Digital Consulting 🇳🇴 — Making local AI accessible.
