LoCoMo Benchmark Results — NEXO Brain vs GPT-4, LLaMA-3, Gemini #3
Updated Results: NEXO Brain v0.5.0
We've improved significantly since the initial benchmark: +55% vs GPT-4, and +98% over our initial score. New in v0.5.0: 768-dim embeddings (bge-base), hybrid search (vector + BM25), cross-encoder reranking, multi-query decomposition, intelligent chunking, session summaries, adaptive decay, and temporal indexing. All improvements available via
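The post doesn't say how NEXO Brain merges the vector and BM25 result lists for hybrid search. One common way is Reciprocal Rank Fusion (RRF), and the sketch below assumes it. The function name, the memory-ID strings, and the sample hit lists are hypothetical illustrations, not NEXO Brain's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of memory IDs (best first) into one ranking.

    Assumption: RRF is used here as an illustrative fusion method. Each ID
    earns 1 / (k + rank) from every list it appears in; k=60 is the constant
    used in the original RRF paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical hit lists from a vector index and a BM25 index.
vector_hits = ["m3", "m1", "m7"]
bm25_hits = ["m1", "m9", "m3"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['m1', 'm3', 'm9', 'm7']
```

RRF only looks at ranks, so it avoids having to calibrate cosine similarities against BM25 scores. In a pipeline like the one described, a cross-encoder reranker would then rescore the top of the fused list.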
Updated note: these results were measured on v0.5.0. Since then, NEXO Brain has added the Cognitive Cortex (v1.0), Knowledge Graph queries, and Smart Startup context loading, all of which should improve recall accuracy, though the effect has not been measured yet. We plan to re-run the LoCoMo benchmark on v1.4+ and publish the updated numbers. If anyone wants to run it on their own setup, the benchmark script is straightforward to adapt.
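The thread doesn't include the benchmark script itself. For anyone adapting it, a minimal scoring loop might look like the following. It assumes a simplified SQuAD-style token-level F1 as the metric; the LoCoMo paper reports F1 for question answering, but its official scorer may normalize answers differently (for example with stemming). `evaluate`, `answer_fn`, and the sample QA pairs are hypothetical placeholders, not part of NEXO Brain or the LoCoMo release.

```python
import re
import string
from collections import Counter
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> list:
    """Lowercase, strip punctuation and English articles, split into tokens."""
    text = str(text).lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall.

    Assumption: simplified SQuAD-style F1; the official LoCoMo scorer may differ.
    """
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(qa_pairs: Iterable[Tuple[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Mean token F1 of answer_fn over (question, reference answer) pairs."""
    scores = [token_f1(answer_fn(q), ref) for q, ref in qa_pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical QA pairs and a stand-in for a memory-backed answerer.
qa = [("Where did Alice move?", "Paris"), ("What is Bob's favorite color?", "blue")]
print(evaluate(qa, lambda q: "She moved to Paris"))  # 0.2
```

To benchmark a real system, `answer_fn` would retrieve from memory and generate an answer; the loop and metric stay the same.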
We evaluated NEXO Brain on the LoCoMo benchmark (ACL 2024) — a peer-reviewed dataset that tests long-term conversation memory with 1,986 questions across 10 multi-session conversations.
Results (v0.5.0)
+55% vs GPT-4. Running entirely on CPU.
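Assuming "+55% vs GPT-4" and "+98% from our initial score" are relative gains on the same metric (ratios of scores, not percentage-point differences), with $s$ denoting a benchmark score, they translate to:

$$
\frac{s_{\text{v0.5.0}} - s_{\text{GPT-4}}}{s_{\text{GPT-4}}} = 0.55 \;\Rightarrow\; s_{\text{v0.5.0}} = 1.55\, s_{\text{GPT-4}},
\qquad
\frac{s_{\text{v0.5.0}} - s_{\text{initial}}}{s_{\text{initial}}} = 0.98 \;\Rightarrow\; s_{\text{v0.5.0}} = 1.98\, s_{\text{initial}}
$$

Under that reading, the initial score was about $1.55 / 1.98 \approx 0.78$ of GPT-4's.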
Raw Data
Full results in benchmarks/locomo/results/