Chinese long-context LLM benchmark V2 with harder NIAH variants, 10 repeats, and efficiency metrics for DeepSeek, Kimi, and Qwen.
Updated May 2, 2026 - Jupyter Notebook
LLM-friendly encoding for random identifiers (hex, base64, UUID). Built for agents, RAG, and NIAH-style retrieval.
Honest measurement of 1M-token long-context benchmarks (RULER + LongBench v2 + NIAH) on Qwen2.5-7B-1M run locally vs GitHub Models in the cloud. No credit card required; drift-checked and reproducible.