2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6 | Native MTP speculative decoding on Apple Silicon with no external drafter.
Updated May 9, 2026 - Python
ToyLLM: Learning LLM from Scratch
Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"
Efficient speculative sampling for language models