docs: update benchmarks/ directory with verified v1.1.0 numbers

codenimja · codenimja · commit 1ba26df090ca · 2025-11-02T11:22:10.000-05:00
Update the deprecated benchmarks/ directory documentation to reflect verified performance numbers:
- SPSC: 558M ops/sec micro (not 615M), ~35M ops/sec realistic threaded
- MPSC: added 15M/8.5M/5.3M ops/sec for 2/4/8 producers
- Latency: 20ns p50 (not 30ns), 31ns p99
- Burst: 385M ops/sec (not 300M), 18% variance (not 21%)
- Buffer optimal: 4096 slots (not 2048)

Even though this directory is deprecated and redirects to tests/performance/,
the documentation should still show accurate verified numbers to avoid confusion.
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -16,13 +16,14 @@ We've improved our benchmarking approach:
 
 | Benchmark | What It Measures | Reference |
 |-----------|------------------|------------|
-| `benchmark_spsc_simple.nim` | Raw throughput (615M ops/sec) | Go channels |
-| `benchmark_latency.nim` | Latency distribution (30ns p50) | Tokio/Cassandra |
-| `benchmark_burst.nim` | Burst stability (300M ops/sec) | Redis |
-| `benchmark_sizes.nim` | Optimal buffer size (2048 slots) | LMAX Disruptor |
+| `benchmark_spsc_simple.nim` | Raw throughput (558M micro, ~35M realistic) | Go channels |
+| `benchmark_latency.nim` | Latency distribution (20ns p50, 31ns p99) | Tokio/Cassandra |
+| `benchmark_burst.nim` | Burst stability (385M ops/sec, 18% variance) | Redis |
+| `benchmark_sizes.nim` | Optimal buffer size (4096 slots, 557M ops/sec) | LMAX Disruptor |
 | `benchmark_stress.nim` | Maximum load (0% contention) | JMeter/Gatling |
 | `benchmark_sustained.nim` | Long-duration stability | Cassandra/ScyllaDB |
 | `benchmark_concurrent.nim` | Async overhead (512K ops/sec) | Async runtimes |
+| `benchmark_mpsc.nim` | MPSC performance (15M/8.5M/5.3M ops/sec) | JCTools MPSC |
 
 ## Quick Start
 
@@ -49,16 +50,17 @@ Benchmarks run automatically on every commit:
 
 **Latest benchmarks** (automated CI + local verification):
 
-### Simple Single-Threaded Benchmark
+### Simple Single-Threaded Benchmark (SPSC)
 Location: `tests/performance/benchmark_spsc_simple.nim`
 
 | Metric | Result |
 |--------|--------|
-| **Peak Throughput** | 600M+ ops/sec |
-| **Average Throughput** | 593M+ ops/sec |
-| **Latency** | ~1.7 ns/op |
+| **Peak Throughput (micro)** | 558M ops/sec |
+| **Average Throughput (micro)** | 551M ops/sec |
+| **Realistic Threaded** | ~35M ops/sec |
+| **Latency** | ~1.8 ns/op |
 
-**What this measures**: Raw SPSC channel performance without threading or async overhead.
+**What this measures**: Raw SPSC channel performance. Micro-benchmark shows peak potential (tight loop), realistic threaded includes OS scheduling overhead.
 
 ### Concurrent Async Benchmark
 Location: `tests/performance/benchmark_concurrent.nim`
@@ -75,9 +77,14 @@ Location: `tests/performance/benchmark_concurrent.nim`
 
 | Benchmark Type | Throughput | Use Case |
 |----------------|------------|----------|
-| **Simple (trySend/tryReceive)** | 600M+ ops/sec | Maximum performance, tight loops |
-| **Async (send/recv)** | 500K ops/sec | Convenience, async/await code |
-| **Multi-threaded** | 50M-200M ops/sec | Thread coordination overhead |
+| **SPSC micro (trySend/tryReceive)** | 558M ops/sec | Peak potential, tight loops |
+| **SPSC realistic threaded** | ~35M ops/sec | Actual multi-threaded workloads |
+| **MPSC (2 producers)** | 15M ops/sec | Multi-producer concurrent |
+| **MPSC (4 producers)** | 8.5M ops/sec | High concurrency |
+| **MPSC (8 producers)** | 5.3M ops/sec | Memory-bandwidth limited |
+| **Async (send/recv)** | 512K ops/sec | Convenience, async/await code |
+
+**Key insight**: SPSC is 3.5× faster than MPSC in realistic threaded workloads (35M vs 10M ops/sec).
 
 ### 2. Stress Tests
 
diff --git a/benchmarks/REPRODUCING.md b/benchmarks/REPRODUCING.md
@@ -6,15 +6,24 @@
 
 ## Latest Results (New Suite)
 
-**Comprehensive Benchmark Suite** - 7 industry-standard tests:
-- **Throughput**: 615M ops/sec peak
-- **Latency**: 30ns p50, 31ns p99 
-- **Burst Load**: 300M ops/sec average, 21% variance
-- **Buffer Optimization**: 2048 slots optimal, 559M ops/sec
+**Comprehensive Benchmark Suite** - 8 industry-standard tests (verified in CI):
+
+**SPSC Benchmarks:**
+- **Throughput (micro)**: 558M ops/sec peak, 551M average
+- **Throughput (realistic)**: ~35M ops/sec with thread scheduling
+- **Latency**: 20ns p50, 31ns p99, 50ns p99.9
+- **Burst Load**: 385M ops/sec average, 18% variance
+- **Buffer Optimization**: 4096 slots optimal, 557M ops/sec
 - **Stress Test**: 0% contention at 500K operations
 - **Sustained**: Stable performance over 10 seconds
 - **Async**: 512K ops/sec (shows async overhead)
 
+**MPSC Benchmarks:**
+- **2 producers**: 15M ops/sec (optimal sweet spot)
+- **4 producers**: 8.5M ops/sec (good scalability)
+- **8 producers**: 5.3M ops/sec (memory-bandwidth limited)
+- **Key finding**: SPSC is 3.5× faster in realistic threaded workloads
+
 ## Why the New Suite?
 
 The new benchmark suite follows industry best practices from: