
Add redb storage backend with time travel support#308

Open
lawless-m wants to merge 2 commits into cozodb:main from lawless-m:pr/redb

Conversation

@lawless-m

Summary

Adds a new pure-Rust storage engine using redb 4.0, gated behind a storage-redb feature flag. Single-file, ACID, zero C dependencies — fits the same niche as sled but with a more active upstream.

  • Implements Storage and StoreTx traits on top of redb's Database/WriteTransaction.
  • Time travel supported via the check_key_for_validity seek pattern used by the memory backend.
  • Second commit adds two optimizations:
    • range_scan streams redb's 'static Range iterator instead of collecting into a Vec
    • del_range_from_persisted uses retain_in in a single pass instead of collect-then-delete
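The validity-seek idea can be illustrated independently of redb: store versions under a composite `(key, timestamp)` key in any ordered map, then answer an as-of query by seeking to the newest version at or before the requested timestamp and checking whether it is an assertion or a retraction. A minimal self-contained sketch over `std::collections::BTreeMap` (the `Versioned` and `as_of` names are illustrative, not cozo's actual `check_key_for_validity` API):

```rust
use std::collections::BTreeMap;

/// Versioned store: (key, timestamp) -> Some(value) for an assertion,
/// None for a retraction. The BTreeMap keeps composite keys sorted, so
/// a bounded range scan finds the newest version at or before a timestamp.
struct Versioned {
    entries: BTreeMap<(String, u64), Option<String>>,
}

impl Versioned {
    fn new() -> Self {
        Versioned { entries: BTreeMap::new() }
    }

    fn put(&mut self, key: &str, ts: u64, value: Option<&str>) {
        self.entries
            .insert((key.to_string(), ts), value.map(String::from));
    }

    /// Read `key` as of time `ts`: seek to the last entry whose
    /// timestamp is <= ts, then check whether it is still valid.
    fn as_of(&self, key: &str, ts: u64) -> Option<&str> {
        let lo = (key.to_string(), 0u64);
        let hi = (key.to_string(), ts);
        self.entries
            .range(lo..=hi)
            .next_back() // newest version at or before ts
            .and_then(|(_, v)| v.as_deref())
    }
}

fn main() {
    let mut db = Versioned::new();
    db.put("color", 10, Some("red"));
    db.put("color", 20, Some("blue"));
    db.put("color", 30, None); // retracted at t=30
    assert_eq!(db.as_of("color", 5), None); // not yet asserted
    assert_eq!(db.as_of("color", 15), Some("red"));
    assert_eq!(db.as_of("color", 25), Some("blue"));
    assert_eq!(db.as_of("color", 30), None); // retraction visible
    println!("time-travel reads ok");
}
```

The point of the pattern is that an as-of read costs one bounded seek rather than a scan over all versions of the key.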

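The `range_scan` change is the standard swap from collect-then-iterate to returning a lazy iterator. A hedged sketch over a plain `BTreeMap` (the `scan_collected`/`scan_streamed` names are illustrative; redb's actual `Range` type and lifetimes differ):

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Eager version: materializes every matching pair into a Vec,
/// paying allocation and copies up front even if the caller stops early.
fn scan_collected(
    map: &BTreeMap<Vec<u8>, Vec<u8>>,
    lo: &[u8],
    hi: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    map.range::<[u8], _>((Bound::Included(lo), Bound::Excluded(hi)))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Streaming version: hands back a lazy iterator; nothing is copied
/// until the caller actually advances it, and early exit is free.
fn scan_streamed<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    lo: &'a [u8],
    hi: &'a [u8],
) -> impl Iterator<Item = (&'a [u8], &'a [u8])> + 'a {
    map.range::<[u8], _>((Bound::Included(lo), Bound::Excluded(hi)))
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
}

fn main() {
    let mut map = BTreeMap::new();
    for i in 0u8..100 {
        map.insert(vec![i], vec![i]);
    }
    // Streaming: taking 3 items touches only 3 entries.
    let first: Vec<_> = scan_streamed(&map, &[10], &[90]).take(3).collect();
    assert_eq!(first.len(), 3);
    assert_eq!(first[0].0, &[10]);
    // Eager: all 80 entries in [10, 90) are cloned regardless.
    let all = scan_collected(&map, &[10], &[90]);
    assert_eq!(all.len(), 80);
    println!("streamed={} collected={}", first.len(), all.len());
}
```

Because redb's `Range` iterator is `'static` over a committed read transaction, the streaming shape can be returned directly from the read path without the intermediate `Vec`.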
Test plan

  • `cargo check -p cozo --features storage-redb` — clean
  • Full `cargo test -p cozo --features storage-redb` to be run on the reviewer's side
  • Spot-check time travel semantics against the sled and rocksdb backends

lawless-m and others added 2 commits April 12, 2026 08:45
Pure Rust, single-file, ACID storage engine using redb 4.0.
Implements Storage and StoreTx traits behind the storage-redb
feature flag. Supports time travel via check_key_for_validity
seek pattern. Uses collect-into-Vec for range scans and
seek-per-step for skip scans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Stream range_scan on read path using redb's 'static Range iterator
  instead of collecting into Vec (saves allocation, improves scan perf)
- Use retain_in for del_range_from_persisted instead of
  collect-keys-then-delete loop (single pass)
- Remove unused collect_range helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lawless-m
Author

Benchmark results — redb vs sqlite and reference backends

Ran a reproducible backend comparison on branch bench/compare-backends. cozo-core/benches/time_travel.rs is parameterized over COZO_TEST_DB_ENGINE / COZO_BENCH_TT_BASE / COZO_BENCH_TT_MAX_K. Scale: BASE=10000, MAX_K=100 — up to 1M rows per temporal relation. All runs cgroup-capped to be safe.

For a graph database, reads are the top-line metric — traversals, reachability, aggregations, and time-travel snapshots all pound the read path. Writes are bursty batches that can be amortized.

redb vs sqlite (head-to-head at 1M rows)

| Metric | redb | sqlite | Winner |
| --- | --- | --- | --- |
| plain read QPS | 122,168 | 82,048 | redb 1.49× |
| tt100 point read QPS | 44,335 | 30,895 | redb 1.44× |
| tt100 @now read QPS | 88,157 | 60,207 | redb 1.46× |
| plain aggregation | 2.53 ms | 3.38 ms | redb 1.34× |
| tt100 stupid aggr | 510 ms | 733 ms | redb 1.44× |
| tt100 travel aggr | 13.9 ms | 32.7 ms | redb 2.35× |
| plain insert (10k) | 17.3 ms | 13.4 ms | sqlite 1.29× |
| tt10 insert (100k) | 190.9 ms | 135.2 ms | sqlite 1.41× |
| tt100 insert (1M) | 1,772 ms | 1,669 ms | sqlite 1.06× |

Headlines

  • redb wins every read and aggregation workload by 34–49% (1.34×–1.49×).
  • Time-travel aggregation over 1M rows is 2.35× faster on redb (13.9 ms vs 32.7 ms). That's the access pattern a knowledge graph hits for "current view of everything" queries — exactly where a time-travel-aware engine earns its keep.
  • The sqlite write advantage amortizes with batch size. ~40% faster at 100k rows, only 6% faster at 1M rows. The fsync cost is paid once per transaction, so larger bulk loads should close the gap entirely — I'd expect it to flip at 10M+.
  • redb tracks mem within 4% on plain reads (122k vs 128k QPS). The mmap'd B-tree is effectively free on the read path.

Full reference table

| Backend | plain read QPS | tt100 @now QPS | tt100 travel aggr | 1M-row insert |
| --- | --- | --- | --- | --- |
| mem | 127,607 | 100,738 | 10.5 ms | 0.77 s |
| redb | 122,168 | 88,157 | 13.9 ms | 1.77 s |
| sqlite | 82,048 | 60,207 | 32.7 ms | 1.67 s |
| rocksdb | 52,141 | 39,892 | 29.8 ms | 5.34 s |
| sled | 18,346 | 75,874 | 27.6 ms | 10.14 s |

Notes on the reference backends:

  • rocksdb is ~2× slower than sqlite on reads at this single-box, sub-10M-row scale. Its LSM/multi-TB advantages don't apply to embedded graph workloads.
  • sled's plain-read QPS regressed 6× going from 10k rows to 1M rows, on top of the known transaction-size pathology (an uncapped earlier run with MAX_K=1000 OOM'd and locked the machine on the 10M-row tt1000 insert). sled is currently not a sensible recommendation for cozo users; happy to open a separate issue if useful.

Reproducing

```shell
systemd-run --user --scope -p MemoryMax=12G -p MemorySwapMax=0 -- env \
  COZO_TEST_DB_ENGINE=redb \
  COZO_BENCH_TT_DIR=/var/tmp/cozo-bench \
  COZO_BENCH_TT_BASE=10000 \
  COZO_BENCH_TT_MAX_K=100 \
  cargo +nightly bench -p cozo --features storage-redb --bench time_travel -- --nocapture
```

Swap COZO_TEST_DB_ENGINE for mem / sqlite / rocksdb / sled and add the corresponding storage-* feature. Per-backend logs and a parsed BENCHMARKS.md are on bench/compare-backends.

Downstream usage

This backend is already in production use by Flowstone, a Rust knowledge-graph tool that path-deps this fork with features = ["storage-redb", "graph-algo"]. Flowstone exists because redb exists — when looking for a single-file pure-Rust ACID graph engine, sqlite was the only prior option and didn't fit. #308 filled the gap.
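For reference, a downstream path-dependency of the shape described would look roughly like this in `Cargo.toml` (the relative path is illustrative; Flowstone's actual manifest is not shown here):

```toml
[dependencies]
cozo = { path = "../cozo/cozo-core", features = ["storage-redb", "graph-algo"] }
```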

