Turn non-determinism from an enemy into a tool.
Rift is a Go library that makes concurrent operations deterministic — without rewriting your code. It does this by running every operation as several parallel variants called rifts, then picking the best result using an algorithm called Extended Causal Clocks (ECC-2).
- Why Rift exists
- How it works — the big picture
- Installation
- Quick start (5 minutes)
- The ECC-2 algorithm — deep dive
- Features
- Configuration reference
- Metrics reference
- Running the CLI demos
- Running the tests
- Benchmarks
- Project structure
- Use cases
- Version history
- Roadmap
- Contributing
A Heisenbug is a bug that disappears or changes when you try to observe it. The most common cause in Go programs is a race condition — when two goroutines access shared data at the same time and the result depends on which one runs first.
// Without Rift — classic race condition
var counter int64
go func() { counter++ }()
go func() { counter++ }()
// What is counter now? 1 or 2?
// It depends on the goroutine scheduler — which you cannot control.
// Under the race detector: likely 2. In production: sometimes 1.
// This bug disappears when you add logging. It's a Heisenbug.

Traditional solutions — mutexes, channels, distributed locks — all require you to redesign your code around the concurrency primitive. That is expensive, error-prone, and sometimes impossible in existing codebases.
Rift wraps your existing function and runs it as N parallel copies (called rifts). Each copy executes independently, in its own goroutine, with no shared mutable state injected at split time. When all copies finish, a deterministic fusion algorithm picks the winner based on causal priority — not based on which goroutine happened to run first.
The result is always the same, regardless of goroutine scheduling. The Heisenbug has nowhere to hide.
// With Rift — same function, deterministic result
eng, _ := rift.NewEngine(rift.DefaultConfig())
result, err := eng.Run(func() (any, error) {
counter := atomic.AddInt64(&sharedCounter, 1)
return counter, nil
})
// result.Value is always the causally-correct answer.
// No rewrite needed. No locks added to your code.

Every call to eng.Run(fn) goes through three steps:
Your function
│
▼
┌─────────────┐
│ SPLIT │ Clone fn into N goroutines (RiftFactor, default: 3)
└─────────────┘
│
├──▶ Rift A ──▶ [runs fn] ──▶ result A + ECC-2 clock A ──▶ ╮
├──▶ Rift B ──▶ [runs fn] ──▶ result B + ECC-2 clock B ──▶ ──▶ ┌──────────┐
└──▶ Rift C ──▶ [runs fn] ──▶ result C + ECC-2 clock C ──▶ ╯ │ FUSE │──▶ Deterministic Result
(all parallel, no global locks, panics caught) └──────────┘
Step 1 — Split: The Splitter creates N independent Rift objects, each wrapping your function in a closure-safe copy. The RiftFactor controls how many variants run (default 3, range 2–32).
Step 2 — Execute: The Executor runs all rifts concurrently inside a bounded goroutine pool (default: 2×GOMAXPROCS). Each rift gets its own ECC-2 CausalClock before it starts. Panics inside your function are caught and converted to errors — they never crash the engine.
Step 3 — Fuse: The FusionEngine receives all completed rifts and selects the winner using ECC-2 clock comparison. The winning rift's result becomes the return value of eng.Run(). All other rifts are marked as pruned.
The key guarantee: The winner is chosen by causal priority score, not by arrival time. Two calls with the same inputs on the same machine will always produce the same winner, regardless of how the goroutine scheduler decided to order things.
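The guarantee hinges on one idea: pick the winner by a deterministic comparison, never by arrival order. The following is a toy illustration of that idea — `candidate` and `fuse` are invented names, not the library's actual fusion code:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// candidate is a toy stand-in for a finished rift: a value plus the
// priority fields the fusion step compares.
type candidate struct {
	id    int
	score float64
	value string
}

// fuse picks the winner by score, falling back to the highest ID as an
// absolute tiebreaker — so the result never depends on slice order.
func fuse(cands []candidate) candidate {
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].score != cands[j].score {
			return cands[i].score > cands[j].score
		}
		return cands[i].id > cands[j].id
	})
	return cands[0]
}

func main() {
	cands := []candidate{
		{id: 1, score: 3.2, value: "A"},
		{id: 2, score: 3.4, value: "B"},
		{id: 3, score: 3.4, value: "C"},
	}
	// Shuffle to simulate arbitrary goroutine completion order.
	for i := 0; i < 5; i++ {
		rand.Shuffle(len(cands), func(a, b int) { cands[a], cands[b] = cands[b], cands[a] })
		fmt.Println(fuse(cands).value) // always "C": ties with B on score, higher ID
	}
}
```

However the goroutines happen to finish, the comparison yields the same winner — that is the property Rift's fusion step generalises to five dimensions.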
go get github.com/damienos61/rift

Requirements: Go 1.22 or later. Zero external dependencies — only the Go standard library.
package main
import (
"fmt"
"github.com/damienos61/rift"
)
func main() {
// DefaultConfig gives you sensible production defaults:
// RiftFactor=3, timeout=5s, fusion strategy="causal"
eng, err := rift.NewEngine(rift.DefaultConfig())
if err != nil {
panic(err)
}
// eng is safe to use from multiple goroutines simultaneously.
result, err := eng.Run(func() (any, error) {
// Put any function here — DB query, HTTP call, computation, etc.
// Rift will run it 3 times in parallel and pick the best result.
return "hello from rift", nil
})
if err != nil {
fmt.Println("all variants failed:", err)
return
}
fmt.Println(result.Value) // "hello from rift"
fmt.Println(result.CausalScore) // e.g. 3.4104 — ECC-2 priority score
fmt.Println(result.EntropyDelta) // e.g. 0.12 — path diversity delta
fmt.Println(result.FusedFrom) // e.g. 7 — which rift ID won
fmt.Println(result.Retries) // 0 — retries used
}

git clone https://github.com/damienos61/rift
cd rift
go run ./cmd/rift # basic demo — 8 operations, shows scores and entropy
go run ./cmd/rift -heisen # Heisenbug simulation with 200 goroutines
go run ./cmd/rift -bench # throughput benchmark

Expected output from the basic demo:
╔══════════════════════════════════════════════════════════╗
║ R I F T v1.3.0 — Causal Execution Engine ║
╚══════════════════════════════════════════════════════════╝
RiftFactor : 3 | Strategy : causal
op=1 | value=result-1 | score=3.3113 | entropy=0.1204 | retries=0
op=2 | value=result-2 | score=3.4104 | entropy=0.0891 | retries=0
...
─── Metrics ─────────────────────────────────────────────
ops: total=8 success=8 failed=0 shed=0
rifts: spawned=24 pruned=16
engine: latency=0.02ms entropy=0.1031 error_rate=0.0%
circuit: closed active_factor=3 strategy=causal
The Extended Causal Clock v2 (ECC-2) is the heart of Rift. It is the algorithm that decides which of the N parallel rifts "wins". Understanding it helps you understand why Rift is deterministic and what the numbers in the output mean.
Every rift carries a CausalClock — a data structure with 6 fields that describe the rift's causal context:
type CausalClock struct {
RiftID RiftID // unique ID of this rift variant
Lamport uint64 // logical event counter (like a Lamport timestamp)
Weight float64 // the main priority score — computed by Finalize()
WallNanos int64 // wall clock time when this rift finished (nanoseconds)
Entropy float64 // execution path diversity score in (0, 1)
Depth uint32 // how many causal events this rift observed
Generation uint32 // monotonic counter — prevents clock confusion under churn
}

At the start of execution, each rift is assigned a clock via Tick(). At the end of execution, Finalize() recomputes the Weight using 5 heuristics.
Weight starts at 1.0 and is multiplied by each heuristic in sequence:
Initial Weight = 1.0
H1 — Latency bonus (sigmoid-shaped, range [0.5, 2.0])
A rift that finishes faster than the expected latency budget gets a bonus. One that finishes much slower gets a penalty. The transition is smooth (sigmoid) so there's no sharp cutoff.
ratio = latencyBudget / actualDuration
if ratio >> 1 (much faster than budget) → bonus ~2.0
if ratio == 1 (exactly on budget) → neutral ~1.0
if ratio << 1 (much slower) → penalty ~0.5
Weight *= 0.5 + 1.5 / (1 + exp(-2.2 × (ratio - 1)))
H2 — Health bonus
A rift that returned a value without error gets a significant boost. A rift that failed or panicked is heavily penalized.
if healthy: Weight *= 1.55 (+55% boost)
if failed: Weight *= 0.08 (-92% penalty)
This is the most important heuristic — it ensures that in the presence of transient failures, healthy rifts almost always win.
H3 — Lamport depth bonus (log-scale)
A rift that has a higher Lamport counter has "witnessed" more causal events in the system. More witnessed events = richer causal view = slight priority boost. The log scale prevents runaway amplification.
if Lamport > 1:
Weight *= 1.0 + log1p(Lamport) × 0.05
H4 — Entropy bonus (new in v0.9)
Entropy measures how diverse the execution path of this rift was. It is computed by mixing the rift's path hash (derived from its RiftID and Lamport counter) with a shared entropy pool that accumulates contributions from all rifts. A rift with a more unique execution path gets a small boost.
entropy = normEntropy(pathHash XOR entropyPool) -- result in (0, 1)
Weight *= 1.0 + 0.15 × entropy
The entropy pool is a cross-rift accumulator — every time a rift finishes, it adds its path hash to the pool. This means each rift's entropy score depends not just on itself but on the entire ensemble of rifts that ran before it.
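A minimal sketch of how such a cross-rift pool could work — the names `pool`, `normEntropy`, and `entropyFor` are illustrative assumptions, not the library's API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pool is a toy stand-in for the shared entropy accumulator.
var pool atomic.Uint64

// normEntropy maps a 64-bit hash into the open interval (0, 1).
func normEntropy(h uint64) float64 {
	return (float64(h%1_000_000) + 1) / 1_000_002
}

// entropyFor mixes this rift's path hash with the current pool, then
// adds the hash back so later rifts see its contribution.
func entropyFor(pathHash uint64) float64 {
	e := normEntropy(pathHash ^ pool.Load())
	pool.Add(pathHash)
	return e
}

func main() {
	for i, h := range []uint64{0x9e3779b9, 0xbf58476d, 0x94d049bb} {
		fmt.Printf("rift %d entropy=%.4f\n", i, entropyFor(h))
	}
}
```

The ordering dependence is visible here: the same path hash yields a different entropy depending on what the ensemble contributed before it.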
H5 — Causal depth bonus (new in v0.9)
Depth tracks how many times Converge() has been called on this rift. In the current implementation, each rift converges exactly once, so Depth = 1 after execution. The bonus is small but ensures deeper causal chains are preferred.
if Depth > 0:
Weight *= 1.0 + log1p(Depth) × 0.02
Floor clamp: After all heuristics, weight is clamped to a minimum of 0.001. This prevents zero-weight rifts from causing division issues in downstream calculations. NaN and Inf values (from degenerate inputs) are also replaced with the floor.
The final CausalScore you see in result.CausalScore is computed by clock.Score():
Score = Weight × (1 + 0.15 × Entropy)
+ log1p(Lamport) × 0.01
+ log1p(Depth) × 0.02
+ 1 / (1 + WallNanos / 1e9) × 0.001
Higher is better. The score is used for logging and telemetry — the actual comparison for winner selection uses the 5-dimension total order below.
When the fusion engine compares two rifts, it uses this strict priority order:
| Priority | Dimension | Rule |
|---|---|---|
| 1st | Weight | Higher wins. Difference must be > 1×10⁻⁹ to count. |
| 2nd | Entropy | Higher wins. Difference must be > 1×10⁻⁹ to count. |
| 3rd | Lamport | Higher wins (more events witnessed). |
| 4th | WallNanos | Lower wins (finished sooner). |
| 5th | RiftID | Higher wins. Absolute tiebreaker. |
This is a strict total order — no two distinct clocks can ever compare equal (RiftID guarantees uniqueness). This means the fusion engine always produces the same winner, call after call, regardless of goroutine scheduling. That is the determinism guarantee.
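A comparator implementing this priority order might look like the following sketch (toy types and names, not the library's code):

```go
package main

import "fmt"

const eps = 1e-9

// clock holds just the fields the 5-dimension total order compares.
type clock struct {
	Weight, Entropy float64
	Lamport         uint64
	WallNanos       int64
	RiftID          uint64
}

// less reports whether a loses to b under the priority order above:
// Weight, Entropy, Lamport (higher wins), WallNanos (lower wins),
// then RiftID (higher wins, absolute tiebreaker).
func less(a, b clock) bool {
	if d := a.Weight - b.Weight; d > eps || d < -eps {
		return a.Weight < b.Weight
	}
	if d := a.Entropy - b.Entropy; d > eps || d < -eps {
		return a.Entropy < b.Entropy
	}
	if a.Lamport != b.Lamport {
		return a.Lamport < b.Lamport
	}
	if a.WallNanos != b.WallNanos {
		return a.WallNanos > b.WallNanos // lower wall time wins
	}
	return a.RiftID < b.RiftID
}

func main() {
	a := clock{Weight: 3.1, Entropy: 0.2, Lamport: 4, WallNanos: 900, RiftID: 1}
	b := clock{Weight: 3.1, Entropy: 0.2, Lamport: 4, WallNanos: 700, RiftID: 2}
	fmt.Println(less(a, b)) // true — equal on the first three dimensions, b finished sooner
}
```

Because RiftID is unique, `less(a, b)` and `less(b, a)` can never both be false for distinct clocks — the order is total.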
result.EntropyDelta is the difference between the winner's entropy and the average entropy of the losing rifts:
EntropyDelta = winner.Entropy - mean(loser1.Entropy, loser2.Entropy, ...)
A positive EntropyDelta (e.g., +0.12) means the winner explored a more diverse execution path than the average of the losers — it had a richer causal view. A negative value means the winner was the "least diverse" rift but won on Weight or Lamport.
This metric is useful for monitoring: sustained negative EntropyDelta over many operations may indicate that all rifts are taking identical code paths, which reduces causal coverage.
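The formula is straightforward to compute; this sketch assumes the loser entropies arrive as a plain slice:

```go
package main

import "fmt"

// entropyDelta returns winner entropy minus the mean entropy of the
// losers, per the formula above. Zero when there are no losers.
func entropyDelta(winner float64, losers []float64) float64 {
	if len(losers) == 0 {
		return 0
	}
	sum := 0.0
	for _, e := range losers {
		sum += e
	}
	return winner - sum/float64(len(losers))
}

func main() {
	fmt.Printf("%.2f\n", entropyDelta(0.42, []float64{0.30, 0.30})) // 0.12
}
```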
Rift uses a single process-wide atomic Lamport counter. Every time a rift starts (Tick) or finishes (Finalize), the counter is incremented. This means:
- The counter is a total order over all events in the engine
- A rift that starts later (higher Lamport) has genuinely "seen more" of the engine's state
- The counter never resets — it is monotonically increasing for the lifetime of the process
The Generation counter (added in v1.3.0) is a separate monotonic counter that increments once per Tick() call. It is stored in the clock but is currently used as an additional uniqueness signal — it prevents ABA-style confusion if the engine processes an extremely large number of operations and RiftIDs wrap around.
The circuit breaker is a fault gate that automatically stops accepting operations when too many of them are failing. This protects downstream systems (databases, APIs) from being hammered when they are already overloaded.
States:
CLOSED ──(error rate ≥ threshold)──▶ OPEN ──(after cool-down)──▶ HALF-OPEN
   ▲                                   ▲                            │
   │                                   │       (probe fails)        │
   │                                   └────────────────────────────┤
   └──────────────────────(probe OK)────────────────────────────────┘
- CLOSED (normal): all operations pass through
- OPEN: all operations immediately return ErrCircuitOpen — no goroutines wasted
- HALF-OPEN: after the cool-down period, one probe operation is allowed through. If it succeeds, the circuit closes. If it fails, the circuit opens again.
Configuration:
cfg.CircuitBreaker = rift.CircuitBreakerConfig{
Enabled: true,
Threshold: 0.5, // open when 50% of ops in the window fail
Window: 100, // rolling window: last 100 operations
CoolDown: 10*time.Second, // wait 10s before probing
}

How the window works: The circuit breaker maintains a ring buffer of the last Window operation outcomes (true=success, false=failure). After each operation, it recomputes the failure rate across the entire window. If the rate meets or exceeds Threshold, the circuit opens.
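A toy version of such a rolling-outcome window — not the library's implementation — looks like this:

```go
package main

import "fmt"

// window keeps the last N operation outcomes in a ring buffer and
// reports the failure rate over whatever has been recorded so far.
type window struct {
	buf  []bool // true = success, false = failure
	next int    // next slot to overwrite
	n    int    // how many slots are filled
}

func newWindow(size int) *window { return &window{buf: make([]bool, size)} }

func (w *window) record(success bool) {
	w.buf[w.next] = success
	w.next = (w.next + 1) % len(w.buf)
	if w.n < len(w.buf) {
		w.n++
	}
}

func (w *window) failureRate() float64 {
	if w.n == 0 {
		return 0
	}
	failed := 0
	for i := 0; i < w.n; i++ {
		if !w.buf[i] {
			failed++
		}
	}
	return float64(failed) / float64(w.n)
}

func main() {
	w := newWindow(4)
	for _, ok := range []bool{true, false, false, true} {
		w.record(ok)
	}
	fmt.Println(w.failureRate()) // 0.5 — a Threshold of 0.5 would open the circuit here
}
```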
Demo:
go run ./cmd/rift -circuit

═══ Circuit Breaker Demo ═══
Injecting failures to trip the breaker...
op=01 err=true circuit=closed
op=02 err=true circuit=closed
...
op=10 err=true circuit=open
op=11 err=true circuit=open ← engine stops trying
...
Post-saturation: err=true (expect circuit-open)
Waiting 2s for cool-down...
Probe OK: recovered circuit=closed ← back to normal
When a rift fails (error or panic), Rift can automatically retry it with exponential backoff before moving to fusion. Each retry uses a fresh ECC-2 clock tick, so the retry's causal timestamp correctly reflects its position in the event timeline.
cfg.Retry = rift.RetryPolicy{
MaxAttempts: 3, // try up to 3 times per rift
Backoff: 1*time.Millisecond, // start with 1ms delay
MaxBackoff: 50*time.Millisecond, // cap the delay at 50ms
}
// Actual delays: attempt 2 waits 1ms, attempt 3 waits 2ms, capped at 50ms

How backoff works:
Attempt 1: run immediately
Attempt 2: wait Backoff (1ms), then run
Attempt 3: wait min(Backoff×2, MaxBackoff) = min(2ms, 50ms) = 2ms, then run
Retries are per-rift — each of the N rifts retries independently. So with RiftFactor=3 and MaxAttempts=3, up to 9 total executions of your function can occur. As soon as any rift converges successfully, it stops retrying.
result.Retries tells you the total number of retry attempts across all rifts for that operation.
The adaptive factor automatically adjusts the number of rifts based on the live error rate. When things go wrong, Rift becomes more defensive by spawning more variants. When things are healthy, it reduces overhead.
cfg.Adaptive = rift.AdaptiveConfig{
Enabled: true,
MinFactor: 2, // never go below 2 rifts (minimum for fusion to work)
MaxFactor: 8, // never go above 8 rifts
ErrorRateUp: 0.3, // if error rate > 30%, increase RiftFactor by 1
ErrorRateDown: 0.05, // if error rate < 5%, decrease RiftFactor by 1
}

The decision is made over a rolling window of the last 20 operations. eng.Snapshot().ActiveRiftFactor tells you the current live value.
Demo (4 phases — healthy → degraded → recovering → healthy):
go run ./cmd/rift -adaptive

Phase: healthy (fail_rate=0%)
active_factor=2 error_rate=0.0% circuit=closed
Phase: degraded (fail_rate=60%)
active_factor=6 error_rate=42.0% circuit=closed ← scaled up under stress
Phase: recovering (fail_rate=10%)
active_factor=5 error_rate=30.0% circuit=closed
Phase: healthy (fail_rate=0%)
active_factor=2 error_rate=5.0% circuit=closed ← scaled back down
When the engine is saturated — more operations arriving than it can process — the load shedder immediately rejects excess operations with ErrShed instead of letting them pile up in a queue. This is the correct behavior in high-throughput systems: a fast, explicit rejection is better than a slow timeout.
How it works: The shedder uses a token-bucket backed by a buffered channel of capacity MaxQueueLen. Each operation acquires a token before starting and releases it when done. If no token is available, the operation is shed immediately.
cfg.Shed = rift.ShedPolicy{
Enabled: true,
MaxQueueLen: 1000, // allow up to 1000 in-flight operations
Strategy: "newest", // currently: reject new arrivals when full
}

Handling shed errors:
result, err := eng.Run(myFn)
if errors.Is(err, rift.ErrShed) {
// Operation was dropped — queue your retry logic here,
// or return 503 to the caller
return
}

Demo:
go run ./cmd/rift -shed

═══ Load-Shedding Demo (v1.3.0) ═══
Sending 500 ops into an engine with MaxQueueLen=50.
processed=50 shed=450 engine_shed_count=450
eng.Snapshot().ShedOps gives you the total count of shed operations since engine creation.
The first few operations on a new engine may be slower than normal because the Go runtime hasn't yet scheduled the goroutine pool. The warmup feature runs a configurable number of no-op operations at engine creation time to pre-heat the pool, eliminating cold-start latency spikes.
cfg.Warmup = rift.WarmupConfig{
Enabled: true,
Ops: 10, // run 10 no-op operations at startup
Timeout: 2*time.Second, // if warmup doesn't finish in 2s, fail with ErrWarmupTimeout
}
eng, err := rift.NewEngine(cfg)
if errors.Is(err, rift.ErrWarmupTimeout) {
// Warmup timed out — the system may be under severe load at startup
}

This is particularly useful in Kubernetes pods, where the first request arrives immediately after the container starts.
The health probe gives you a liveness and readiness check that is compatible with Kubernetes /healthz and /readyz endpoints.
h := eng.Health()
h.Live // bool — true if the engine is running (always true if you have an Engine)
h.Ready // bool — true if: (1) circuit is closed AND (2) error rate ≤ 50%
h.Reason // string — human-readable reason when Ready=false, e.g. "circuit breaker open"

Rules for Ready=false:
| Condition | Reason field |
|---|---|
| Circuit is open | "circuit breaker open" |
| Circuit is half-open | "circuit breaker half-open" |
| Error rate > 50% | "error rate too high" |
Example Kubernetes handler:
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
h := eng.Health()
if !h.Live {
http.Error(w, "engine not live", 503)
return
}
w.WriteHeader(200)
fmt.Fprintln(w, "live")
})
http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
h := eng.Health()
if !h.Ready {
http.Error(w, h.Reason, 503)
return
}
w.WriteHeader(200)
fmt.Fprintln(w, "ready")
})

Demo:
go run ./cmd/rift -health

═══ Health Probe Demo (v1.3.0) ═══
Healthy engine — live=true ready=true
Stressed engine — live=true ready=false reason="circuit breaker open"
The telemetry hook lets you plug in your own tracing or metrics system without any import coupling. You implement a small interface and Rift calls your methods at key lifecycle points. All methods must be non-blocking and must not panic.
type TelemetryHook interface {
// Called once per Run() call, right after the split.
OnSplit(opID OperationID, factor int)
// Called once per rift, immediately after it finishes (success or error).
OnConverge(riftID RiftID, duration time.Duration, healthy bool)
// Called once per Run() call, after fusion selects a winner.
OnFuse(opID OperationID, winner RiftID, score float64, pruned int)
// Called when the fused result is an error.
OnError(opID OperationID, err error)
// Called when an operation is dropped by the load shedder (v1.3.0).
OnShed(opID OperationID)
}

Example — connecting to your own tracing system:
type myTracer struct{}
func (t *myTracer) OnSplit(opID rift.OperationID, factor int) {
mySpan.Start(fmt.Sprintf("rift.split op=%d factor=%d", opID, factor))
}
func (t *myTracer) OnConverge(riftID rift.RiftID, d time.Duration, healthy bool) {
myMetrics.Record("rift.converge.latency_ms", d.Milliseconds())
}
func (t *myTracer) OnFuse(opID rift.OperationID, winner rift.RiftID, score float64, pruned int) {
myMetrics.Record("rift.causal_score", score)
}
func (t *myTracer) OnError(opID rift.OperationID, err error) {
myAlerts.Trigger("rift.operation_failed", err)
}
func (t *myTracer) OnShed(opID rift.OperationID) {
myMetrics.Increment("rift.shed_total")
}
cfg := rift.DefaultConfig()
cfg.Telemetry = &myTracer{}
eng, _ := rift.NewEngine(cfg)

cfg := rift.DefaultConfig() // always start from defaults
// ── Core ──────────────────────────────────────────────────────────────────────
cfg.RiftFactor = 3 // Number of parallel variants [2, 32]
cfg.WorkerPoolSize = 0 // Goroutine pool size. 0 = 2×GOMAXPROCS
cfg.DefaultTimeout = 5*time.Second // Per-rift timeout. 0 = no timeout.
// FusionStrategy selects how the winner is chosen:
// "causal" (default) — ECC-2 total order (Weight, Entropy, Lamport, Wall, ID)
// "entropy" — Entropy is primary; useful for chaos testing
// "lamport" — Pure Lamport counter ordering (classic vector clock)
// "fastest" — Lowest wall time wins (non-deterministic; benchmarks only)
cfg.FusionStrategy = "causal"
// ── Circuit breaker ──────────────────────────────────────────────────────────
cfg.CircuitBreaker = rift.CircuitBreakerConfig{
Enabled: false, // disabled by default
Threshold: 0.5, // open at 50% error rate
Window: 100, // rolling window size (number of operations)
CoolDown: 10*time.Second, // half-open probe delay after opening
}
// ── Retry ────────────────────────────────────────────────────────────────────
cfg.Retry = rift.RetryPolicy{
MaxAttempts: 1, // 1 = no retry (default). Set ≥ 2 for retries.
Backoff: 1*time.Millisecond,
MaxBackoff: 50*time.Millisecond,
}
// ── Adaptive RiftFactor ──────────────────────────────────────────────────────
cfg.Adaptive = rift.AdaptiveConfig{
Enabled: false, // disabled by default
MinFactor: 2,
MaxFactor: 6,
ErrorRateUp: 0.3, // increase factor when error rate > 30%
ErrorRateDown: 0.05, // decrease factor when error rate < 5%
}
// ── Load shedding ────────────────────────────────────────────────────────────
cfg.Shed = rift.ShedPolicy{
Enabled: false, // disabled by default
MaxQueueLen: 1000, // max concurrent in-flight operations
Strategy: "newest", // "newest": reject new arrivals when full
}
// ── Warmup ───────────────────────────────────────────────────────────────────
cfg.Warmup = rift.WarmupConfig{
Enabled: false, // disabled by default
Ops: 10, // number of no-op warmup operations
Timeout: 2*time.Second, // max time allowed for warmup
}
// ── Telemetry ────────────────────────────────────────────────────────────────
cfg.Telemetry = nil // nil = no telemetry (default). Set to your hook.
eng, err := rift.NewEngine(cfg)

Sentinel errors:
| Error | When returned |
|---|---|
| `rift.ErrNilOperation` | `eng.Run(nil)` |
| `rift.ErrInvalidRiftFactor` | `RiftFactor` < 2 |
| `rift.ErrOperationTimeout` | A rift exceeded its timeout |
| `rift.ErrNoRiftsConverged` | Internal error — zero rifts were created |
| `rift.ErrFusionConflict` | All rifts failed — result contains best-effort error |
| `rift.ErrCircuitOpen` | Circuit breaker is open |
| `rift.ErrMaxRetriesExceeded` | Internal — surfaced via `ErrFusionConflict` |
| `rift.ErrShed` | Operation rejected by the load shedder |
| `rift.ErrWarmupTimeout` | Warmup did not complete within its deadline |
m := eng.Snapshot()

| Field | Type | Description |
|---|---|---|
| `TotalOperations` | `uint64` | Total calls to `Run()` or `RunWithTimeout()` that passed the shedder and circuit breaker |
| `SuccessfulOps` | `uint64` | Operations that returned a non-nil value without error |
| `FailedOps` | `uint64` | Operations where all rifts failed (all variants errored or panicked) |
| `ShedOps` | `uint64` | Operations rejected by the load shedder before they were split |
| `TotalRiftsSpawned` | `uint64` | Total goroutines launched across all operations |
| `TotalRiftsPruned` | `uint64` | Rifts discarded after fusion (always `TotalRiftsSpawned - TotalOperations`) |
| `RiftFactor` | `int` | The configured `RiftFactor` from `Config` |
| `ActiveRiftFactor` | `int` | The current live `RiftFactor` (differs from `RiftFactor` when Adaptive is enabled) |
| `FusionStrategy` | `string` | Active fusion strategy name |
| `MeanLatencyMs` | `float64` | Mean wall time per `Run()` call in milliseconds, computed over all completed operations |
| `MeanEntropyDelta` | `float64` | Mean `EntropyDelta` across all completed operations |
| `CircuitState` | `string` | `"closed"` \| `"open"` \| `"half-open"` |
| `ErrorRate` | `float64` | `FailedOps / TotalOperations`, range [0, 1] |
All demos are in cmd/rift/main.go and are run with go run ./cmd/rift [flags].
# Basic demo — 8 operations showing score, entropy, and metrics
go run ./cmd/rift
# With a custom RiftFactor and fusion strategy
go run ./cmd/rift -factor 5 -strategy entropy
# Heisenbug simulation — 200 concurrent goroutines racing on a counter
go run ./cmd/rift -heisen
# Throughput benchmark — 20,000 operations, 8 workers
go run ./cmd/rift -bench
# Circuit breaker demo — inject failures, watch circuit open and recover
go run ./cmd/rift -circuit
# Adaptive RiftFactor demo — 4 phases of varying error rates
go run ./cmd/rift -adaptive
# Load-shedding demo — 500 ops into a MaxQueueLen=50 engine
go run ./cmd/rift -shed
# Health probe demo — healthy vs stressed engine comparison
go run ./cmd/rift -health

HFT trading example (demonstrates all features together):
go run ./examples/trading

This runs a simulated HFT order processor with a 15% transient exchange failure rate, circuit breaker, retry, adaptive factor, load shedding, and telemetry — all enabled simultaneously.
# Run all 22 tests
go test ./...
# Run with the race detector — must produce zero race warnings
go test -race ./...
# Run a specific test with verbose output
go test -run TestHeisenBugElimination -v
go test -run TestCircuitBreakerClosesAfterCoolDown -v
go test -run TestLoadSheddingRejectsWhenSaturated -v
# Run all benchmarks with memory stats
go test -bench=. -benchmem
# Run benchmarks for 5 seconds each (more stable numbers)
go test -bench=. -benchtime=5s -benchmem

| Test | What it verifies |
|---|---|
| `TestRunBasic` | Engine returns the correct value |
| `TestRunPropagatesError` | Errors from your function are returned correctly |
| `TestRunPanicRecovery` | Panics inside your function are caught and never crash the engine |
| `TestRunNilFunction` | `eng.Run(nil)` returns `ErrNilOperation` |
| `TestCausalScorePositive` | `result.CausalScore` is always > 0 for healthy operations |
| `TestEntropyDeltaFinite` | `result.EntropyDelta` is never NaN or Inf |
| `TestHeisenBugElimination` | 100 concurrent goroutines — every rift converges to a value |
| `TestRunWithTimeout` | Operations that exceed their timeout return `ErrOperationTimeout` |
| `TestCircuitBreakerOpens` | Circuit opens after enough failures |
| `TestCircuitBreakerClosesAfterCoolDown` | Circuit recovers after cool-down |
| `TestAdaptiveFactorBounds` | `ActiveRiftFactor` always stays within [MinFactor, MaxFactor] |
| `TestRetryPolicyConverges` | Retry eventually finds a healthy variant |
| `TestEntropyFusionStrategy` | The "entropy" strategy returns the correct value |
| `TestLoadSheddingRejectsWhenSaturated` | At least some ops are shed when the pool is tiny |
| `TestHealthProbeHealthyEngine` | Healthy engine: Live=true, Ready=true |
| `TestHealthProbeStressedEngine` | Stressed engine: Live=true, Ready=false |
| `TestWarmupCompletes` | Engine with warmup enabled works correctly after creation |
| `TestCustomRiftFactor/factor=2` | RiftFactor=2 works |
| `TestCustomRiftFactor/factor=4` | RiftFactor=4 works |
| `TestCustomRiftFactor/factor=8` | RiftFactor=8 works |
| `TestCustomRiftFactor/factor=16` | RiftFactor=16 works |
| `TestRiftFactorAboveMaxRejected` | RiftFactor=1 returns an error at engine creation |
| `TestMetricsAccuracy` | `TotalRiftsSpawned = TotalOperations × RiftFactor` |
| `TestTelemetryHook` | `OnSplit` and `OnFuse` called N times each; `OnConverge` called N×RiftFactor times |
| `TestConcurrentSafety` | 50 concurrent goroutines using the same engine — zero errors |
| Race detector | `go test -race ./...` — zero data races detected |
Measured on Intel Xeon Platinum 8581C @ 2.10GHz, Go 1.22, Linux amd64, go test -bench=. -benchtime=3s -benchmem.
| Benchmark | ops/sec | ns/op | Bytes/op | Allocs/op | Notes |
|---|---|---|---|---|---|
| `BenchmarkRiftFactor2` | ~95k | 10,477 | 1,736 | 29 | Minimum overhead, 2 variants |
| `BenchmarkRiftFactor3` | ~66k | 15,063 | 2,520 | 40 | Default config |
| `BenchmarkRiftFactor8` | ~25k | 39,713 | 6,440 | 95 | Max coverage |
| `BenchmarkEntropyFusion` | ~67k | 14,975 | 2,520 | 40 | Same as factor=3, different fusion |
| `BenchmarkWithCircuitBreaker` | ~6M | 163 | 0 | 0 | Circuit breaker overhead only |
| `BenchmarkNaiveDirectCall` | ~1.1B | 0.91 | 0 | 0 | Raw function call — baseline |
Reading the numbers:
- Rift adds overhead proportional to `RiftFactor` because it genuinely runs your function multiple times. For CPU-bound micro-benchmarks (like `return 1+1`) this overhead is visible and expected.
- For I/O-bound operations (database queries, HTTP calls, gRPC) that typically take 1–50ms, Rift's overhead of 10–40µs is 25–5000× smaller than the operation itself — effectively zero.
- The `BenchmarkWithCircuitBreaker` result (163ns, 0 allocs) shows what happens when the circuit is closed and the only overhead is the check — it is extremely cheap.
When to use which RiftFactor:
| Scenario | Recommended RiftFactor |
|---|---|
| Minimum overhead, very stable systems | 2 |
| General production use | 3 (default) |
| High error rate environments, HFT | 4–6 |
| Maximum causal coverage, chaos testing | 8+ |
| Adaptive (auto-tunes) | start at 3, enable Adaptive |
rift/
│
├── rift.go # Public API — the only file you need to import
│ # Re-exports all types; wraps the internal engine.
│
├── rift_test.go # 22 tests + 6 benchmarks (all in package rift_test)
│
├── go.mod # Module: github.com/damienos61/rift, go 1.22
├── README.md # This file
├── CHANGELOG.md # Version history with detailed change notes
├── LICENSE # MIT
├── .gitignore
│
├── cmd/
│ └── rift/
│ └── main.go # Interactive CLI — all demos and flags
│
├── examples/
│ └── trading/
│ └── main.go # HFT order processor — all features at once
│
└── internal/ # Internal packages — not part of the public API
│
├── rift/
│ ├── types.go # All shared types: Config, Result, CausalClock,
│ │ # State, Operation, all interfaces, sentinel errors
│ └── rift.go # Rift struct + state machine + FNV-1a path hash
│
├── clock/
│ └── clock.go # ECC-2 implementation: Tick, Finalize, Compare, Score
│ # The core algorithm — see the deep dive section above
│
├── splitter/
│ └── splitter.go # Creates N Rift objects from one Operation
│ # Closure-safe capture, MaxRiftFactor=32 guard
│
├── executor/
│ └── executor.go # Runs rifts concurrently, manages the semaphore pool,
│ # handles retry backoff, catches panics
│
├── fusion/
│ └── fusion.go # 3-pass selection: filter → rank → select+prune
│ # All 4 strategies: causal, entropy, lamport, fastest
│
└── engine/
└── engine.go # Top-level orchestrator: wires everything together
# Circuit breaker, adaptive factor, shedder, metrics,
# warmup, health probe
Rift is a good fit when:
- Your code has non-deterministic behaviour under concurrency that is hard to reproduce
- You need deterministic results from concurrent operations without rewriting your code
- You want automatic fault tolerance (retry, circuit breaking) with minimal configuration
- You are building services that need Kubernetes-compatible health checks
- You want visibility into operation latency and error rates without adding external dependencies
Common patterns:
| Scenario | Rift features to enable |
|---|---|
| Database reads with occasional timeouts | Retry, CircuitBreaker |
| gRPC services under variable load | Adaptive, Shed, HealthProbe |
| HFT order processing | Retry, CircuitBreaker, Adaptive, Telemetry |
| IoT sensor stream aggregation | Adaptive, Shed |
| Chaos/fault injection testing | FusionStrategy="entropy" |
| Kubernetes sidecar or init container | Warmup, HealthProbe |
| Version | Release | Key additions |
|---|---|---|
| v0.1.0 | Apr 2026 | Rift Execution Model, ECC-1 (Lamport + Weight + Wall), basic fusion, "causal" / "lamport" / "fastest" strategies, CLI demo |
| v0.9.0 | Apr 2026 | ECC-2 (Entropy + Depth dimensions), CircuitBreaker, AdaptiveRiftFactor, RetryPolicy, TelemetryHook, "entropy" strategy, enriched Metrics |
| v1.3.0 | Apr 2026 | ShedPolicy (load shedding + ErrShed), WarmupConfig (+ ErrWarmupTimeout), Health() probe, CausalClock.Generation, MaxRiftFactor=32, deterministic FNV-1a path hash, BenchmarkWithCircuitBreaker, 22 tests, go test -race clean |
Items below are planned but not yet implemented. They are not present in v1.3.0.
- Distributed rifts — run causal variants across multiple machines, fuse over the network
- 3D rift visualiser — real-time causal graph of rifts and their clocks
- Python bindings — use Rift from Python via CGo or a gRPC bridge
- OpenTelemetry exporter — built-in OTLP export, not just the hook interface
- Rift-as-a-Service — deploy the engine on Kubernetes, expose it as a gRPC service
- Custom fusion plugin API — register your own fusion strategy at runtime
The most valuable contributions are real-world cases where Rift does or does not work. If you have a concurrent bug that Rift helped fix, or a benchmark showing how it behaves in your system, please open an issue.
For code contributions:
- Fork the repository
- Make your change
- Run `go test -race ./...` — must be clean
- Run `go vet ./...` — must be clean
- Open a pull request with a clear description of what changed and why
MIT — see LICENSE