Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
1d8d350
feat(cache): namespace-versioned cache keys for queries and pipes
taitelee Jun 10, 2026
218cad0
condensed new namespace query functions
taitelee Jun 10, 2026
289fa11
refactor(cache): cut over to namespace-versioned Get/Set/Invalidate
taitelee Jun 10, 2026
13427bc
make fix
taitelee Jun 10, 2026
ad1c0ff
Merge remote-tracking branch 'origin/main' into pipe-cache-invalidation
taitelee Jun 10, 2026
1eeb929
perf(cache): skip redundant per-scope bumps on whole-table invalidation
taitelee Jun 10, 2026
377636c
docs(changelog): add Unreleased entry for the cache cutover; encode q…
taitelee Jun 10, 2026
97bbbc9
refactor(ingest): collapse whole-table-subsumed scopes in the worker,…
taitelee Jun 10, 2026
b614ea3
merge main
taitelee Jun 10, 2026
3bae7ec
changelog
taitelee Jun 10, 2026
27ae1f1
refactor(pipes): drop declared param-type enforcement
taitelee Jun 11, 2026
038a205
fix(pipes): handle escaped backticks in query-tree identifier parsing…
taitelee Jun 14, 2026
142dc00
merge main
taitelee Jun 14, 2026
43ac5c3
fix(pipes): resolve through-view table deps via system.tables
taitelee Jun 14, 2026
b1e4bcc
feat(pipes): log dep-resolution outcome, document TTL-only fallbacks,…
taitelee Jun 16, 2026
0d7097e
fix(pipes): resolve through registry-discovered views so ingest still…
taitelee Jun 16, 2026
394420d
Merge remote-tracking branch 'origin/main' into pipe-cache-invalidation
taitelee Jun 16, 2026
90e24b4
fix(pipes): drop view/MV names from resolved deps, keeping only their…
taitelee Jun 16, 2026
13803e1
test(pipes): table-driven dep-resolution tests, document cache invali…
taitelee Jun 16, 2026
f762148
test(integration): allowlist-sanitize createTable names so subtest pu…
taitelee Jun 16, 2026
d6f447a
fix(pipes): dummy-bind array params for table-dep resolution; tighten…
taitelee Jun 16, 2026
a24be1a
feat(pipes): invalidate cached results via table dependencies resolve…
taitelee Jun 26, 2026
5727a3b
Merge remote-tracking branch 'origin/main' into pipe-cache-invalidation
taitelee Jun 26, 2026
7f13a70
fix(pipes): TTL-floor cached results with unresolvable deps (unfoldab…
taitelee Jun 26, 2026
09a37ee
Merge remote-tracking branch 'origin/main' into pipe-cache-invalidation
taitelee Jun 26, 2026
1083703
feat(pipes): meter degraded dependency resolution; add weird-identifi…
taitelee Jun 29, 2026
d6f047d
doc updates
taitelee Jun 29, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ The invariant index — what must stay true. Full narrative and rationale live i
10. **Active Sweeper** — purges NATS messages that are both ACKed (written to CH) and older than the gap window; SSE gap-fill uses `DeliverByStartTime`, no in-process ring buffer.
11. **Hasura-style access control: fail-closed (security)** — `policy.IsAdmin` (role == `admin_role`, **exact case-sensitive**, default `"admin"`) is the single admin check, shared by `Evaluate`/`ResolveRole`/`Validate`/the `/v1/admin` gate/`RoleAllowed`. Empty/absent role matches nothing (no `"*"` wildcard); `Validate` rejects empty role keys; a `nil` policy (deleted) denies **everyone incl. admin** — a total lockout, so bootstrap from the policy file, never an implicit admin grant. `default_role` is the one sanctioned roleless exception (`ResolveRole` maps empty → it pre-eval); `default_role == admin_role` is permitted but dev-only and loudly warned (`policy.DefaultRoleGrantsAdmin`). Preserve when touching `internal/policy` (policy twin of #13; see #159). Detail: architecture.md § `policy/`.
12. **Structured queries: column authz fail-closed (security)** — `POST /v1/query?table={table}`: typed AST validated against schema, permission-enforced, timestamp-bucketed for cache, `DefaultMaxRows` (10,000) cap. Every column reference — projection, aggregation args, `filters`, `group_by`, `order_by`, `time_range` — is authorized inside `query.Build` (the single chokepoint that enumerates them all), so no clause can skip the role's `allow_columns`/`deny_columns` check (#223). A `select_all` read by a *column-restricted* role expands to its allowed columns via `policy.AllowedProjection`, never a bare `SELECT *`; *unrestricted*/admin roles keep `SELECT *` (`policy.RestrictsColumns` decides). Omitting `columns` selects nothing (`ErrEmptyProjection` → `200 []`); `["*"]` is the literal column `*` (schema-gated, not a wildcard); a table-granted role with no readable columns fails closed (`ErrNoReadableColumns` → `403`). Structured and live-stream (`filterEventColumns`) reads share the one per-column decision `policy.IsColumnAllowed`, so column visibility can't drift. Preserve when touching `internal/query` or the structured-query handler. Detail: architecture.md § `query/`.
13. **Named query pipes: fail-closed (security)** — pre-defined SQL templates (Tinybird-style) with param binding + caching; `GET/POST /v1/pipes/{name}` sit outside `RequireAdmin`, so per-pipe `allowed_roles` is the *only* execute-path gate, via `policy.RoleAllowed`: exact allowlist membership (no `"*"`), admin always passes, empty/absent role and empty-string entries authorize nobody, and no `allowed_roles` → admin-only. Preserve and exercise via `testutil.RunRoleMatrix` / `StandardRoleMatrix` (see #159). Detail: architecture.md § `pipes/`.
13. **Named query pipes: fail-closed (security)** — pre-defined SQL templates (Tinybird-style) with param binding + caching; `GET/POST /v1/pipes/{name}` sit outside `RequireAdmin`, so per-pipe `allowed_roles` is the *only* execute-path gate, via `policy.RoleAllowed`: exact allowlist membership (no `"*"`), admin always passes, empty/absent role and empty-string entries authorize nobody, and no `allowed_roles` → admin-only. Preserve and exercise via `testutil.RunRoleMatrix` / `StandardRoleMatrix` (see #159). The pipe's SQL is **not** re-checked against the table policy (no row or column filtering) — `allowed_roles` gates execution, but the author is responsible for scoping the data in the pipe SQL and any views it reads. A pipe is **run as-is** — WaveHouse never parses its SQL and never rejects it (a write/DDL or multi-statement pipe runs on behalf of the caller's role, so the author owns scoping it). Cache invalidation derives the pipe's table dependencies from ClickHouse `EXPLAIN QUERY TREE`: a query it can't analyze (a write/DDL pipe, a missing table, an unreachable server) **over-resolves** to every base table so any write evicts it — never an under-resolution — and a resolved dependency whose version can't be reliably maintained (an unfoldable view) is **TTL-floored** (`cache.UnresolvedDepsTTLCap`) rather than trusted to version invalidation. Detail: architecture.md § `pipes/`.
14. **TypeScript SDK** — `@wavehouse/sdk`: zero-dep client, typed query builder, real-time SSE, live queries (incrementable/decomposable/poll aggregation), codegen CLI. The canonical client (see §SDK Sync).
15. **Observability invariants** — stdout always 100% (sampling is OTLP-push-only); WARN+ERROR always export at 100% (a non-configurable floor — don't expose it); gRPC OTel exporters dial lazily so an unreachable collector never blocks startup; the OTel Prometheus exporter uses a **private** `prometheus.Registry`. The OTLP endpoint/TLS/custom-CA/mTLS/headers are delegated to the OpenTelemetry SDK's standard `OTEL_EXPORTER_OTLP_*` env vars — `InitProvider` passes **no** endpoint/header options. Known gap, intentionally not patched in WaveHouse app code: the pinned gRPC logs exporter (`otlploggrpc` v0.19/v0.20) ignores the env TLS-cert vars, so a custom/private CA and mutual TLS apply to traces/metrics but **not** the logs signal (public-CA/system-roots TLS and plaintext still work for logs) — upstream bug open-telemetry/opentelemetry-go#6661. A malformed `OTEL_EXPORTER_OTLP_HEADERS` is logged and skipped by the SDK (fail-soft), not fatal. Preserve when touching the logger/sampler/provider. Detail: architecture.md § `observability/`.
16. **Bearer-token-only CORS posture (security)** — Bearer JWT on every request, no cookies/sessions; `corsMiddleware` deliberately **never** emits `Access-Control-Allow-Credentials` (not needed, and `*` + credentials is a spec violation browsers reject). `cors_allowed_origins` controls who can *read* responses, not cookie scope; CSRF protection is structural. Don't reintroduce cookie auth or `Allow-Credentials` without a design discussion — answers GitHub #29/#30. Code: `internal/api/router.go`.
Expand Down
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Named pipes invalidate their cached results on writes to the tables they read, via versioned namespaces and a refresh-time cascade** (`internal/discovery/{discovery,explain}.go`, `internal/api/pipes.go`, `internal/pipes/pipes.go`, `internal/cache/{cache,local,version_manager}.go`, `internal/chsql/nats.go` (moved from `internal/query/ident.go`), `cmd/wavehouse/main.go`, `docs/src/content/docs/{pipes.mdx,architecture.md}`, plus unit + live-ClickHouse tests): closes [#178](https://github.com/Wave-RF/WaveHouse/issues/178). A pipe's table dependencies are resolved by **asking ClickHouse** — `EXPLAIN QUERY TREE` on the bound query the first time it runs (`discovery.ResolveTables`, reading the table list off ClickHouse's own analysis), cached per bound query (parameter combination). WaveHouse never parses pipe SQL itself and never rejects a pipe. `EXPLAIN QUERY TREE` is the pre-optimization tree, so it keeps a table even when the query is metadata-answered (`SELECT count() FROM t` still tracks `t`) or reads a currently-empty table — both of which `system.query_log` and the post-optimization `EXPLAIN PLAN` would drop. It also tracks reads that aren't plain `FROM` clauses: a `joinGet()` of a `Join`-engine table and a `dictGet()` of a dictionary (mapped via `system.dictionaries` to the ClickHouse table backing it). Table-name extraction is robust to `EXPLAIN`-rendering quirks the unit fixtures didn't originally cover: a `FINAL`/`SAMPLE` modifier appends fields after the name (`table_name: db.t, final: 1`) and a string literal can itself contain the text `table_name:` — neither leaks into, nor fabricates, a tracked table — and a `dictGet` against a not-yet-loaded dictionary force-loads it first, since ClickHouse reports an empty `source` for an unloaded dict that would otherwise drop its backing table. A build-tagged differential fuzzer (`internal/discovery/fuzz_deps_test.go`, run with `-tags fuzzdeps` against a local ClickHouse) cross-checks the resolved set for hundreds of nested / `UNION` / parameter-gated queries against a construction-based ground truth. Each resolved table is folded into the cache key as its **own versioned namespace** (a per-name lookup, no per-request graph walk), so a write to it misses the key. Views and materialized views are versioned namespaces too: the schema registry resolves each view's definition (also via EXPLAIN) **once per schema change** into a *cascade* (base table → the views reading it, transitively), pushed into the cache, so a write to a base table also bumps every view over it and evicts a pipe reading the view; a view redefined in place (and the views built on it) is evicted at the next refresh. Resolution is **precise per parameter binding, and otherwise over-resolves — never under**: for a parameter-gated `UNION`, ClickHouse folds the bound predicate so a constant-false arm is pruned (`source=web` depends only on `web_events`, not `mobile_events`; resolved and cached per bound query, and emitting the `wavehouse_pipe_dep_tables_pruned_total` metric), while a query ClickHouse can't analyze (a write/DDL pipe, an unreachable server, a missing table) over-resolves to **all base tables** so any write evicts it. The pruning is conservative — an arm is dropped only when its filter is provably constant-false, so anything unproven stays tracked (over-resolve, never stale). Schema refresh is **atomic** (rebuilt only when content changes) and re-resolves pipes (`PipesHandler.ClearResolvedDeps`). Pipes run the author SQL as-is with no policy row/column filtering (the per-pipe `allowed_roles` gates execution), so every allowed caller shares one cached entry. **Unsupported, and documented as may-go-stale** (run, not blocked): write/ingest pipes, table-function pipes (`s3()`/`numbers()` — their external data isn't a tracked table), and parameterized table names (`FROM {{tbl}}`). Single-database (the configured `clickhouse.database`); a cross-database table is untracked.
- **"Durability & Storage" operations guide** (`docs/src/content/docs/durability.md` (new), `docs/src/config/sidebar.ts`, `docs/src/content/docs/reverse-proxy.mdx`, `docs/src/content/docs/configuration.mdx`, `docs/src/content/docs/deployment.md`): documents #84. A new Operations page making the embedded-JetStream durability contract explicit before the docs site publishes: a `200` from `POST /v1/ingest` means the event has been `fsync`'d to disk on the node (the server runs with `SyncAlways: true` in `internal/mq/embedded.go`), which makes the storage substrate's `fsync` tail the ingest latency floor. Covers the contract (and how it differs from JetStream's default page-cache-then-periodic-sync mode), why a slow `fsync` tail manifests as `create stream: ... context deadline exceeded` and `503` backpressure, a where-it's-cheap-vs-expensive substrate table (managed cloud block storage and PLP NVMe vs. ZFS-without-SLOG / qcow2-on-`ext4` / spinning disks), an `fio` recipe + verdict bands to measure your own storage (with the macOS `F_FULLFSYNC` honesty caveat), and the symptom checklist. Forward-references the configurable group-commit interval (`mq.sync_interval`, [#139](https://github.com/Wave-RF/WaveHouse/issues/139)) and the planned `wavehouse storage-check` preflight ([#84](https://github.com/Wave-RF/WaveHouse/issues/84)) without claiming either exists yet. Cross-linked from Configuration (Message Queue), Deployment (Persistent Storage), and the Ingest Pipeline's worker-side ack section; no code changes.
- **"Behind a reverse proxy" deployment guide** (`docs/src/content/docs/reverse-proxy.mdx` (new), `docs/src/config/sidebar.ts`, `docs/src/content/docs/deployment.md`, `docs/src/content/docs/configuration.mdx`, `docs/src/content/docs/api.md`, `internal/api/stream.go`, `internal/api/stream_test.go`): closes #241. A new Operations page for the common "WaveHouse behind nginx / Caddy / Cloudflare Tunnel" setup, since several behaviors only matter behind a proxy and weren't documented together. Covers: TLS termination (WaveHouse serves plain HTTP and manages no certs); the request-body size limits and the division of responsibility (WaveHouse ships fixed in-code memory-safety backstops — 1 MiB control / 16 MiB ingest — while the proxy is the tunable *outer* limit, so a missing/loose proxy limit can't OOM the server); Server-Sent Events buffering + idle-timeout tuning (WaveHouse sends a `: connected` comment on open plus a periodic `:` keepalive comment so quiet streams survive proxy idle timeouts, [#226](https://github.com/Wave-RF/WaveHouse/issues/226)); the `?token=` / `since` / `Last-Event-ID` forwarding streams need; `X-Forwarded-For` trust (don't expose `:8080` directly — it's honored, so a direct client could spoof it); and which health paths to expose (`/livez`/`/readyz` internal-optional, `/v1/health` must stay public). Ships full example nginx, Caddy, and Cloudflare-Tunnel configs, and is cross-linked from Deployment, Configuration, and the API reference. One small code change lands with it: the SSE endpoint (`GET /v1/stream`) now sets `X-Accel-Buffering: no` so nginx-class proxies stream events without buffering out of the box (nginx strips the header before the client sees it; Caddy/Cloudflare ignore it). The health-probe guidance is also upgraded from "optional" to a recommendation — keep the bare `/livez`/`/readyz`/`/healthz` paths internal (a public `/readyz` turns each hit into a ClickHouse `Ping`) and expose only `/v1/health` publicly.
- **Coverage publishing — a self-hosted Go coverage README badge and GitHub Code Quality PR comments** (`.github/workflows/ci.yml`, `.github/actionlint.yaml` (new), `scripts/cov/main.go`, `scripts/ci/publish-badge.sh` (new), `.testcoverage.yml`, `go.mod`/`go.sum`, `README.md`, `AGENTS.md`, `docs/src/content/docs/development.md`): closes #133, now that the repo is public. Two published surfaces, both **non-gating** — `make cov`'s thresholds stay the only merge gate. (1) **README badge**: a new `cov badge` subcommand renders a [shields.io endpoint](https://shields.io/endpoint) JSON for the merged Go total using the *exact* number `threshold.total` gates (same `.testcoverage.yml` excludes), and a new non-gating `badge` job — the sole holder of `contents:write`, running only on trusted main — publishes it to an orphan `badges` branch via `scripts/ci/publish-badge.sh`, which the README reads over `raw.githubusercontent.com` (unrestricted for a public repo). (2) **PR comments**: the `coverage` job converts the merged Go profile to Cobertura (`go tool gocover-cobertura`, a new pinned Go `tool` dependency, with `-ignore-dirs` mirroring the YAML's global excludes) and uploads it to GitHub Code Quality via `actions/upload-code-coverage` (`code-quality: write`); the `github-code-quality[bot]` posts the aggregate + per-file diff-vs-`main` comment. The upload is `continue-on-error` so this public-preview GitHub feature can never red CI, and fork PRs skip it (no `code-quality` token, per GitHub's own guard). `actionlint` doesn't recognize the preview `code-quality` permission scope yet, so a new `.github/actionlint.yaml` suppresses only that one message. Requires the repo's *Settings → Code quality* enablement for the comments to render. Full design in `.github/workflows/README.md` §"Coverage publishing".
Expand Down
47 changes: 34 additions & 13 deletions cmd/wavehouse/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,30 @@ func run() int {
bootState := api.NewBootState(nil)
refreshInterval := time.Duration(cfg.Schema.RefreshInterval) * time.Second
registry := discovery.NewSchemaRegistry(chConn, cfg.ClickHouse.Database, refreshInterval, logger)

// TODO: this is where we can switch between ristretto, redis, tiered (both), etc.
l1, err := cache.NewLocal(cfg.Cache.L1MaxCost)
if err != nil {
logger.Error("cache init", "error", err)
return 1
}
defer func() { _ = l1.Close() }()

var pipesHandler *api.PipesHandler
registry.SetOnRefresh(func(snap discovery.DependencySnapshot) {
l1.SetDependents(snap.Cascade)
if len(snap.ChangedViews) > 0 {
nss := make([]cache.Namespace, len(snap.ChangedViews))
for i, v := range snap.ChangedViews {
nss[i] = cache.Namespace{Table: v}
}
_, _ = l1.Invalidate(ctx, nss)
}
if pipesHandler != nil {
pipesHandler.ClearResolvedDeps()
}
})

if err := registry.Refresh(ctx); err != nil {
logger.Warn("schema discovery failed on boot, retrying in background", "error", err)
bootState.Set(fmt.Errorf("schema discovery: %w", err))
Expand Down Expand Up @@ -252,16 +276,6 @@ func run() int {
}
}

// L1 cache only in standalone mode.
l1, err := cache.NewLocal(cfg.Cache.L1MaxCost)
if err != nil {
logger.Error("cache init", "error", err)
return 1
}
// TODO: eventually this is where we can switch between ristretto, redis, tiered (both), etc
cache := l1
defer func() { _ = cache.Close() }()

// Policy store (NATS KV + optional file bootstrap).
policyStore, err := policy.NewStore(ctx, embeddedMQ.JetStream(), cfg.Policy.FilePath, logger)
if err != nil {
Expand Down Expand Up @@ -291,7 +305,7 @@ func run() int {
ingestCleanup, err := ingest.StartIngestWorker(
ctx,
embeddedMQ.NatsConn(),
cache,
l1,
cfg.ClickHouse.Addr,
cfg.ClickHouse.HTTPPort, // Uses 8123 by default
cfg.ClickHouse.HTTPScheme,
Expand Down Expand Up @@ -379,6 +393,13 @@ func run() int {
return 1
}

// Pipes resolve their cache dependencies through the schema registry's
// dependency tree, so wire the registry beyond NewPipesHandler's core args.
pipesHandler = api.NewPipesHandler(pipesStore, policyStore, chConn, l1, cfg.ClickHouse.QueryTimeout, logger)
pipesHandler.Registry = registry

structuredQueryHandler := api.NewStructuredQueryHandler(chConn, l1, registry, policyStore, cfg.Cache.TimestampBucketSeconds, cfg.ClickHouse.QueryTimeout, cfg.Query.DefaultMaxRows, logger)

deps := api.Dependencies{
Ingest: ingestHandler,
Query: queryHandler,
Expand All @@ -388,8 +409,8 @@ func run() int {
Schema: api.NewSchemaHandler(registry),
DLQ: dlqHandler,
Policy: api.NewPolicyHandler(policyStore),
Pipes: api.NewPipesHandler(pipesStore, policyStore, chConn, cache, cfg.ClickHouse.QueryTimeout, logger),
StructuredQuery: api.NewStructuredQueryHandler(chConn, cache, registry, policyStore, cfg.Cache.TimestampBucketSeconds, cfg.ClickHouse.QueryTimeout, cfg.Query.DefaultMaxRows, logger),
Pipes: pipesHandler,
StructuredQuery: structuredQueryHandler,
AuthMW: authMW,
PolicyStore: policyStore,
Logger: logger,
Expand Down
Loading
Loading