feat(pipes): resolve table dependencies for cache invalidation#343
feat(pipes): resolve table dependencies for cache invalidation#343taitelee wants to merge 25 commits into
Conversation
… not Cache.Invalidate
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR implements dependency-aware pipe cache invalidation. ChangesPipe dependency-based cache invalidation
Sequence Diagram(s)sequenceDiagram
participant Client
participant PipesHandler
participant resolvePipeDeps
participant ClickHouse
participant SchemaRegistry
participant Cache
rect rgba(100, 149, 237, 0.5)
Note over Client,Cache: PUT /v1/pipes/{name}
Client->>PipesHandler: PUT pipe SQL
PipesHandler->>resolvePipeDeps: pipe definition
resolvePipeDeps->>ClickHouse: DummyBind SQL → EXPLAIN QUERY TREE
ClickHouse-->>resolvePipeDeps: table_name identifiers
resolvePipeDeps->>ClickHouse: system.tables as_select (view expansion)
ClickHouse-->>resolvePipeDeps: view SQL (recursive)
resolvePipeDeps->>SchemaRegistry: filterKnownTables
SchemaRegistry-->>resolvePipeDeps: resolved base table names
resolvePipeDeps-->>PipesHandler: ResolvedTables []string
PipesHandler->>Cache: store pipe with ResolvedTables
end
rect rgba(60, 179, 113, 0.5)
Note over Client,Cache: GET /v1/pipes/{name}/execute
Client->>PipesHandler: Execute request
PipesHandler->>PipesHandler: pipeDeps(q.ResolvedTables) → []Namespace
PipesHandler->>Cache: Get(sha, deps)
Cache-->>PipesHandler: HIT or MISS
alt MISS
PipesHandler->>ClickHouse: run pipe SQL
ClickHouse-->>PipesHandler: results
PipesHandler->>Cache: Set(sha, deps, results)
end
PipesHandler-->>Client: results + X-Cache header
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: d7ff65f4-c449-4c3d-a7b3-4ef586eb7bbe
📒 Files selected for processing (18)
CHANGELOG.mdcmd/wavehouse/main.gointernal/api/pipe_deps.gointernal/api/pipe_deps_test.gointernal/api/pipes.gointernal/api/structured_query.gointernal/cache/cache.gointernal/cache/cache_test.gointernal/cache/local.gointernal/cache/local_test.gointernal/cache/version_manager.gointernal/cache/version_manager_test.gointernal/ingest/worker.gointernal/ingest/worker_test.gointernal/pipes/dummybind_test.gointernal/pipes/pipes.gointernal/testutil/mocks.gotests/integration/pipe_deps_test.go
💤 Files with no reviewable changes (1)
- internal/cache/cache_test.go
📜 Review details
🧰 Additional context used
📓 Path-based instructions (7)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.26 with strict formatting enforced by gofumpt
Use structured logging with log/slog (JSON handler)
Use Chi v5 for HTTP routing
Return errors, don't panic. Wrap with fmt.Errorf("context: %w", err)
Use package naming: lowercase, single word (or abbreviated). internal/ enforces module privacy
No global state: Dependencies are passed explicitly (constructor injection)
Comment the why, not the what. Add a comment only when the reason isn't obvious from the code; a line that matches the surrounding pattern needs none. Keep comments to 1–2 lines
DRY — one source of truth. Before adding logic, look for an existing helper, type, or constant to reuse; before duplicating a rule, factor it into one place every caller reads
Leave it neater than you found it — within reason. Fix small, safe things in passing: a stale comment, an obvious typo, a misnamed local, dead code on your path
Files:
internal/api/structured_query.gointernal/testutil/mocks.gointernal/ingest/worker.gointernal/pipes/pipes.gointernal/pipes/dummybind_test.gocmd/wavehouse/main.gotests/integration/pipe_deps_test.gointernal/ingest/worker_test.gointernal/api/pipe_deps.gointernal/cache/local.gointernal/api/pipe_deps_test.gointernal/cache/local_test.gointernal/cache/cache.gointernal/api/pipes.gointernal/cache/version_manager.gointernal/cache/version_manager_test.go
internal/api/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Chi HTTP router, JWT/JWKS middleware (from auth/), ingest/query/structured-query/SSE/schema/DLQ/policy/pipes handlers, Hub
Files:
internal/api/structured_query.gointernal/api/pipe_deps.gointernal/api/pipe_deps_test.gointernal/api/pipes.go
internal/ingest/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Ingest worker pipeline (worker.go): JetStream input → per-table batch INSERT with DLQ output. The pipeline is insert-only. Wire format EventMessage carries {table_name, scope, received_timestamp, data} and nothing else
Files:
internal/ingest/worker.gointernal/ingest/worker_test.go
internal/pipes/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Named query pipes: NamedQuery type + NATS KV store (WAVEHOUSE_PIPES) + .sql file bootstrap. Pre-defined SQL templates with param binding + caching; per-pipe allowed_roles is the only execute-path gate via policy.RoleAllowed
Files:
internal/pipes/pipes.gointernal/pipes/dummybind_test.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Use table-driven tests with tests := []struct{ name string; ... } and t.Run(tt.name, ...)
Use shared mocks from internal/testutil/ (MockPublisher, MockCache, MockDeduplicator, MockSubscriber) instead of creating ad-hoc mocks
Use testutil.MakeJWT(t, claims) and testutil.MakeExpiredJWT(t, claims) for auth tests
Use testutil.NewTestSchemaRegistry(tables) or discovery.NewSchemaRegistryFromMap(tables) for schema-aware tests
Use policy.NewMemoryStore(p) for in-memory policy testing without NATS
Use pipes.NewMemoryStore(queries...) for in-memory pipes testing without NATS
Use testutil.AssertJSONResponse(t, rec, status, expected) and testutil.AssertJSONContains(t, rec, status, substring) for response assertions
Files:
internal/pipes/dummybind_test.gotests/integration/pipe_deps_test.gointernal/ingest/worker_test.gointernal/api/pipe_deps_test.gointernal/cache/local_test.gointernal/cache/version_manager_test.go
tests/integration/**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Go integration tests (//go:build integration; ClickHouse testcontainer); run via make test-integration
Files:
tests/integration/pipe_deps_test.go
internal/cache/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Implement Cache interface → LocalCache (Ristretto) + SharedCache (TBD) + TieredCache (singleflight)
Files:
internal/cache/local.gointernal/cache/local_test.gointernal/cache/cache.gointernal/cache/version_manager.gointernal/cache/version_manager_test.go
🧠 Learnings (5)
📚 Learning: 2026-06-10T15:01:09.027Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 312
File: docs/src/content/docs/development.md:0-0
Timestamp: 2026-06-10T15:01:09.027Z
Learning: In this repo’s Markdown review (all .md files), do not flag capitalization/style issues for literal paths starting with ".github/" (or any substring that is a path beginning with ".github/"). Treat ".github" as the correct lowercase dotfile directory name, even when it appears inside prose or code spans; automated checks such as LanguageTool’s "(GITHUB)" rule commonly produce false positives for this literal filesystem path.
Applied to files:
CHANGELOG.md
📚 Learning: 2026-05-25T11:24:21.130Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 180
File: internal/cache/local.go:0-0
Timestamp: 2026-05-25T11:24:21.130Z
Learning: In WaveHouse’s cache packages (e.g., internal/cache/local.go), it’s acceptable to define package-level `var` constants that hold immutable OpenTelemetry metric attribute sets / `metric.MeasurementOption` values (for example: `cacheL1Attrs = metric.WithAttributes(attribute.String("tier","L1"))`). Treat these as stateless, pre-allocated option values (analogous to `regexp.MustCompile(...)`), not mutable global state. When applying the AGENTS.md “no global state / constructor injection” guideline, apply it to application dependencies (e.g., Cache, Publisher, Deduplicator) rather than to these immutable OTel attribute/measurement option variables—do not flag them as constructor-injection violations.
Applied to files:
internal/cache/local.gointernal/cache/local_test.gointernal/cache/cache.gointernal/cache/version_manager.gointernal/cache/version_manager_test.go
📚 Learning: 2026-05-20T01:02:00.784Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 164
File: internal/api/router_test.go:289-350
Timestamp: 2026-05-20T01:02:00.784Z
Learning: In WaveHouse’s internal API tests (files matching internal/api/**/*_test.go), follow the existing separation-of-concerns convention for testing the RequireRole middleware: inject `ContextKeyRole` directly into the request `context.Context` instead of using `testutil.MakeJWT`/JWT-driven flows. Do not refactor role-gate tests to use JWT tokens—JWT parsing and token handling are covered separately in `middleware_test.go` (the dedicated JWT parsing tests), and mixing those concerns would expand the failure surface and reduce isolation.
Applied to files:
internal/api/pipe_deps_test.go
📚 Learning: 2026-05-23T01:23:59.268Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 174
File: internal/api/ingest_test.go:111-111
Timestamp: 2026-05-23T01:23:59.268Z
Learning: In WaveHouse Go tests in internal/api/**/*_test.go, use internal/testutil.AssertJSONErrorResponse(t, w) for HTTP error-path JSON assertions. Do not use (or reintroduce) package-local assertJSONErrorResponse helpers. AssertJSONErrorResponse verifies the response Content-Type is application/json, includes the X-Content-Type-Options: nosniff header, and that the JSON body contains an "error" field.
Applied to files:
internal/api/pipe_deps_test.go
📚 Learning: 2026-05-20T20:30:15.808Z
Learnt from: taitelee
Repo: Wave-RF/WaveHouse PR: 172
File: internal/api/pipes_test.go:106-118
Timestamp: 2026-05-20T20:30:15.808Z
Learning: For WaveHouse pipes authorization allowlist checks, fix the empty-role fail-open behavior by (1) removing any outer guard that prevents allowlist evaluation when the incoming `role` is `""` (e.g., don’t short-circuit with `if role != "" { ... }`), and (2) during allowlist scanning, ensure only non-empty allowlist entries can match—e.g., require `ar != "" && ar == role` (so a malformed allowlist like `["" ]` cannot grant access to an empty incoming role via `"" == ""`).
Applied to files:
internal/api/pipes.go
🔇 Additional comments (30)
internal/cache/cache.go (1)
8-32: LGTM!internal/cache/version_manager.go (3)
16-32: LGTM!
34-51: LGTM!
76-94: LGTM!internal/cache/version_manager_test.go (1)
9-72: LGTM!internal/cache/local.go (1)
31-75: LGTM!internal/cache/local_test.go (1)
12-161: LGTM!internal/api/structured_query.go (2)
129-143: LGTM!
172-174: LGTM!internal/testutil/mocks.go (1)
117-136: LGTM!internal/ingest/worker_test.go (4)
224-232: LGTM!
449-543: LGTM!
704-731: LGTM!
733-764: LGTM!internal/ingest/worker.go (1)
461-509: LGTM!internal/pipes/pipes.go (2)
29-37: LGTM!
204-243: LGTM!internal/api/pipe_deps.go (7)
28-35: LGTM!
43-56: LGTM!
64-83: LGTM!
90-103: LGTM!
134-153: LGTM!
196-205: LGTM!
179-187:⚠️ Potential issue | 🟠 Major | ⚡ Quick winIncorrect escape sequence handling for identifiers with both backslashes and backticks.
Lines 183-184 unescape ClickHouse backtick-quoted identifiers in the wrong order, causing incorrect results when an identifier contains both a literal backslash and a literal backtick. ClickHouse uses
\\for a backslash and ``` for a backtick inside backtick-quoted identifiers.Example: The ClickHouse identifier
a\b(a, backslash, backtick, b) is written in SQL as ``a\`b`` (where\→` and\`` → ````). After stripping the outer backticks, the string isa\\\b`. The current code:
- Line 183 replaces
\`` with ````:a\\\b` → `a\b`- Line 184 tries to replace
\\\\with\\: no match (only two backslashes remain)- Result:
a\\b(wrong; expecteda\b`)Sequential
ReplaceAllis unsafe here because the replacements can interact. The correct approach is to process escape sequences in a single pass:
- Iterate byte-by-byte; when
\is seen, check the next byte:
- If
\, output one\and advance two bytes- If
`, output one`and advance two bytes- Otherwise, output
\(or handle as an error)🔧 Proposed fix for correct escape handling
func unquoteIdent(s string) string { s = strings.TrimSpace(s) if len(s) >= 2 && s[0] == '`' && s[len(s)-1] == '`' { - s = s[1 : len(s)-1] - s = strings.ReplaceAll(s, "\\`", "`") - s = strings.ReplaceAll(s, "\\\\", "\\") + s = s[1 : len(s)-1] // strip outer backticks + // Unescape in a single pass to avoid interaction between \\ and \` + var out strings.Builder + for i := 0; i < len(s); i++ { + if s[i] == '\\' && i+1 < len(s) { + next := s[i+1] + if next == '\\' || next == '`' { + out.WriteByte(next) + i++ // skip the next byte (already consumed) + continue + } + } + out.WriteByte(s[i]) + } + s = out.String() } return s }> Likely an incorrect or invalid review comment.internal/api/pipes.go (3)
27-33: LGTM!
76-80: LGTM!
155-193: LGTM!cmd/wavehouse/main.go (1)
367-372: LGTM!internal/pipes/dummybind_test.go (1)
1-63: LGTM!internal/api/pipe_deps_test.go (1)
1-192: LGTM!
Code Coverage OverviewLanguages: Go GoThe overall coverage in the branch is 89%. The coverage in the branch is 90%. Show a code coverage summary of the most impacted files.
Updated |
… add MV/multi-table tests
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/pipes/pipes.go (1)
263-271:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn an array-shaped dummy for declared
arrayparameters.
ParamDef.Typedocuments"array", butdummyForTypefalls through to"x". A pipe likeWHERE id IN {{ids}}dummy-binds asIN 'x', makingEXPLAINfail and leavingResolvedTablesnil/TTL-only for a common filtered-pipe shape.Proposed fix
func dummyForType(t string) any { switch strings.ToLower(t) { case "number": return 0 case "boolean": return false + case "array": + return []any{nil} default: // "string" and anything unrecognized return "x" } }internal/api/pipes.go (1)
27-33: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftPass dependency-resolution collaborators through the constructor.
RegistryandDatabaseare required for the new invalidation behavior, but they are optional mutable fields afterNewPipesHandler, so any missed call site silently falls back to TTL-only caching. Prefer constructor args or an options struct that initializes them with the rest of the handler dependencies.As per coding guidelines, “No global state: Dependencies are passed explicitly (constructor injection)”.
Also applies to: 47-48
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: bf950fb1-b7db-47a4-975e-0611d95c9907
📒 Files selected for processing (6)
CHANGELOG.mdcmd/wavehouse/main.gointernal/api/pipe_deps.gointernal/api/pipes.gointernal/pipes/pipes.gotests/integration/pipe_deps_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 300000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Coverage
- GitHub Check: Integration tests
- GitHub Check: Unit tests
- GitHub Check: E2E tests
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.26 with strict formatting enforced by gofumpt
Use structured logging with log/slog (JSON handler)
Use Chi v5 for HTTP routing
Return errors, don't panic. Wrap with fmt.Errorf("context: %w", err)
Use package naming: lowercase, single word (or abbreviated). internal/ enforces module privacy
No global state: Dependencies are passed explicitly (constructor injection)
Comment the why, not the what. Add a comment only when the reason isn't obvious from the code; a line that matches the surrounding pattern needs none. Keep comments to 1–2 lines
DRY — one source of truth. Before adding logic, look for an existing helper, type, or constant to reuse; before duplicating a rule, factor it into one place every caller reads
Leave it neater than you found it — within reason. Fix small, safe things in passing: a stale comment, an obvious typo, a misnamed local, dead code on your path
Files:
cmd/wavehouse/main.gointernal/pipes/pipes.gointernal/api/pipe_deps.gointernal/api/pipes.gotests/integration/pipe_deps_test.go
internal/pipes/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Named query pipes: NamedQuery type + NATS KV store (WAVEHOUSE_PIPES) + .sql file bootstrap. Pre-defined SQL templates with param binding + caching; per-pipe allowed_roles is the only execute-path gate via policy.RoleAllowed
Files:
internal/pipes/pipes.go
internal/api/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Chi HTTP router, JWT/JWKS middleware (from auth/), ingest/query/structured-query/SSE/schema/DLQ/policy/pipes handlers, Hub
Files:
internal/api/pipe_deps.gointernal/api/pipes.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Use table-driven tests with tests := []struct{ name string; ... } and t.Run(tt.name, ...)
Use shared mocks from internal/testutil/ (MockPublisher, MockCache, MockDeduplicator, MockSubscriber) instead of creating ad-hoc mocks
Use testutil.MakeJWT(t, claims) and testutil.MakeExpiredJWT(t, claims) for auth tests
Use testutil.NewTestSchemaRegistry(tables) or discovery.NewSchemaRegistryFromMap(tables) for schema-aware tests
Use policy.NewMemoryStore(p) for in-memory policy testing without NATS
Use pipes.NewMemoryStore(queries...) for in-memory pipes testing without NATS
Use testutil.AssertJSONResponse(t, rec, status, expected) and testutil.AssertJSONContains(t, rec, status, substring) for response assertions
Files:
tests/integration/pipe_deps_test.go
tests/integration/**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Go integration tests (//go:build integration; ClickHouse testcontainer); run via make test-integration
Files:
tests/integration/pipe_deps_test.go
🧠 Learnings (2)
📚 Learning: 2026-05-20T20:30:15.808Z
Learnt from: taitelee
Repo: Wave-RF/WaveHouse PR: 172
File: internal/api/pipes_test.go:106-118
Timestamp: 2026-05-20T20:30:15.808Z
Learning: For WaveHouse pipes authorization allowlist checks, fix the empty-role fail-open behavior by (1) removing any outer guard that prevents allowlist evaluation when the incoming `role` is `""` (e.g., don’t short-circuit with `if role != "" { ... }`), and (2) during allowlist scanning, ensure only non-empty allowlist entries can match—e.g., require `ar != "" && ar == role` (so a malformed allowlist like `["" ]` cannot grant access to an empty incoming role via `"" == ""`).
Applied to files:
internal/api/pipes.go
📚 Learning: 2026-06-10T15:01:09.027Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 312
File: docs/src/content/docs/development.md:0-0
Timestamp: 2026-06-10T15:01:09.027Z
Learning: In this repo’s Markdown review (all .md files), do not flag capitalization/style issues for literal paths starting with ".github/" (or any substring that is a path beginning with ".github/"). Treat ".github" as the correct lowercase dotfile directory name, even when it appears inside prose or code spans; automated checks such as LanguageTool’s "(GITHUB)" rule commonly produce false positives for this literal filesystem path.
Applied to files:
CHANGELOG.md
🔇 Additional comments (6)
internal/pipes/pipes.go (1)
29-37: LGTM!Also applies to: 42-48, 152-165, 172-225, 274-355, 381-384
internal/api/pipe_deps.go (1)
16-68: LGTM!Also applies to: 70-82, 112-297
internal/api/pipes.go (1)
39-44: LGTM!Also applies to: 77-97, 157-178, 187-225
cmd/wavehouse/main.go (2)
391-391: LGTM!
107-117: The OTLP endpoint migration is complete and correct.cfg.OTel.Addrhas been fully removed from the codebase (no references remain except a comment in the test fixture explicitly noting its absence). The config.yaml, documentation, and code consistently use the standardOTEL_EXPORTER_OTLP_ENDPOINTenv var; no config-only deployments can silently fall back to a stale field.tests/integration/pipe_deps_test.go (1)
145-250: LGTM!
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/api/pipe_deps.go (1)
92-112:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRegistry-known views can still survive as pipe dependencies. The resolver expands views, but it also keeps the original ref before expansion; the new test only checks that the base table is present, so it would not catch
[base, view].
internal/api/pipe_deps.go#L92-L112: appendronly on the non-view path, andcontinueafter appending recursively expanded dependencies.tests/integration/pipe_deps_test.go#L161-L165: assert exact resolved dependencies, or at least assert the view name is absent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: db06d93b-bf91-4b2f-b8bb-2bd9b91e4f57
📒 Files selected for processing (3)
CHANGELOG.mdinternal/api/pipe_deps.gotests/integration/pipe_deps_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 300000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Coverage
- GitHub Check: E2E tests
- GitHub Check: Analyze (go)
- GitHub Check: Lint
🧰 Additional context used
📓 Path-based instructions (4)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use Go 1.26 with strict formatting enforced by gofumpt
Use structured logging with log/slog (JSON handler)
Use Chi v5 for HTTP routing
Return errors, don't panic. Wrap with fmt.Errorf("context: %w", err)
Use package naming: lowercase, single word (or abbreviated). internal/ enforces module privacy
No global state: Dependencies are passed explicitly (constructor injection)
Comment the why, not the what. Add a comment only when the reason isn't obvious from the code; a line that matches the surrounding pattern needs none. Keep comments to 1–2 lines
DRY — one source of truth. Before adding logic, look for an existing helper, type, or constant to reuse; before duplicating a rule, factor it into one place every caller reads
Leave it neater than you found it — within reason. Fix small, safe things in passing: a stale comment, an obvious typo, a misnamed local, dead code on your path
Files:
tests/integration/pipe_deps_test.gointernal/api/pipe_deps.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Use table-driven tests with tests := []struct{ name string; ... } and t.Run(tt.name, ...)
Use shared mocks from internal/testutil/ (MockPublisher, MockCache, MockDeduplicator, MockSubscriber) instead of creating ad-hoc mocks
Use testutil.MakeJWT(t, claims) and testutil.MakeExpiredJWT(t, claims) for auth tests
Use testutil.NewTestSchemaRegistry(tables) or discovery.NewSchemaRegistryFromMap(tables) for schema-aware tests
Use policy.NewMemoryStore(p) for in-memory policy testing without NATS
Use pipes.NewMemoryStore(queries...) for in-memory pipes testing without NATS
Use testutil.AssertJSONResponse(t, rec, status, expected) and testutil.AssertJSONContains(t, rec, status, substring) for response assertions
Files:
tests/integration/pipe_deps_test.go
tests/integration/**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Go integration tests (//go:build integration; ClickHouse testcontainer); run via make test-integration
Files:
tests/integration/pipe_deps_test.go
internal/api/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Chi HTTP router, JWT/JWKS middleware (from auth/), ingest/query/structured-query/SSE/schema/DLQ/policy/pipes handlers, Hub
Files:
internal/api/pipe_deps.go
🧠 Learnings (1)
📚 Learning: 2026-06-10T15:01:09.027Z
Learnt from: EricAndrechek
Repo: Wave-RF/WaveHouse PR: 312
File: docs/src/content/docs/development.md:0-0
Timestamp: 2026-06-10T15:01:09.027Z
Learning: In this repo’s Markdown review (all .md files), do not flag capitalization/style issues for literal paths starting with ".github/" (or any substring that is a path beginning with ".github/"). Treat ".github" as the correct lowercase dotfile directory name, even when it appears inside prose or code spans; automated checks such as LanguageTool’s "(GITHUB)" rule commonly produce false positives for this literal filesystem path.
Applied to files:
CHANGELOG.md
🔇 Additional comments (1)
CHANGELOG.md (1)
18-18: LGTM!Also applies to: 63-64
… expanded base tables
|
📚 Docs preview is live → https://d26ce7f2-wavehouse-docs.wave-rf.workers.dev |
…nctuation can't break CREATE TABLE
… SDK type and document the resolver
|
Not sure this assertion in your summary is accurate, need to think about it some more but pretty sure the parameters could influence the table set used...
|
| // maxViewExpansions backstops view→view expansion against a pathological chain; | ||
| // the visited set already prevents cycles, this just bounds total work. | ||
| const maxViewExpansions = 32 |
There was a problem hiding this comment.
Fine, but I'd like this documented and it more clearly logged when this backstop fires so an operator setting up WaveHouse & Pipes knows this has happened and can debug their schemas to see why/what happened.
There was a problem hiding this comment.
Actually, update: I think as this is implemented now, if/when this gets hit, we return nil and throw away all the values we'd collected up until this point, is that right? Per my logic that we should still collect dependencies on a best effort to invalidate the TTL when we know we need to even if we don't have ALL the dependencies and thus can't survive LONGER than the TTL, we would need what had been collected here and not just nil.
| refs, err := explainQueryTreeTables(ctx, conn, sql) | ||
| if err != nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Do we not want/need error logging here? And is returning nil valid compared to an empty list or whatever? I notice the code that takes this result as input doesn't check for nil return values and just loops over it, so is this handled properly in said cases? Do we have test cases to cover this?
EricAndrechek
left a comment
There was a problem hiding this comment.
Ok, my notes:
- What did we decide about the pipes being read-only vs not, and where/how to document and enforce that if at all? I don't remember the consensus we came to but I didn't see anything about it in this PR, unless that was in another one and I'm just losing my mind?
- We need stored and bootstrapped tables/registries to re-resolve their dependencies on boot. Right now this only happens on
Putand (as commented in this review) it needs to happen on bootstrap too. But what happens in the case where the pipe isPutand resolved and saved, and then WaveHouse is restarted? Does it persist with the old dependency chain it had? What happens if the schemas changed then during that time? That's why we have the persistent schema refresh interval and an API endpoint to force a schema discovery update – should this update the pipe dependencies too? - In cases where we can't determine all the dependencies, whether it's because another database is queried, we use an external S3 resource or URL, we nest too deeply, etc – I think we should still collect all the dependencies we DO have and know of. This way, if the TTL gets set to, say, 1 minute, but a KNOWN dependency is updated, we can still invalidate immediately. The only case/reason we NEED all the dependencies to be known is when we want to outlast the TTL, which we don't currently support anyway (but should, we need to open a new issue to track this!)
- We likely eventually will want to support more than just the one database, which would break this current implementation (among other things, which is why I'm not gating the PR being accepted on this working, since support for that would require its own dedicated PR) – but it has made me realize, I think our NATS safe subject stuff, caching keys, and this dependency chain logic, will all fall apart when that is added/supported, since we don't even have a way for NATS to include the database name along with the table... We should make a two-part issue for this, to add in the support for multiple databases (and tracking their dependencies, inserting into them, etc) and to make sure caching, dependency calculations, and NATS subjects, URL param titles (like for streams), policy file configs, etc all still work with databases added on...
- This PR made me realize, do we properly apply the policy roles on pipes, or do those intentionally not apply to pipes? Regardless, I feel like the path we've chosen should be more clear and explicitly documented...
- The biggest thing: we NEED to have per-role, claim, parameter, etc caching and isolation, including potentially on the dependency path. Resolving deps once at Put() with dummy params can't express a table set that varies by parameter value. Example:
SELECT user_id, count() FROM (
SELECT user_id FROM web_events WHERE {{source}} = 'web'
UNION ALL
SELECT user_id FROM mobile_events WHERE {{source}} = 'mobile'
) GROUP BY user_id?source=web genuinely depends only on web_events; ?source=mobile only on mobile_events. The dummy-bind resolver gets neither right — it over-resolves to both tables (every entry invalidated by either write) or, if EXPLAIN folds the constant-false dummy branches, resolves to an empty set (silent TTL-only). Since the cache is keyed per (sql, params), the correct dep set is per-entry and differs by parameter.
Some viable directions: (a) resolve at cache-miss time on the real bound SQL (synchronous EXPLAIN, deps known before Set, no re-keying); (b) async system.query_log.tables after flush_interval_milliseconds (≈7.5s) — more precise and gives base-table/view expansion for free, but the flush delay outlives most TTLs, requires re-keying the version-folded cache key (deps must be known at Get time, but aren't until after execution), and adds a staleness window. Notice though that none of these solve the scope stuff, which we also need to consider then too...
| // Pipes bootstrapped here are stored without table-dependency resolution (that runs | ||
| // in the API Put handler, which holds the ClickHouse connection), so they cache | ||
| // TTL-only until next saved via the API. |
There was a problem hiding this comment.
Why? I don't like that... I think this should effectively act like a Put on bootup/start once ClickHouse is available and ready.
There was a problem hiding this comment.
Nicer than a boot-time Put now: resolution is lazy, so a .sql-bootstrapped pipe resolves on its first execution and re-resolves on every schema refresh (ClearResolvedDeps). No permanent TTL-only window. Worst case it's TTL-only until first run, then identical to an API-saved pipe. Would you still rather have pipes resolve at boot? Or is the lazy approach okay?
| case r >= 'A' && r <= 'Z': | ||
| return r + ('a' - 'A') |
There was a problem hiding this comment.
Why what is the point in this, how is it different from just including the case in the case above and a plain return r?
| // Sanitize the test name into a valid CH identifier: lowercase, and replace every | ||
| // character that isn't a letter, digit, or underscore with '_'. An allowlist (not a | ||
| // denylist of a few known separators) so punctuation in a descriptive subtest name — | ||
| // commas, parens, colons — can't leak into the CREATE TABLE and cause a syntax error. |
There was a problem hiding this comment.
I understand the desire/need for this change, but at the same time I wonder if we SHOULD even being restricting it at all... technically the identifiers can be pretty much anything as is, including backticks, slashes/backslashes, etc, so (especially given that this is for testing) it feels like it may be more productive/relevant to allow any/all characters and confirm it still all operates the same, no?
Was this change made since tests (whether old or newly added in this PR) started failing? If so, I'd like to see those cases that fail and investigate them further, as they feel like failures we should chase the root cause of and not just bandaid patch by limiting what table names can be used in our test cases.
| // maxViewExpansions backstops view→view expansion against a pathological chain; | ||
| // the visited set already prevents cycles, this just bounds total work. | ||
| const maxViewExpansions = 32 |
There was a problem hiding this comment.
Actually, update: I think as this is implemented now, if/when this gets hit, we return nil and throw away all the values we'd collected up until this point, is that right? Per my logic that we should still collect dependencies on a best effort to invalidate the TTL when we know we need to even if we don't have ALL the dependencies and thus can't survive LONGER than the TTL, we would need what had been collected here and not just nil.
…d from ClickHouse EXPLAIN QUERY TREE
…le views); correct pipe cache docs
# Conflicts: # internal/api/stream.go
Summary
Pipes were cached TTL-only because they couldn't report which tables they read, so an ingest never invalidated a stale pipe result (#178). This teaches a pipe to resolve the ingested base tables its SQL reads — through views and aliases — and version-invalidate on the same namespace-versioned path structured queries already use.
When a pipe is created or updated,
PipesHandler.Putresolves its base tables by runningEXPLAIN QUERY TREEover dummy-bound SQL (the table set depends only on the SQL, not on per-request parameter values) and keeping the names the schemaregistry knows in the configured database; the result is stored on the pipe (
ResolvedTables, server-owned — never trusted from client input).Executefolds those tables into the cache key as whole-table namespaces, encoded exactly as the ingest worker encodes them, so an ingest into any of them invalidates the cached result. Resolution is best-effort and bounded: with no registry/ClickHouse wired or on any failure, the pipe falls back to the previous TTL-only behavior,Related Issues
Closes #178