perf(cache): columnar session cache — kill psl_result_field vapply x6 (P3, PSLR-ffdsymhk) #59

Merged
bart-turczynski merged 1 commit into main from feature/pslr-p3-columnar-cache on Jul 2, 2026

@bart-turczynski

Owner

P3 of the columnar hot-path epic (PSLR-bzqvsatk). Internal-only; oracle byte-identical.

The cache stored one R list per host and rebuilt a data.frame via psl_result_field()'s vapply×6 on every call. At 200k hosts that is 1.2M closure calls per query (6 fields × 200k hosts), which is why the warm path barely beat the cold one.
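To make the cost above concrete, here is a minimal sketch of the row-oriented pattern. It is not the deleted code: `result_field()` is a hypothetical stand-in for psl_result_field(), the sample values (including `"exact"` and `"icann"`) are invented for illustration, and only the six field names come from the commit message.

```r
# Hypothetical row-oriented cache: one R list per host.
suffixes <- c("com", "co.uk", "org")
domains  <- c("example.com", "example.co.uk", "example.org")
depths   <- c(1L, 2L, 1L)
entries <- lapply(seq_along(suffixes), function(i) {
  list(public_suffix = suffixes[i], registrable_domain = domains[i],
       rule = suffixes[i], kind = "exact", rule_section = "icann",
       ps_depth = depths[i])
})

# Stand-in for the deleted helper: one closure call per host, per field.
calls <- 0L
result_field <- function(entries, field, template) {
  vapply(entries, function(e) {
    calls <<- calls + 1L   # count closure invocations
    e[[field]]
  }, template)
}

# Rebuilding the data.frame needs six vapply passes over all entries.
df <- data.frame(
  public_suffix      = result_field(entries, "public_suffix", character(1)),
  registrable_domain = result_field(entries, "registrable_domain", character(1)),
  rule               = result_field(entries, "rule", character(1)),
  kind               = result_field(entries, "kind", character(1)),
  rule_section       = result_field(entries, "rule_section", character(1)),
  ps_depth           = result_field(entries, "ps_depth", integer(1)),
  stringsAsFactors = FALSE
)

# Closure calls grow as 6 * number of hosts.
calls == 6L * length(entries)
```

With three hosts the counter reaches 18; the same pattern at 200k hosts gives the 1.2M figure quoted above.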

  • R/cache.R — columnar storage: key→integer-index env + parallel column vectors (incl. P2 offsets ps_start/rd_start). Doubling growth; psl_cache_clear() resets index env + all columns atomically (PRD s7.4); capacity full-flush semantics unchanged.
  • R/matcher.R — hits via one mget of indices + vectorized subsetting (no per-host closures); psl_result_field() deleted; offsets returned for P4.
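The storage shape in the R/cache.R bullet can be sketched roughly as follows. This is an illustration under assumptions, not the code in R/cache.R: `new_cache()`, `cache_append()`, and `cache_clear()` are hypothetical names, only two of the columns are shown, keys are assumed unique and not yet cached, and the capacity bound and full-flush logic are left out.

```r
# Hypothetical columnar cache: key -> row-index env plus parallel columns.
new_cache <- function(initial = 4L) {
  cache <- new.env(parent = emptyenv())
  cache$initial <- initial
  cache$index <- new.env(hash = TRUE, parent = emptyenv())
  cache$n <- 0L                                    # rows in use
  cache$public_suffix <- rep(NA_character_, initial)
  cache$ps_start <- rep(NA_integer_, initial)
  cache
}

cache_append <- function(cache, keys, public_suffix, ps_start) {
  needed <- cache$n + length(keys)
  cap <- length(cache$public_suffix)
  if (needed > cap) {
    while (cap < needed) cap <- cap * 2L           # doubling growth
    # `length<-` keeps existing values and pads the new tail with NA.
    length(cache$public_suffix) <- cap
    length(cache$ps_start) <- cap
  }
  rows <- cache$n + seq_along(keys)
  cache$public_suffix[rows] <- public_suffix
  cache$ps_start[rows] <- ps_start
  for (i in seq_along(keys)) assign(keys[i], rows[i], envir = cache$index)
  cache$n <- needed
  invisible(rows)
}

# Reset the index env and every column together.
cache_clear <- function(cache) {
  cache$index <- new.env(hash = TRUE, parent = emptyenv())
  cache$n <- 0L
  cache$public_suffix <- rep(NA_character_, cache$initial)
  cache$ps_start <- rep(NA_integer_, cache$initial)
  invisible(cache)
}

cache <- new_cache(initial = 2L)
cache_append(cache, c("k1", "k2", "k3"), c("com", "co.uk", "org"), c(11L, 9L, 11L))
capacity_after_growth <- length(cache$public_suffix)
row_of_k2 <- get("k2", envir = cache$index)
```

Keeping the index and the columns inside one environment means a clear replaces all of them in a single function call, so no caller observes an index that points into stale columns.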

PRD s8.2 key semantics unchanged. Oracle + cache + list-activation suites green. R CMD check 0/0/0; FAIL 0 | PASS 477; lint 0. Warm bench 3.29s→1.69s, cold 4.26s→2.12s (200k hosts).

🤖 Generated with Claude Code

…(PSLR-ffdsymhk)

P3 of the columnar hot-path epic (PSLR-bzqvsatk). The session cache stored one
R list per host and reassembled a data.frame via psl_result_field()'s vapply x6
on every call (hit or miss). At 200k hosts that is 1.2M closure calls per
query, which is why the warm path barely beat the cold one.

- R/cache.R: cache value shape is now a key->integer-index env plus parallel
  column vectors (public_suffix, registrable_domain, rule, kind, rule_section,
  ps_depth, and the P2 byte offsets ps_start/rd_start). Doubling growth
  (length<- preserves + NA-pads); psl_cache_clear() resets the index env and
  every column atomically (PRD s7.4); capacity full-flush semantics unchanged
  (bound 50000L, batch > capacity matched-but-not-cached).
- R/matcher.R: psl_resolve_cores() resolves hits via one mget of indices +
  vectorized column subsetting (no per-host closures), derives misses, appends
  them, and match()es back to per-input; returns a plain list of columns.
  psl_result_field() deleted. Offsets returned for P4.
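
The hit/miss flow described for psl_resolve_cores() can be sketched as below. This is a simplified illustration, not the function itself: `resolve()` and `derive()` are hypothetical names, `derive()` is a toy matcher that treats the last label as the public suffix, the cache holds only two columns, and growth is done with `c()` rather than doubling to keep the sketch short.

```r
# Hypothetical cache state: key -> row-index env plus parallel columns.
cache <- new.env(parent = emptyenv())
cache$index <- new.env(hash = TRUE, parent = emptyenv())
cache$public_suffix <- character(0)
cache$ps_depth <- integer(0)
cache$derived <- 0L                      # how many hosts went through derive()

# Toy stand-in for the real matcher: last label is the "public suffix".
derive <- function(keys) {
  list(public_suffix = sub(".*\\.", "", keys),
       ps_depth = rep(1L, length(keys)))
}

resolve <- function(cache, inputs) {
  keys <- unique(inputs)
  # One mget for all keys; misses come back as NA.
  rows <- unlist(mget(keys, envir = cache$index, ifnotfound = NA_integer_),
                 use.names = FALSE)
  miss <- is.na(rows)
  if (any(miss)) {
    miss_keys <- keys[miss]
    new <- derive(miss_keys)
    new_rows <- length(cache$public_suffix) + seq_along(miss_keys)
    # Append the misses to every column, then register their row indices.
    cache$public_suffix <- c(cache$public_suffix, new$public_suffix)
    cache$ps_depth <- c(cache$ps_depth, new$ps_depth)
    for (i in seq_along(miss_keys)) {
      assign(miss_keys[i], new_rows[i], envir = cache$index)
    }
    cache$derived <- cache$derived + length(miss_keys)
    rows[miss] <- new_rows
  }
  # Map unique keys back to the original input order, then subset columns.
  per_input <- rows[match(inputs, keys)]
  list(public_suffix = cache$public_suffix[per_input],
       ps_depth = cache$ps_depth[per_input])
}

first  <- resolve(cache, c("b.org", "a.com", "b.org"))   # all misses
second <- resolve(cache, c("a.com", "c.net"))            # one hit, one miss
```

The second call derives only `c.net`; `a.com` is served by integer subsetting of the columns, with no per-host closure involved.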

PRD s8.2 key semantics unchanged (key = list-identity|section|host; unknown/
output post-cache). P1 oracle byte-identical; cache + list-activation suites
green. R CMD check 0/0/0; devtools::test() FAIL 0 / PASS 477; lint 0.
Warm-path bench 3.29s -> 1.69s, cold 4.26s -> 2.12s (200k hosts).

NEWS: Internal bullet added under the existing dev section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bart-turczynski merged commit 04d28e4 into main on Jul 2, 2026
0 of 3 checks passed
@bart-turczynski deleted the feature/pslr-p3-columnar-cache branch on July 2, 2026 16:09
@codecov-commenter

Codecov Report

❌ Patch coverage is 96.47059% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/matcher.R | 94.54% | 3 Missing ⚠️ |
