perf(query): slim psl_query_frame + vectorize suffix_extract (P4, PSLR-izwfypbu)#60

Merged
bart-turczynski merged 1 commit into main from feature/pslr-p4-slim-queryframe
Jul 2, 2026

@bart-turczynski

Owner

P4 of the columnar hot-path epic (PSLR-bzqvsatk). Internal-only; oracle byte-identical.

  • psl_query_frame → psl_query_cols: now returns a plain column list. Accessors read their column directly; only suffix_extract/public_suffix_rule build a data.frame, once at the end (same column order, types, and row.names). This removes the ~0.1–0.2 ms per-call data.frame() construction.
  • suffix_extract(): deleted the per-row strsplit loop. Registrant and subdomain are now sliced from the core via the P2/P3 offsets: registrant = substr(core, rd_start, ps_start-2) and subdomain = ifelse(rd_start>1, substr(core, 1, rd_start-2), ""), which reproduces the old cut>1 rule.
  • No scalar fast path added: warm scalar registrable_domain() = 0.0491 ms (≤0.05 ms target met); cold-scalar 0.421→0.124 ms.
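For readers unfamiliar with the offset slicing, here is a minimal sketch of the two expressions quoted above. It assumes rd_start and ps_start are 1-based character positions of the registrable domain and the public suffix within core; the example hosts and offset values are hand-written for illustration and are not produced by the package.

```r
# Illustrative inputs only; in the package the offsets come from the P2/P3 matcher.
core     <- c("www.example.co.uk", "example.com", "a.b.example.org")
rd_start <- c(5L, 1L, 5L)    # assumed: 1-based start of the registrable domain
ps_start <- c(13L, 9L, 13L)  # assumed: 1-based start of the public suffix

# substr() recycles start/stop over the whole vector, so no per-row loop is needed.
# "- 2" steps back over the dot that separates the label from what follows it.
registrant <- substr(core, rd_start, ps_start - 2L)

# Hosts whose registrable domain starts at position 1 have no subdomain.
subdomain <- ifelse(rd_start > 1L, substr(core, 1L, rd_start - 2L), "")

print(data.frame(core, subdomain, registrant, stringsAsFactors = FALSE))
```

Both calls operate on whole vectors, which is the point of dropping the strsplit loop: the work per host becomes two substring slices instead of a split plus label reassembly.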

Oracle + extract + query suites byte-identical. R CMD check 0/0/0 (errors/warnings/notes); devtools::test() FAIL 0 | PASS 477; lint 0.

🤖 Generated with Claude Code

…extract (PSLR-izwfypbu)

P4 of the columnar hot-path epic (PSLR-bzqvsatk).

- R/query.R: psl_query_frame -> psl_query_cols returns a plain list of columns.
  The length-preserving accessors (public_suffix / registrable_domain /
  is_public_suffix) read their column directly; only suffix_extract /
  public_suffix_rule build a data.frame once at the end (same column order/
  types/row.names). Removes the ~0.1-0.2 ms per-call data.frame construction.
- suffix_extract(): deleted the per-row strsplit loop. Registrant label and
  subdomain are sliced from the canonical core via the P2/P3 offsets:
  registrant = substr(core, rd_start, ps_start - 2); subdomain =
  ifelse(rd_start > 1, substr(core, 1, rd_start - 2), "") -- reproducing the
  old cut>1 rule exactly.
- is_ipv4_literal left unchanged (unique-hosts-only, not a measured hotspot).
- No scalar fast path: warm scalar registrable_domain() = 0.0491 ms/call
  (<=0.05 ms target already met; unique()/match() negligible vs C++ match +
  host_normalize). Cold-scalar 0.421 -> 0.124 ms.
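
The column-list pattern described in the first bullet might look roughly like the sketch below. All function names here are hypothetical stand-ins, and the "last label" rule is a toy substitute for the real public-suffix matching; the only thing it is meant to show is that accessors can return one list element directly, while the frame-returning functions pay for data.frame() once at the end.

```r
# Hypothetical stand-in for the internal helper: a plain named list of columns.
query_cols_sketch <- function(host) {
  list(
    host               = host,
    # Toy rules, NOT real PSL matching: last label / last two labels.
    public_suffix      = sub("^.*\\.", "", host),
    registrable_domain = sub("^.*\\.([^.]+\\.[^.]+)$", "\\1", host)
  )
}

# Length-preserving accessor: reads its column, never builds a data.frame.
public_suffix_sketch <- function(host) {
  query_cols_sketch(host)$public_suffix
}

# Frame-returning function: converts the column list exactly once, at the end.
suffix_frame_sketch <- function(host) {
  as.data.frame(query_cols_sketch(host), stringsAsFactors = FALSE)
}

hosts <- c("www.example.com", "example.org")
suffixes <- public_suffix_sketch(hosts)
frame <- suffix_frame_sketch(hosts)
print(suffixes)
print(frame)
```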

P1 oracle + extract + query frame-shape suites byte-identical green. R CMD
check 0/0/0; devtools::test() FAIL 0 / PASS 477; lint 0.

NEWS: Internal bullet added under the existing dev section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bart-turczynski bart-turczynski merged commit 4a975db into main Jul 2, 2026
@bart-turczynski bart-turczynski deleted the feature/pslr-p4-slim-queryframe branch July 2, 2026 16:19
@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
