Add LeadingStrings benchmarks for binary and non-ASCII regex patterns #5126

Open
danmoseley wants to merge 2 commits into dotnet:main from danmoseley:regex-redux/benchmarks

danmoseley commented Feb 23, 2026

Adds two new benchmark classes to exercise the LeadingStrings vs FixedDistanceSets heuristic in the regex engine:

  • Perf_Regex_LeadingStrings_BinaryData: 1 MB binary corpus (a PE-header-like seed, duplicated), searched with an alternation of binary patterns. Validates that there is no regression on non-text input. (The corpus still contains a lot of ASCII, but not with English letter frequencies.)
  • Perf_Regex_LeadingStrings_NonAscii: ~100 KB of Russian text (the opening of Anna Karenina), searched with an alternation of Russian words. Validates that there is no regression on non-ASCII text, where the frequency heuristic bails out.
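
For readers who want to try the shape of these inputs outside the benchmark harness, here is a minimal sketch. It is not the code in this PR: the class name, seeds, and patterns are hypothetical stand-ins, the Russian seed is assumed to be the well-known first sentence of the novel, and it uses a plain `Stopwatch` rather than BenchmarkDotNet, so the timings are only indicative.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch; not the benchmark classes added by this PR.
public static class LeadingStringsCorpusSketch
{
    // Assumed stand-in for a PE-header-like seed: mostly ASCII plus NULs and a high char.
    public const string BinarySeed =
        "MZ\u0090\u0000\u0003\u0000PE\u0000\u0000L\u0001.text\u0000.rsrc\u0000";

    // Alternation of "binary" literals; \xNN is the regex escape for one char.
    public const string BinaryPattern = @"MZ\x90|PE\x00\x00|\.text|\.rsrc";

    // Assumed seed: the first sentence of Anna Karenina, with a trailing space.
    public const string RussianSeed =
        "Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему. ";

    // Alternation of Russian words that occur in the seed.
    public const string RussianPattern = "семьи|семья|друга|каждая";

    // Repeats the seed until the corpus has exactly targetLength chars
    // (UTF-16 code units, not bytes), trimming the last copy if needed.
    public static string BuildCorpus(string seed, int targetLength)
    {
        if (string.IsNullOrEmpty(seed))
            throw new ArgumentException("Seed must be non-empty.", nameof(seed));

        var sb = new StringBuilder(targetLength + seed.Length);
        while (sb.Length < targetLength)
            sb.Append(seed);
        sb.Length = targetLength;
        return sb.ToString();
    }

    // Counts all non-overlapping matches of the pattern in the input.
    public static int CountMatches(string pattern, string input, RegexOptions options)
        => new Regex(pattern, options).Matches(input).Count;

    public static void Main()
    {
        string binary = BuildCorpus(BinarySeed, 1024 * 1024);   // about 1M chars
        string russian = BuildCorpus(RussianSeed, 100 * 1024);  // about 100K chars

        var cases = new[]
        {
            ("binary", BinaryPattern, binary),
            ("non-ASCII", RussianPattern, russian),
        };

        foreach (var (name, pattern, corpus) in cases)
        {
            var sw = Stopwatch.StartNew();
            int count = CountMatches(pattern, corpus, RegexOptions.Compiled);
            sw.Stop();
            Console.WriteLine($"{name}: {count} matches in {sw.Elapsed.TotalMilliseconds:F1} ms");
        }
    }
}
```

Running this against builds with and without the companion runtime change gives a rough before/after comparison; the benchmark classes themselves remain the reliable measurement.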

Companion to dotnet/runtime change: dotnet/runtime#124736

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR adds two new benchmark classes to test regex alternation pattern performance with different input types. The benchmarks are designed to validate that the LeadingStrings vs FixedDistanceSets heuristic in the regex engine doesn't regress on binary and non-ASCII data.

Changes:

  • Added Perf_Regex_LeadingStrings_BinaryData class with benchmarks for binary data patterns (1MB PE-header-like corpus)
  • Added Perf_Regex_LeadingStrings_NonAscii class with benchmarks for Russian text patterns (~100KB from Anna Karenina)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>