
Add RegexOptions.AnyNewLine via parser lowering #124701

Open
danmoseley wants to merge 17 commits into dotnet:main from danmoseley:anynewline-lower-v2

Conversation

@danmoseley
Member

@danmoseley danmoseley commented Feb 21, 2026

Motivation

.NET's Regex class hardcodes \n as the only newline character. With RegexOptions.Multiline, $ matches before \n but not before \r, \r\n, or Unicode line breaks. This is "by far one of the biggest gotchas" with System.Text.RegularExpressions:

// BUG: on a file with Windows \r\n line endings, .*$ captures the trailing \r
var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline);
// match.Value == "foo\r" -- not "foo"!

Users are forced into fragile workarounds like \r?$ or (\r\n|\n) to handle mixed line endings. Real-world NuGet packages show how common this is -- from the real-world regex patterns dataset:

  • (\r\n|\n) (18,474 packages) -- CSV parser manually matching both line endings
  • \r?\n in PEM key parsing (1,964 packages) -- \r?\n sprinkled throughout with Multiline
  • $(\r?\n)? in assembly attribute matching (2,108 packages) -- using Multiline with manual newline handling
  • [\r\n]+ (2,422 packages) -- matching any newline character

These workarounds are error-prone, don't compose well with ^ and $ anchors, and miss Unicode newlines (\u0085, \u2028, \u2029).

Summary

Implements RegexOptions.AnyNewLine (api-approved), which makes $, ^, \Z, and . recognize all Unicode line boundaries: \r, \r\n, \n, \u0085 (NEL), \u2028 (LS), \u2029 (PS) -- consistent with Unicode TR18 RL1.6 and PCRE2's (*ANY) behavior.

With AnyNewLine, the example above just works:

var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline | RegexOptions.AnyNewLine);
// match.Value == "foo"
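Until AnyNewLine is available, its effect on this example can be approximated on current .NET by writing the lowered forms out by hand. The sketch below assumes the . and Multiline $ lowerings listed in this PR; DotAnyNewLine and EolAnyNewLine are illustrative names, not part of System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Today's behavior: '.' excludes only '\n', so the '\r' of "\r\n" is captured.
var before = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline);
Console.WriteLine(before.Value.Length); // 4, i.e. "foo\r"

// Hand-lowered stand-ins for '.' and Multiline '$' under AnyNewLine (illustrative names).
const string DotAnyNewLine = @"[^\r\n\u0085\u2028\u2029]";
const string EolAnyNewLine = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";

// Equivalent of ".*$" with Multiline | AnyNewLine. No RegexOptions are needed because
// the lowered pattern uses only \z and lookarounds, never ^ or $.
var after = Regex.Match("foo\r\nbar", DotAnyNewLine + "*" + EolAnyNewLine);
Console.WriteLine(after.Value); // foo
```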

Approach: Parser Lowering

All logic lives in RegexParser.cs -- no changes to the interpreter, compiler, or source generator engines. Each affected construct is lowered into an equivalent RegexNode sub-tree:

  • $ (no Multiline) / \Z → (?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z)
  • $ (Multiline) → (?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)
  • ^ (Multiline) → (?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)
  • . → [^\r\n\u0085\u2028\u2029] (but Singleline takes precedence)
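Because each lowered form is an ordinary .NET pattern, the Multiline $ and ^ rows can be tried on today's engine. A minimal sketch, assuming those two rows exactly as listed; Positions is a hypothetical helper that lists the indices where a zero-width pattern matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The Multiline '$' and '^' lowerings, written as ordinary .NET patterns.
const string Eol = @"(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)";
const string Bol = @"(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)";

// Returns every index at which a zero-width pattern matches.
static int[] Positions(string pattern, string input) =>
    Regex.Matches(input, pattern).Select(m => m.Index).ToArray();

// a, CRLF, b, CR, c, LF, d, LS (U+2028), e
string text = "a\r\nb\rc\nd\u2028e";

int[] eol = Positions(Eol, text);
int[] bol = Positions(Bol, text);

Console.WriteLine(string.Join(",", eol)); // 1,4,6,8,10
Console.WriteLine(string.Join(",", bol)); // 0,3,5,7,9
```

Neither anchor matches at index 2, between the \r and \n of the CRLF pair, which is the "\r\n is atomic" guarantee described below.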

Key design choices:

  • \r\n is atomic: $ never matches between \r and \n. This is enforced with lookbehind/lookahead guards.
  • Singleline takes precedence: . with Singleline | AnyNewLine matches everything (including newlines), consistent with Singleline's documented behavior.
  • \A and \z are unaffected: absolute start/end anchors don't change.
  • Incompatible with NonBacktracking and ECMAScript: throws ArgumentOutOfRangeException (lowered patterns use lookaround).
  • Zero perf impact on existing patterns: the lowering is gated on the AnyNewLine flag, so patterns that don't use it take the same code paths as before. The only new cost is a flag check ((_options & RegexOptions.AnyNewLine) != 0) in the parser for $, ^, \Z, and ., which is negligible.
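The NonBacktracking restriction follows from the lowering: that engine has no lookaround support, so a lowered tree could not run there even if the option combination were allowed. A small demonstration on current .NET (7 or later), assuming the lowered Multiline $ from the list above:

```csharp
using System;
using System.Text.RegularExpressions;

// The lowered Multiline '$'; it relies on lookahead and lookbehind.
const string LoweredEol = @"(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)";

// The backtracking engines accept it.
bool backtrackingOk = new Regex(LoweredEol).IsMatch("x\r\n");

bool nonBacktrackingRejected = false;
try
{
    _ = new Regex(LoweredEol, RegexOptions.NonBacktracking);
}
catch (NotSupportedException)
{
    // NonBacktracking rejects patterns containing lookarounds at construction time.
    nonBacktrackingRejected = true;
}

Console.WriteLine($"{backtrackingOk} {nonBacktrackingRejected}"); // True True
```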

Out of scope: \R

Unicode TR18 RL1.6 also recommends a meta-character \R for matching any newline sequence (consuming the characters), equivalent to (?:\r\n|[\n\v\f\r\u0085\u2028\u2029]). This is distinct from what AnyNewLine does: AnyNewLine modifies the behavior of existing zero-width anchors (^, $, \Z) and the character class ., while \R would be a new consuming pattern element. Adding \R could be done independently as a separate feature.
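For contrast, a consuming \R-style separator can already be written by hand today. This sketch uses the expansion quoted above; AnyNewLineSequence is an illustrative name, and unlike AnyNewLine this pattern consumes the newline characters rather than matching next to them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hand-written "any newline sequence", in the spirit of \R.
// \r\n is listed first so a CRLF pair is consumed as one separator, not two.
const string AnyNewLineSequence = @"\r\n|[\n\v\f\r\u0085\u2028\u2029]";

string[] lines = Regex.Split("a\r\nb\rc\nd\u0085e\u2028f", AnyNewLineSequence);
Console.WriteLine(string.Join("|", lines)); // a|b|c|d|e|f
```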

Changes

Production code

  • RegexOptions.cs -- add AnyNewLine = 0x0800
  • RegexParser.cs -- lowering methods AnyNewLineEndZNode(), AnyNewLineEolNode(), AnyNewLineBolNode(), plus . handling
  • RegexCharClass.cs -- add NotNewLineOrCarriageReturnClass constant
  • Regex.cs / RegexCompilationInfo.cs -- validation

Tests

  • ~120 new test cases covering dot, anchors ($, ^, \Z), RightToLeft, Singleline, Multiline, Replace, Split, Count, EnumerateMatches, NonBacktracking rejection, edge cases (adjacent newlines, empty lines, all-newline strings), and PCRE2-inspired scenarios

Fixes #25598

danmoseley and others added 14 commits February 20, 2026 21:59
Add AnyNewLine = 0x0800 to RegexOptions enum. Update ValidateOptions to
bump MaxOptionShift to 12 and reject AnyNewLine | NonBacktracking.
ECMAScript already rejects unknown options via allowlist.

Update source generator to include AnyNewLine in SupportedOptions mask.
Update tests that used 0x800 as an invalid option value to use 0x1000.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
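The validation rules this commit describes can be sketched as follows. Regex.ValidateOptions is internal, so IsValid below is a hypothetical stand-in that follows the commit message (shift limit of 12, AnyNewLine | NonBacktracking rejected, ECMAScript allowlist); it is not the PR's actual code.

```csharp
using System;
using System.Text.RegularExpressions;

// 0x0800 stands in for RegexOptions.AnyNewLine, which does not exist in released .NET.
const RegexOptions AnyNewLine = (RegexOptions)0x0800;

// Hypothetical re-creation of the option checks; the real implementation may differ.
bool IsValid(RegexOptions options)
{
    const int MaxOptionShift = 12; // AnyNewLine occupies bit 11, so 0x1000 and above are unknown
    if (((uint)options >> MaxOptionShift) != 0)
        return false;

    // ECMAScript only combines with an allowlist of options; AnyNewLine is not on it.
    if ((options & RegexOptions.ECMAScript) != 0 &&
        (options & ~(RegexOptions.ECMAScript | RegexOptions.IgnoreCase | RegexOptions.Multiline |
                     RegexOptions.Compiled | RegexOptions.CultureInvariant)) != 0)
        return false;

    // The lowered AnyNewLine trees need lookaround, which NonBacktracking lacks.
    if ((options & RegexOptions.NonBacktracking) != 0 &&
        (options & (RegexOptions.ECMAScript | RegexOptions.RightToLeft | AnyNewLine)) != 0)
        return false;

    return true;
}

Console.WriteLine(IsValid(RegexOptions.Multiline | AnyNewLine));       // True
Console.WriteLine(IsValid((RegexOptions)0x1000));                      // False
Console.WriteLine(IsValid(RegexOptions.NonBacktracking | AnyNewLine)); // False
Console.WriteLine(IsValid(RegexOptions.ECMAScript | AnyNewLine));      // False
```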
When AnyNewLine is set without Multiline, lower $ from EndZ into an
equivalent sub-tree: (?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)

This matches at end of string, or before \r\n, \r, or \n at end of
string, but not between \r and \n. Works across all engines
(interpreter, compiled, source generator) since it's pure parser
lowering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When AnyNewLine is set, lower \Z using the same sub-tree as $ without
Multiline. \Z is not affected by Multiline, so the same lowering
applies regardless.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When both Multiline and AnyNewLine are set, lower $ to:
  (?=\r\n|\r|\z)|(?<!\r)(?=\n)

This matches at \r\n, \r, \n boundaries and end-of-string,
without matching between \r and \n of a \r\n sequence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When both Multiline and AnyNewLine are set, lower ^ to:
  (?<=\A|\r\n|\n)|(?<=\r)(?!\n)

This matches after \r\n, \n, bare \r (not followed by \n), and
at start of string. Without Multiline, ^ remains \A unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When AnyNewLine is set (without Singleline), lower . to [^\n\r]
instead of [^\n], so dot does not match \r or \n.

Add NotNewLineOrCarriageReturnClass constant to RegexCharClass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines,
empty lines, \Z with trailing newlines, and edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integration tests using a ~50 char string with all newline types
(\r\n, \r, \n, \u0085, \u2028, \u2029) exercising ^, $, \Z, and .
together. Replace/Split tests with MatchEvaluator line numbering.
Deduplicated cases moved into per-feature tests (RightToLeft,
empty lines).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expand test coverage across all AnyNewLine-affected constructs:
- Dollar, EndZ, DollarMultiline, CaretMultiline, Dot test data
  with adjacent newlines, newlines at string boundaries,
  empty segments, RightToLeft, and all Unicode newline types
- Advanced tests: inline options, backreferences, conditionals,
  alternation with anchors, lookahead/lookbehind, quantified dot,
  lazy quantifiers, named/atomic groups, word boundaries near
  newlines, explicit char classes unaffected
- Methods test: IsMatch, Count, EnumerateMatches, Match with
  startat, Replace with group ref, Split
- Unicode expansion: \s/\S behavior, \w behavior, \p{Zl}/\p{Zp}
  categories, adjacent Unicode+ASCII newlines, baselines without
  AnyNewLine

No bugs found — all initial test failures were wrong expectations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verify the fixer correctly emits RegexOptions.Multiline |
RegexOptions.AnyNewLine in enum value order when upgrading
to GeneratedRegex.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test cases derived from cross-validation with PCRE2 NEWLINE_ANY
behavior (BSD-licensed) and analysis of real-world patterns from
dotnet/runtime-assets:
- (.+)# greedy where .+ cannot cross newlines (PCRE2 JIT 472)
- (.)(.) requiring consecutive non-newlines (PCRE2 JIT 471)
- (.). with mixed newline types (PCRE2 JIT 469)
- Blank line detection (^ +$) with \n, \r\n, \u0085 separators

All 31,528 tests pass. No bugs found — our implementation is
fully consistent with PCRE2 NEWLINE_ANY behavior and handles
real-world patterns correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add more RightToLeft + AnyNewLine tests (various newline types, dot,
  anchors, \Z)
- Add more Singleline | AnyNewLine tests (all newline types, combined
  with Multiline)
- Replace RegexOptions.AnyNewLine with RegexHelpers.RegexOptionAnyNewLine
  throughout tests for net481 compilation compatibility
- Wrap Count/EnumerateMatches in #if NET for net481 compat
- Add clarifying comments on Split behavior with/without AnyNewLine

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley
Member Author

(Finally got around to having AI finish my lowering branch.)

Contributor

Copilot AI left a comment


Pull request overview

This pull request implements RegexOptions.AnyNewLine (value 0x0800 = 2048), a new regex option that makes ^, $, \Z, and . recognize all Unicode line boundaries (\r, \r\n, \n, \u0085 NEL, \u2028 LS, \u2029 PS) instead of only \n. This addresses a major usability issue where users had to manually work around .NET's hardcoded \n-only line ending behavior.

Changes:

  • Added RegexOptions.AnyNewLine = 0x0800 enum value with incompatibility checks for NonBacktracking and ECMAScript modes
  • Implemented parser-level lowering of ^, $, \Z, and . into equivalent lookaround-based RegexNode trees when AnyNewLine is enabled
  • Added comprehensive test coverage (~800 new test lines) covering all anchor types, newline combinations, RightToLeft mode, inline options, and edge cases

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexOptions.cs -- Added AnyNewLine = 0x0800 enum value with XML documentation
  • src/libraries/System.Text.RegularExpressions/ref/System.Text.RegularExpressions.cs -- Updated ref assembly with AnyNewLine = 2048
  • src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs -- Updated MaxOptionShift to 12 and added AnyNewLine to NonBacktracking incompatibility check
  • src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs -- Implemented lowering methods (AnyNewLineEndZNode, AnyNewLineEolNode, AnyNewLineBolNode) and integrated into ^, $, \Z, . parsing
  • src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs -- Added NotNewLineOrCarriageReturnClass constant for . with AnyNewLine
  • src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs -- Added AnyNewLine to source generator's supported options
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs -- Added ~800 lines of comprehensive tests for all anchor types, newline combinations, and edge cases
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Tests.Common.cs -- Added RegexOptionAnyNewLine constant for test compatibility
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs -- Updated invalid option test from 0x800 to 0x1000; added NonBacktracking+AnyNewLine incompatibility test
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs -- Updated invalid option comments and tests from 0x800 to 0x1000
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.EnumerateMatches.Tests.cs -- Updated invalid option tests from 0x800 to 0x1000
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorParserTests.cs -- Updated invalid option tests from 0x800 to 0x1000
  • src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs -- Updated tests for 0x1000 as invalid option; added AnyNewLine test case for code fixer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

@danmoseley
Member Author

@MihuBot benchmark Regex

@MihuBot

MihuBot commented Feb 21, 2026

@danmoseley
Member Author

MihuBot confirms zero perf impact on existing patterns/options.

@danmoseley
Member Author

AnyNewLine Performance Analysis (Release, Compiled, .NET 11.0, BenchmarkDotNet)

Measured impact of converting existing newline-workaround patterns to simplified AnyNewLine equivalents. All scenarios use RegexOptions.Compiled (representative of source-generated too). Measured with BenchmarkDotNet (InProcess, ShortRun). All match counts verified identical between old and new patterns.

Section 1: Real-World Patterns on Windows \r\n Text

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
^.+\r?$ (1K lines) ^.+$ 46.7 48.8 1.05x
^.+\r?$ (10K lines) ^.+$ 1,694 1,760 1.04x
\[assembly:...\]\s*$(\r?\n)? \[assembly:...\]\s*$ 38.3 32.4 0.85x
^([^\s:]+):\s*(.+?)\r?$ ^([^\s:]+):\s*(.+?)$ 105.9 105.9 1.00x
^# .+\r?$ ^# .+$ 11.1 9.1 0.83x
^.+\r?$ (CSV, 1K rows) ^.+$ 44.4 49.2 1.11x
[^\r\n]+ .+ 44.2 43.8 0.99x
\w+\r?$ \w+$ 90.8 128.7 1.42x
(?:^|\r\n)\w+ ^\w+ 208.7 214.5 1.03x

Section 2: Unix \n Text (overhead of just enabling the flag)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
^.+$ ^.+$ 43.5 48.9 1.12x
[^\n]+ .+ 39.0 44.9 1.15x

Section 3: Mixed \n/\r\n Text

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
[^\r\n\u0085\u2028\u2029]+ .+ 45.4 44.2 0.97x
^.+\r?$ (1K lines) ^.+$ 44.1 50.1 1.14x

Section 4: Non-anchor/dot Patterns (zero impact expected)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
\r\n|\r|\n \r\n|\r|\n 20.0 21.7 1.08x
\w+ \w+ 322.4 336.4 1.04x

Section 5: Pathological Cases (unlikely in practice)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
$ $ 98.2 134.1 1.37x
^ ^ 145.6 131.6 0.90x
\w+\r?\Z (329K chars) \w+\Z 494.2 1,039.3 2.10x

Summary

  1. Most real-world patterns in Compiled mode land between 0.83x and 1.14x (Sections 1 and 3, excluding the \w+$ case covered in point 2) -- essentially zero cost, and sometimes faster because the AnyNewLine pattern is simpler (e.g., ^# .+$ vs ^# .+\r?$ -- removing the \r? node saves more than the lowered $ costs).

  2. Where small regressions occur (1.1x--1.4x), the cause is the lowered anchor tree: a native $ (Eol) is a single "is next char \n?" check, but AnyNewLine lowers it to (?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n). Even when the input only contains \r\n, the engine must evaluate these alternation branches. This overhead is proportionally more visible when the anchor dominates the work (e.g., \w+$ where the \w+ match is short), and nearly invisible when .+ dominates each line's work (e.g., ^.+$ on 10K lines at 1.04x).

  3. Patterns without anchors or dot are completely unaffected (1.04--1.08x, within noise) -- the flag only changes behavior of ., ^, $, \Z.

  4. The only pathological case: \w+\Z on very large input (329K chars) at 2.1x -- the lowered \Z alternation tree is evaluated during backtracking at many positions. Unlikely in practice.

  5. In Compiled/source-generated mode, the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode shows larger gaps (2--3x for typical patterns) but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.

Benchmark source code (BenchmarkDotNet)
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
        .AddJob(Job.ShortRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000,
        ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // Section 1: Real-world on Windows \r\n text
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // Section 2: Unix \n text (control)
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // Section 3: Mixed newline text
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // Section 4: Non-anchor patterns (zero impact)
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // Section 5: Pathological
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}

Simplify the lowered trees for $, ^, and \Z anchors:
- Eol ($): Merge \r\n|\r|[\u0085\u2028\u2029] into [\r\u0085\u2028\u2029] (4 branches -> 2)
  \r covers both \r\n and bare \r since lookahead only checks first char
- Bol (^): Merge \r\n|\n|[\u0085\u2028\u2029] into [\n\u0085\u2028\u2029] (4 branches -> 2)
  \n covers both \r\n and bare \n since lookbehind only checks last char
- EndZ (\Z): Merge \r? and [\u0085\u2028\u2029] into [\r\u0085\u2028\u2029]? (3 branches -> 2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley
Member Author

Thinking about how to optimize '.' more here:

  • . without AnyNewLine → RegexNodeKind.Notone '\n' → compiled to a single ch != '\n' comparison
  • . with AnyNewLine → [^\r\n\u0085\u2028\u2029] → RegexNodeKind.Set → compiled to a bitmap/bitmask operation (branchless but ~3-5 IL ops including subtract + shift + mask)

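The JIT-compiled code for Notone is literally one compare-and-branch. The character class, even with a bitmap optimization, has to do range subtraction + bit shifting. Per character that is only a fraction of a nanosecond: on the ~31K-character Unix input, [^\n]+ vs .+ went from 39.0 µs to 44.9 µs, roughly 5.9 µs / 31K chars ≈ 0.2 ns per character. Summed over every character scanned, it adds up to the ~10-18% overhead we see.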

So the overhead is not from "more chars to match against"; it's from crossing the boundary between the fast-path Notone node type (single char compare) and the generic Set node type (bitmap). Even a 2-char negated class like [^\r\n] would have the same overhead shape, since it goes through Set rather than Notone.

In principle, the engine could be taught to recognize small negated character classes and emit a chain of != comparisons instead of bitmap ops, but that would be an engine optimization beyond the scope of this PR.
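To make the two shapes concrete, here is an illustrative sketch of a chain of != comparisons versus a bitmap-style check for the AnyNewLine dot class. Neither function is claimed to be what RegexCompiler or the source generator actually emits; the point is only that both are correct and differ in the work done per character.

```csharp
using System;

// A chain of != comparisons: the shape suggested above for small negated classes.
static bool NotNewLineChain(char c) =>
    c != '\n' && c != '\r' && c != '\u0085' && c != '\u2028' && c != '\u2029';

// A bitmap-style check: a 64-bit mask covers low ASCII (where '\n' and '\r' live),
// and the three non-ASCII newline characters are handled separately.
static bool NotNewLineBitmap(char c)
{
    const ulong AsciiNewLines = (1UL << '\n') | (1UL << '\r');
    if (c < 64)
        return (AsciiNewLines & (1UL << c)) == 0;
    return c != '\u0085' && c != '\u2028' && c != '\u2029';
}

// Both forms agree on every UTF-16 code unit.
int mismatches = 0;
for (int i = 0; i <= char.MaxValue; i++)
{
    if (NotNewLineChain((char)i) != NotNewLineBitmap((char)i))
        mismatches++;
}
Console.WriteLine(mismatches); // 0
```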

@danmoseley
Member Author

Updated AnyNewLine Performance Analysis (with anchor optimization)

After the initial analysis above, I optimized the lowered anchor trees by merging redundant alternation branches:

  • $ (Eol): (?=\r\n|\r|[\u0085\u2028\u2029]|\z) → (?=[\r\u0085\u2028\u2029]|\z): \r covers both \r\n and bare \r (lookahead only checks first char), so the 4-branch alternation collapses to 2.
  • ^ (Bol): (?<=\A|\r\n|\n|[\u0085\u2028\u2029]) → (?<=[\n\u0085\u2028\u2029]|\A): \n covers both \r\n and bare \n (lookbehind only checks last char), 4 branches → 2.
  • \Z (EndZ): Merged \r? and [\u0085\u2028\u2029] into [\r\u0085\u2028\u2029]?, 3 outer branches → 2.
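Because each merge relies only on \r\n starting with \r (or ending with \n), the merged forms should match at exactly the same positions as the originals. A quick check on current .NET, assuming the original forms from the PR description, the merged forms above with their second (guard) branches unchanged, and a merged \Z reconstructed from the commit message wording:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (name, original lowering, merged lowering)
var pairs = new (string Name, string Original, string Merged)[]
{
    ("$ (Multiline)",
     @"(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)",
     @"(?=[\r\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)"),
    ("^ (Multiline)",
     @"(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)",
     @"(?<=[\n\u0085\u2028\u2029]|\A)|(?<=\r)(?!\n)"),
    ("\\Z",
     @"(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z)",
     @"(?=\r\n\z|[\r\u0085\u2028\u2029]?\z)|(?<!\r)(?=\n\z)"),
};

string[] inputs =
{
    "", "a", "a\n", "a\r", "a\r\n", "a\u2028", "\r\n\r\n", "a\u0085\n",
    "a\r\nb\rc\nd\u0085e\u2028f\u2029g",
};

// Comma-separated indices at which a zero-width pattern matches.
static string Positions(string pattern, string input) =>
    string.Join(",", Regex.Matches(input, pattern).Select(m => m.Index));

int differences = 0;
foreach (var (name, original, merged) in pairs)
{
    foreach (string input in inputs)
    {
        if (Positions(original, input) != Positions(merged, input))
        {
            Console.WriteLine($"{name} differs on a {input.Length}-char input");
            differences++;
        }
    }
}
Console.WriteLine(differences); // 0
```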

Re-measured with BenchmarkDotNet MediumRun (15 target iterations and 10 warmup iterations per MediumRun's defaults; more stable than the previous ShortRun). All scenarios use RegexOptions.Compiled, Release build, .NET 11.0. All match counts verified identical.

Section 1: Real-World Patterns on Windows \r\n Text

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
^.+\r?$ (1K lines) ^.+$ 44.6 49.9 1.12x
^.+\r?$ (10K lines) ^.+$ 1,749 1,797 1.03x
\[assembly:...\]\s*$(\r?\n)? \[assembly:...\]\s*$ 37.1 33.1 0.89x
^([^\s:]+):\s*(.+?)\r?$ ^([^\s:]+):\s*(.+?)$ 105.8 100.0 0.95x
^# .+\r?$ ^# .+$ 11.0 9.2 0.83x
^.+\r?$ (CSV, 1K rows) ^.+$ 44.4 47.5 1.07x
[^\r\n]+ .+ 41.9 43.4 1.04x
\w+\r?$ \w+$ 85.1 112.9 1.33x
(?:^|\r\n)\w+ ^\w+ 195.7 193.4 0.99x

Section 2: Unix \n Text (overhead of just enabling the flag)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
^.+$ ^.+$ 42.3 46.6 1.10x
[^\n]+ .+ 36.3 42.9 1.18x

Section 3: Mixed \n/\r\n Text

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
[^\r\n\u0085\u2028\u2029]+ .+ 43.5 45.6 1.05x
^.+\r?$ (1K lines) ^.+$ 46.0 50.5 1.10x

Section 4: Non-anchor/dot Patterns (zero impact expected)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
\r\n|\r|\n \r\n|\r|\n 20.1 19.6 0.98x
\w+ \w+ 286.5 285.9 1.00x

Section 5: Pathological Cases (unlikely in practice)

Old Pattern New Pattern (+ AnyNewLine) Old (µs) New (µs) Ratio
$ (1K bare evals) $ 98.6 120.3 1.22x
^ (1K bare evals) ^ 133.3 108.3 0.81x
\w+\r?\Z (329K chars) \w+\Z 480.3 931.5 1.94x

Summary

  1. Most real-world patterns in Compiled mode show 0.83x–1.12x (Sections 1 and 3, excluding the \w+$ case covered in point 2): essentially zero to modest cost, and sometimes faster because the AnyNewLine pattern is simpler (e.g., ^# .+$ vs ^# .+\r?$; removing the \r? node saves more than the lowered $ costs).

  2. Where regressions occur (1.1x–1.3x), the cause is the lowered anchor/dot trees. A native $ (Eol) is a single "is next char \n?" check, but AnyNewLine lowers it to a lookahead on the character class [\r\u0085\u2028\u2029] plus \z (with a guarded (?<!\r)(?=\n) branch for bare \n). Even when the input only contains \r\n, the engine must evaluate the character class. Similarly, . becomes [^\r\n\u0085\u2028\u2029], which moves from the fast-path Notone node (a single ch != '\n' comparison) to a Set node (a bitmap/bitmask operation: branchless, but ~3-5 IL ops per character). This overhead is proportionally more visible when the anchor or dot dominates the work (e.g., \w+$ where the \w+ match is short, or [^\n]+ vs .+ on Unix text at 1.18x), and nearly invisible when other parts of the pattern dominate.

  3. The anchor optimization improved the worst real-world case (\w+$) from 1.42x to 1.33x, and made bare ^ actually faster (0.81x) by using a 2-branch merged character class instead of 4 separate alternation branches.

  4. Patterns without anchors or dot are unaffected (0.98x–1.00x), since the flag only changes the behavior of ., ^, $, and \Z.

  5. The only large regression is the pathological \w+\Z on very large input (329K chars) at 1.94x: the lowered \Z alternation tree is evaluated during backtracking at many positions. This is unlikely in practice.

  6. In Compiled mode (and presumably source-generated mode, which was not benchmarked here), the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode was not measured and would likely show larger gaps, but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.
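Point 2's description of the dot lowering can be checked on any .NET version by writing the character class out by hand. The sketch below is illustrative only: it does not use the new flag, and the names AnyNewLineDot and Escape are hypothetical. It compares the default . (which excludes only \n) with [^\r\n\u0085\u2028\u2029] on text that uses \r\n, LS, and NEL as line breaks.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hand-written form of the class that `.` is lowered to under AnyNewLine (per point 2).
// No AnyNewLine flag is used, so this runs on any current .NET.
const string AnyNewLineDot = @"[^\r\n\u0085\u2028\u2029]";

// Sample text using CRLF, LS (\u2028) and NEL (\u0085) as line breaks.
string input = "foo\r\nbar\u2028baz\u0085qux";

// Default `.` excludes only '\n', so '\r', LS and NEL end up inside the matches.
string[] defaultDot = Regex.Matches(input, ".+").Select(m => m.Value).ToArray();

// The expanded class stops at every Unicode line-break character.
string[] loweredDot = Regex.Matches(input, AnyNewLineDot + "+").Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", defaultDot.Select(Escape))); // foo\r | bar\u2028baz\u0085qux
Console.WriteLine(string.Join(" | ", loweredDot.Select(Escape))); // foo | bar | baz | qux

// Hypothetical helper: makes the invisible line-break characters readable in console output.
static string Escape(string s) =>
    s.Replace("\r", "\\r").Replace("\u0085", "\\u0085").Replace("\u2028", "\\u2028");
```

The sketch covers only the dot; the lowered $, ^, and \Z trees are not reproduced here.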

Benchmark source code (BenchmarkDotNet, MediumRun)
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
        .AddJob(Job.MediumRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
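    // RegexOptions.AnyNewLine (0x800). The raw cast presumably lets this compile against
    // reference assemblies that don't yet define the named member.
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;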

    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000,
        ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // Section 1: Real-world on Windows \r\n text
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
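    // With a single Baseline and no categories, BenchmarkDotNet reports every Ratio relative to
    // 1a_Lines1K_Old; the Ratio columns in the tables above are New/Old for each pair instead.
    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]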
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // Section 2: Unix \n text (control)
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // Section 3: Mixed newline text
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // Section 4: Non-anchor patterns (zero impact)
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // Section 5: Pathological
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}

@danmoseley
Member Author

Some real test failures...

AnyNewLine (0x800) is not a valid RegexOptions value on .NET Framework,
so the Regex constructor throws ArgumentOutOfRangeException. Add
[SkipOnTargetFramework] to all 10 AnyNewLine test methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
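The commit above skips the tests on .NET Framework at the test-attribute level. Code that has to run on several runtimes could instead probe at run time, relying on the same ArgumentOutOfRangeException the commit describes. A minimal sketch: IsOptionSupported is a hypothetical helper, not part of the PR or of System.Text.RegularExpressions, and it assumes that rejecting unknown option bits is the only reason the constructor throws that exception for these inputs.

```csharp
using System;
using System.Text.RegularExpressions;

// Same value the PR and the benchmarks use for AnyNewLine; cast because older
// RegexOptions definitions have no named member for it.
const RegexOptions AnyNewLine = (RegexOptions)0x0800;

bool anyNewLineSupported = IsOptionSupported(RegexOptions.Multiline | AnyNewLine);
Console.WriteLine($"AnyNewLine supported: {anyNewLineSupported}");

// Hypothetical helper: returns false when the running Regex implementation rejects
// the option bits, as .NET Framework does for 0x800 according to the commit above.
static bool IsOptionSupported(RegexOptions options)
{
    try
    {
        _ = new Regex("$", options);
        return true;
    }
    catch (ArgumentOutOfRangeException)
    {
        return false;
    }
}
```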
Copilot AI review requested due to automatic review settings February 25, 2026 02:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

Development

Successfully merging this pull request may close these issues.

Regex Match, Split and Matches should support RegexOptions.AnyNewLine as (?=\r\z|\n\z|\r\n\z|\z)

3 participants