Add RegexOptions.AnyNewLine via parser lowering by danmoseley · Pull Request #124701 · dotnet/runtime

danmoseley · 2026-02-21T08:27:44Z

Motivation

.NET's Regex class hardcodes \n as the only newline character. With RegexOptions.Multiline, $ matches before \n but not before \r, \r\n, or Unicode line breaks. This is "by far one of the biggest gotchas" with System.Text.RegularExpressions:

// BUG: on a file with Windows \r\n line endings, .*$ captures the trailing \r
var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline);
// match.Value == "foo\r" -- not "foo"!

Users are forced into fragile workarounds like \r?$ or (\r\n|\n) to handle mixed line endings. Patterns from the real-world regex patterns dataset show how common this is in NuGet packages:

(\r\n|\n) (18,474 packages) -- CSV parser manually matching both line endings
\r?\n in PEM key parsing (1,964 packages) -- \r?\n sprinkled throughout with Multiline
$(\r?\n)? in assembly attribute matching (2,108 packages) -- using Multiline with manual newline handling
[\r\n]+ (2,422 packages) -- matching any newline character

These workarounds are error-prone, don't compose well with ^ and $ anchors, and miss Unicode newlines (\u0085, \u2028, \u2029).

Summary

Implements RegexOptions.AnyNewLine (api-approved) which makes $, ^, \Z, and . recognize all Unicode line boundaries: \r, \r\n, \n, \u0085 (NEL), \u2028 (LS), \u2029 (PS) -- consistent with Unicode TR18 RL1.6 and PCRE2's (*ANY) behavior.

With AnyNewLine, the example above just works:

var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline | RegexOptions.AnyNewLine);
// match.Value == "foo"

Approach: Parser Lowering

All logic lives in RegexParser.cs -- no changes to the interpreter, compiler, or source generator engines. Each affected construct is lowered into an equivalent RegexNode sub-tree:

Construct	Lowered to
`$` (no Multiline) / `\Z`	`(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z)`
`$` (Multiline)	`(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)`
`^` (Multiline)	`(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)`
`.`	`[^\r\n\u0085\u2028\u2029]` (but `Singleline` takes precedence)
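AnyNewLine only exists in builds that include this PR. One way to sanity-check the lowering on a released .NET is to write the lowered sub-trees from the table as ordinary patterns. The sketch below does that for Multiline $ and for `.`. The constant names and the Positions helper are illustrative only and are not part of the PR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lowered Multiline $ and . copied from the table and written as plain patterns.
// The $ tree is wrapped in (?:...) because it is a top-level alternation.
const string LoweredEol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";
const string LoweredDot = @"[^\r\n\u0085\u2028\u2029]";

// Comma-separated indexes of every (zero-width) match of a pattern in the input.
string Positions(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    string.Join(",", Regex.Matches(input, pattern, options).Select(m => m.Index));

// The motivating example: built-in . and $ keep the trailing \r, but the lowered forms do not.
Console.WriteLine(Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline).Value == "foo\r"); // True
Console.WriteLine(Regex.Match("foo\r\nbar", LoweredDot + "*" + LoweredEol).Value);           // foo

// Line-end positions in a string that mixes \r\n, \r, \n and U+2028.
string text = "a\r\nb\rc\nd\u2028e";
Console.WriteLine(Positions("$", text, RegexOptions.Multiline)); // 2,6,10 (before \n, and at the end)
Console.WriteLine(Positions(LoweredEol, text));                  // 1,4,6,8,10 (never between \r and \n)
```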

Key design choices:

\r\n is atomic: $ never matches between \r and \n. This is enforced with lookbehind/lookahead guards.
Singleline takes precedence: . with Singleline | AnyNewLine matches everything (including newlines), consistent with Singleline's documented behavior.
\A and \z are unaffected: absolute start/end anchors don't change.
Incompatible with NonBacktracking and ECMAScript: throws ArgumentOutOfRangeException (lowered patterns use lookaround).
Zero perf impact on existing patterns: the lowering is gated on the AnyNewLine flag, so patterns that don't use it take the same code paths as before. The only new cost is a flag check ((_options & RegexOptions.AnyNewLine) != 0) in the parser for $, ^, \Z, and ., which is negligible.
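The precedence rules above can be summarized as a small decision function. The sketch below is hypothetical: Lower is not a method in RegexParser.cs. It returns the pattern text that each construct would be replaced with, using the forms from the lowering table. AnyNewLine is declared with the same (RegexOptions)0x0800 cast that the benchmark code uses, because released enums do not include it.

```csharp
using System;
using System.Text.RegularExpressions;

// Not in released enums; same cast the benchmark code uses.
const RegexOptions AnyNewLine = (RegexOptions)0x0800;

// Hypothetical helper (not the PR's code): the pattern text each construct lowers to.
string Lower(string construct, RegexOptions options)
{
    bool any = (options & AnyNewLine) != 0;
    bool multiline = (options & RegexOptions.Multiline) != 0;
    bool singleline = (options & RegexOptions.Singleline) != 0;
    const string endZ = @"(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z)";

    return construct switch
    {
        "." when any && !singleline => @"[^\r\n\u0085\u2028\u2029]", // Singleline takes precedence
        "$" when any && multiline => @"(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)",
        "$" when any => endZ,
        @"\Z" when any => endZ,                                      // \Z ignores Multiline
        "^" when any && multiline => @"(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)",
        _ => construct,                                              // \A, \z, ^ without Multiline: unchanged
    };
}

Console.WriteLine(Lower(".", RegexOptions.Singleline | AnyNewLine));                           // .
Console.WriteLine(Lower("^", AnyNewLine));                                                     // ^
Console.WriteLine(Lower(@"\Z", RegexOptions.Multiline | AnyNewLine) == Lower("$", AnyNewLine)); // True
```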

Out of scope: `\R`

Unicode TR18 RL1.6 also recommends a meta-character \R for matching any newline sequence (consuming the characters), equivalent to (?:\r\n|[\n\v\f\r\u0085\u2028\u2029]). This is distinct from what AnyNewLine does: AnyNewLine modifies the behavior of existing zero-width anchors (^, $, \Z) and the character class ., while \R would be a new consuming pattern element. Adding \R could be done independently as a separate feature.
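The sketch below makes the consuming vs. zero-width distinction concrete. It uses the \R-equivalent alternation quoted above with today's Regex.Split, and the constant names are illustrative only. It also shows why \r\n has to be the first alternative. If it is not, a CRLF pair is treated as two separate breaks and produces an empty line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \R-style consuming line break, written out as the alternation quoted above.
const string LineBreak = @"(?:\r\n|[\n\v\f\r\u0085\u2028\u2029])";
// Same pieces with \r\n last: \r now wins first, so CRLF counts as two breaks.
const string CrlfLast = @"(?:[\n\v\f\r\u0085\u2028\u2029]|\r\n)";

string text = "a\r\nb\rc\u2028d";
Console.WriteLine(string.Join("|", Regex.Split(text, LineBreak)));            // a|b|c|d
Console.WriteLine(Regex.Split(text, CrlfLast).Length);                        // 5 (extra empty line)

// Unlike the zero-width anchors, each match consumes the break characters.
Console.WriteLine(string.Join(" ", Regex.Matches(text, LineBreak).Select(m => m.Length))); // 2 1 1
```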

Changes

Production code

RegexOptions.cs -- add AnyNewLine = 0x0800
RegexParser.cs -- lowering methods AnyNewLineEndZNode(), AnyNewLineEolNode(), AnyNewLineBolNode(), plus . handling
RegexCharClass.cs -- add NotNewLineOrCarriageReturnClass constant
Regex.cs / RegexCompilationInfo.cs -- validation

Tests

~120 new test cases covering dot, anchors ($, ^, \Z), RightToLeft, Singleline, Multiline, Replace, Split, Count, EnumerateMatches, NonBacktracking rejection, edge cases (adjacent newlines, empty lines, all-newline strings), and PCRE2-inspired scenarios

Fixes #25598

Add AnyNewLine = 0x0800 to RegexOptions enum. Update ValidateOptions to bump MaxOptionShift to 12 and reject AnyNewLine | NonBacktracking. ECMAScript already rejects unknown options via allowlist. Update source generator to include AnyNewLine in SupportedOptions mask. Update tests that used 0x800 as an invalid option value to use 0x1000. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
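The following is a hypothetical sketch of the validation this commit describes, and it runs on current .NET. IsValid is an illustrative stand-in for ValidateOptions. The ECMAScript allowlist shown is an assumption about the existing check, not a quote of it. The sketch shows why 0x1000 becomes the first undefined bit once MaxOptionShift is 12.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical re-creation of the checks in the commit message; not the runtime's code.
const int MaxOptionShift = 12;                        // bits 0..11 (0x001..0x800) are defined
const RegexOptions AnyNewLine = (RegexOptions)0x0800; // not in released enums

bool IsValid(RegexOptions options)
{
    // Any bit at or above 1 << MaxOptionShift (e.g. 0x1000) is unknown.
    if (((uint)options >> MaxOptionShift) != 0) return false;

    // AnyNewLine lowers to lookarounds, which NonBacktracking does not support.
    if ((options & RegexOptions.NonBacktracking) != 0 && (options & AnyNewLine) != 0) return false;

    // ECMAScript only accepts an allowlist of companion options (assumed list),
    // so AnyNewLine is rejected there without a dedicated check.
    const RegexOptions EcmaAllowed = RegexOptions.ECMAScript | RegexOptions.IgnoreCase |
        RegexOptions.Multiline | RegexOptions.Compiled | RegexOptions.CultureInvariant;
    if ((options & RegexOptions.ECMAScript) != 0 && (options & ~EcmaAllowed) != 0) return false;

    return true;
}

Console.WriteLine(IsValid(RegexOptions.Multiline | AnyNewLine));       // True
Console.WriteLine(IsValid((RegexOptions)0x1000));                      // False
Console.WriteLine(IsValid(RegexOptions.NonBacktracking | AnyNewLine)); // False
```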

When AnyNewLine is set without Multiline, lower $ from EndZ into an equivalent sub-tree: (?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z) This matches at end of string, or before \r\n, \r, or \n at end of string, but not between \r and \n. Works across all engines (interpreter, compiled, source generator) since it's pure parser lowering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When AnyNewLine is set, lower \Z using the same sub-tree as $ without Multiline. \Z is not affected by Multiline, so the same lowering applies regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
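The effect of the \Z lowering can be checked on a released .NET by using the sub-tree from the lowering table (the form with the Unicode separators) directly. This sketch compares the built-in \Z with the lowered form. LoweredEndZ and Positions are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \Z (and non-Multiline $) as lowered in the table, grouped so it can follow other pattern text.
const string LoweredEndZ = @"(?:(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z))";

string Positions(string pattern, string input) =>
    string.Join(",", Regex.Matches(input, pattern).Select(m => m.Index));

// Built-in \Z also matches between the \r and \n of a trailing CRLF; the lowered form does not.
Console.WriteLine(Positions(@"\Z", "a\r\n"));       // 2,3
Console.WriteLine(Positions(LoweredEndZ, "a\r\n")); // 1,3

// With a trailing CR, CRLF or U+2029, \w+ now reaches the end anchor; two trailing breaks still fail.
foreach (string s in new[] { "abc\r\n", "abc\r", "abc\u2029", "abc\r\n\n" })
{
    bool builtIn = Regex.IsMatch(s, @"\w+\Z");
    bool lowered = Regex.IsMatch(s, @"\w+" + LoweredEndZ);
    Console.WriteLine($"{Regex.Escape(s)}: built-in={builtIn}, lowered={lowered}");
}
```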

When both Multiline and AnyNewLine are set, lower $ to: (?=\r\n|\r|\z)|(?<!\r)(?=\n) This matches at \r\n, \r, \n boundaries and end-of-string, without matching between \r and \n of a \r\n sequence. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When both Multiline and AnyNewLine are set, lower ^ to: (?<=\A|\r\n|\n)|(?<=\r)(?!\n) This matches after \r\n, \n, bare \r (not followed by \n), and at start of string. Without Multiline, ^ remains \A unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When AnyNewLine is set (without Singleline), lower . to [^\n\r] instead of [^\n], so dot does not match \r or \n. Add NotNewLineOrCarriageReturnClass constant to RegexCharClass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines, empty lines, \Z with trailing newlines, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Integration tests using a ~50 char string with all newline types (\r\n, \r, \n, \u0085, \u2028, \u2029) exercising ^, $, \Z, and . together. Replace/Split tests with MatchEvaluator line numbering. Deduplicated cases moved into per-feature tests (RightToLeft, empty lines). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Expand test coverage across all AnyNewLine-affected constructs:
- Dollar, EndZ, DollarMultiline, CaretMultiline, Dot test data with adjacent newlines, newlines at string boundaries, empty segments, RightToLeft, and all Unicode newline types
- Advanced tests: inline options, backreferences, conditionals, alternation with anchors, lookahead/lookbehind, quantified dot, lazy quantifiers, named/atomic groups, word boundaries near newlines, explicit char classes unaffected
- Methods test: IsMatch, Count, EnumerateMatches, Match with startat, Replace with group ref, Split
- Unicode expansion: \s/\S behavior, \w behavior, \p{Zl}/\p{Zp} categories, adjacent Unicode+ASCII newlines, baselines without AnyNewLine
No bugs found; all initial test failures were wrong expectations.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Verify the fixer correctly emits RegexOptions.Multiline | RegexOptions.AnyNewLine in enum value order when upgrading to GeneratedRegex. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Test cases derived from cross-validation with PCRE2 NEWLINE_ANY behavior (BSD-licensed) and analysis of real-world patterns from dotnet/runtime-assets:
- (.+)# greedy where .+ cannot cross newlines (PCRE2 JIT 472)
- (.)(.) requiring consecutive non-newlines (PCRE2 JIT 471)
- (.). with mixed newline types (PCRE2 JIT 469)
- Blank line detection (^ +$) with \n, \r\n, \u0085 separators
All 31,528 tests pass. No bugs found: our implementation is fully consistent with PCRE2 NEWLINE_ANY behavior and handles real-world patterns correctly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add more RightToLeft + AnyNewLine tests (various newline types, dot, anchors, \Z)
- Add more Singleline | AnyNewLine tests (all newline types, combined with Multiline)
- Replace RegexOptions.AnyNewLine with RegexHelpers.RegexOptionAnyNewLine throughout tests for net481 compilation compatibility
- Wrap Count/EnumerateMatches in #if NET for net481 compat
- Add clarifying comments on Split behavior with/without AnyNewLine
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-02-21T08:31:42Z

(Finally got around to having AI finish my lowering branch.)

Copilot

Pull request overview

This pull request implements RegexOptions.AnyNewLine (value 0x0800 = 2048), a new regex option that makes ^, $, \Z, and . recognize all Unicode line boundaries (\r, \r\n, \n, \u0085 NEL, \u2028 LS, \u2029 PS) instead of only \n. This addresses a major usability issue where users had to manually work around .NET's hardcoded \n-only line ending behavior.

Changes:

Added RegexOptions.AnyNewLine = 0x0800 enum value with incompatibility checks for NonBacktracking and ECMAScript modes
Implemented parser-level lowering of ^, $, \Z, and . into equivalent lookaround-based RegexNode trees when AnyNewLine is enabled
Added comprehensive test coverage (~800 new test lines) covering all anchor types, newline combinations, RightToLeft mode, inline options, and edge cases

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.


File	Description
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexOptions.cs`	Added `AnyNewLine = 0x0800` enum value with XML documentation
`src/libraries/System.Text.RegularExpressions/ref/System.Text.RegularExpressions.cs`	Updated ref assembly with `AnyNewLine = 2048`
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs`	Updated `MaxOptionShift` to 12 and added AnyNewLine to NonBacktracking incompatibility check
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs`	Implemented lowering methods (`AnyNewLineEndZNode`, `AnyNewLineEolNode`, `AnyNewLineBolNode`) and integrated into `^`, `$`, `\Z`, `.` parsing
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs`	Added `NotNewLineOrCarriageReturnClass` constant for `.` with AnyNewLine
`src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs`	Added AnyNewLine to source generator's supported options
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs`	Added ~800 lines of comprehensive tests for all anchor types, newline combinations, and edge cases
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Tests.Common.cs`	Added `RegexOptionAnyNewLine` constant for test compatibility
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs`	Updated invalid option test from 0x800 to 0x1000; added NonBacktracking+AnyNewLine incompatibility test
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs`	Updated invalid option comments and tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.EnumerateMatches.Tests.cs`	Updated invalid option tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorParserTests.cs`	Updated invalid option tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs`	Updated tests for 0x1000 as invalid option; added AnyNewLine test case for code fixer

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

danmoseley · 2026-02-21T08:40:57Z

@MihuBot benchmark Regex

MihuBot · 2026-02-21T10:17:19Z

See benchmark results at https://gist.github.com/MihuBot/c7399f4f318e4febcfd0018436d5fe53

danmoseley · 2026-02-21T16:02:25Z

MihuBot confirms zero perf impact on existing patterns/options.

danmoseley · 2026-02-21T19:14:12Z

AnyNewLine Performance Analysis (Release, Compiled, .NET 11.0, BenchmarkDotNet)

Measured impact of converting existing newline-workaround patterns to simplified AnyNewLine equivalents. All scenarios use RegexOptions.Compiled (representative of source-generated too). Measured with BenchmarkDotNet (InProcess, ShortRun). All match counts verified identical between old and new patterns.
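The "match counts verified identical" check for the first scenario can be reproduced without an AnyNewLine build by spelling out the lowered ^, $, and . from the PR description. This is a sketch that relies on that substitution. The constants are not PR code, and the loop mirrors GenerateText with \r\n line endings only. It also shows that although the counts match, the old pattern's values still carry the trailing \r.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Lowered Multiline ^, $ and . from the PR description, so no AnyNewLine build is needed.
const string Bol = @"(?:(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n))";
const string Eol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";
const string Dot = @"[^\r\n\u0085\u2028\u2029]";

// Same line shape as the benchmark's GenerateText(1000, ["\r\n"]).
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++) sb.Append("Lorem ipsum dolor sit amet ").Append(i).Append("\r\n");
string winText = sb.ToString();

var oldMatches = Regex.Matches(winText, @"^.+\r?$", RegexOptions.Multiline);
var newMatches = Regex.Matches(winText, Bol + Dot + "+" + Eol);

Console.WriteLine($"{oldMatches.Count} vs {newMatches.Count}");     // 1000 vs 1000
Console.WriteLine(oldMatches.Count(m => m.Value.EndsWith('\r')));   // 1000: old values keep the \r
Console.WriteLine(newMatches.Count(m => m.Value.EndsWith('\r')));   // 0
```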

Section 1: Real-World Patterns on Windows `\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`^.+\r?$` (1K lines)	`^.+$`	46.7	48.8	1.05x
`^.+\r?$` (10K lines)	`^.+$`	1,694	1,760	1.04x
`\[assembly:...\]\s*$(\r?\n)?`	`\[assembly:...\]\s*$`	38.3	32.4	0.85x
`^([^\s:]+):\s*(.+?)\r?$`	`^([^\s:]+):\s*(.+?)$`	105.9	105.9	1.00x
`^# .+\r?$`	`^# .+$`	11.1	9.1	0.83x
`^.+\r?$` (CSV, 1K rows)	`^.+$`	44.4	49.2	1.11x
`[^\r\n]+`	`.+`	44.2	43.8	0.99x
`\w+\r?$`	`\w+$`	90.8	128.7	1.42x
`(?:^|\r\n)\w+`	`^\w+`	208.7	214.5	1.03x

Section 2: Unix `\n` Text (overhead of just enabling the flag)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`^.+$`	`^.+$`	43.5	48.9	1.12x
`[^\n]+`	`.+`	39.0	44.9	1.15x

Section 3: Mixed `\n`/`\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`[^\r\n\u0085\u2028\u2029]+`	`.+`	45.4	44.2	0.97x
`^.+\r?$` (1K lines)	`^.+$`	44.1	50.1	1.14x

Section 4: Non-anchor/dot Patterns (zero impact expected)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`\r\n|\r|\n`	`\r\n|\r|\n`	20.0	21.7	1.08x
`\w+`	`\w+`	322.4	336.4	1.04x

Section 5: Pathological Cases (unlikely in practice)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`$`	`$`	98.2	134.1	1.37x
`^`	`^`	145.6	131.6	0.90x
`\w+\r?\Z` (329K chars)	`\w+\Z`	494.2	1,039.3	2.10x

Summary

Most real-world patterns in Compiled mode show 0.83x–1.14x, which is essentially zero cost. Some are even faster because the AnyNewLine pattern is simpler (e.g., ^# .+$ vs ^# .+\r?$: removing the \r? node saves more than the lowered $ costs). The outlier is \w+$ at 1.42x, discussed next.
Where small regressions occur (1.1x–1.4x), the cause is the lowered anchor tree. A native $ (Eol) is a single "is next char \n?" check, but AnyNewLine lowers it to a lookahead alternation like (?=\r\n|\r|\n|\u0085|\u2028|\u2029|\z). Even when the input only contains \r\n, the engine must evaluate the alternation branches. This overhead is proportionally more visible when the anchor dominates the work (e.g., \w+$, where each \w+ match is short), and nearly invisible when .+ dominates each line's work (e.g., ^.+$ at 1.04x).
Patterns without anchors or dot are unaffected (1.04x–1.08x, within noise): the flag only changes the behavior of ., ^, $, and \Z.
The only pathological case is \w+\Z on very large input (329K chars) at 2.1x: the lowered \Z alternation tree is evaluated during backtracking at many positions. This is unlikely in practice.
In Compiled/source-generated mode, the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode shows larger gaps (2x–3x for typical patterns), but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.

Benchmark source code (BenchmarkDotNet)

using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
        .AddJob(Job.ShortRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000,
        ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // Section 1: Real-world on Windows \r\n text
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // Section 2: Unix \n text (control)
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // Section 3: Mixed newline text
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // Section 4: Non-anchor patterns (zero impact)
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // Section 5: Pathological
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}

Simplify the lowered trees for $, ^, and \Z anchors:
- Eol ($): Merge \r\n|\r|[\u0085\u2028\u2029] into [\r\u0085\u2028\u2029] (4 branches -> 2). \r covers both \r\n and bare \r since lookahead only checks the first char
- Bol (^): Merge \r\n|\n|[\u0085\u2028\u2029] into [\n\u0085\u2028\u2029] (4 branches -> 2). \n covers both \r\n and bare \n since lookbehind only checks the last char
- EndZ (\Z): Merge \r? and [\u0085\u2028\u2029] into [\r\u0085\u2028\u2029]? (3 branches -> 2)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-02-21T20:52:50Z

Thinking about how to optimize '.' more here:

. without AnyNewLine → RegexNodeKind.Notone '\n' → compiled to a single ch != '\n' comparison
. with AnyNewLine → [^\r\n\u0085\u2028\u2029] → RegexNodeKind.Set → compiled to a bitmap/bitmask operation (branchless but ~3-5 IL ops including subtract + shift + mask)

The JIT-compiled code for Notone is literally one compare-and-branch. The character class, even with a bitmap optimization, has to do range subtraction + bit shifting. Per character, that costs only a fraction of a nanosecond. For example, [^\n]+ → .+ on Unix text went from 39.0 to 44.9 µs over a ~31K-character input, which is roughly 0.2 ns per character. Across the whole input, it adds up to the ~10-18% overhead we see.

So the overhead does not come from "more chars to match against". It comes from crossing the boundary between the fast-path Notone node type (single char compare) and the generic Set node type (bitmap). Even a 2-char negated class like [^\r\n] would have the same overhead shape, since it goes through Set rather than Notone.

In principle, the engine could be taught to recognize small negated character classes and emit a chain of != comparisons instead of bitmap ops, but that would be an engine optimization beyond the scope of this PR.
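The sketch below illustrates the two shapes being compared, using two membership tests for [^\r\n\u0085\u2028\u2029]. One is a chain of != comparisons, and the other is a bitmask-plus-fallback check. Both are simplified stand-ins written for this note, and neither is the code that RegexCompiler or the source generator emits. The sketch makes no timing claims. It only checks that both tests agree with the regex class on every UTF-16 code unit.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (1) A chain of != comparisons: the shape suggested for small negated sets.
bool ViaComparisons(char ch) =>
    ch != '\n' && ch != '\r' && ch != '\u0085' && ch != '\u2028' && ch != '\u2029';

// (2) A 64-bit mask for low chars plus explicit checks above it:
//     a simplified stand-in for the bitmap-style test a Set node performs.
const ulong LowExcluded = (1UL << '\n') | (1UL << '\r'); // bits 10 and 13
bool ViaBitmask(char ch) =>
    ch < 64 ? (LowExcluded & (1UL << ch)) == 0
            : ch != '\u0085' && ch != '\u2028' && ch != '\u2029';

// Compare both against the regex class for all 65,536 code units.
var cls = new Regex(@"\A[^\r\n\u0085\u2028\u2029]\z");
int disagreements = Enumerable.Range(0, 0x10000).Count(i =>
{
    char ch = (char)i;
    bool expected = cls.IsMatch(ch.ToString());
    return ViaComparisons(ch) != expected || ViaBitmask(ch) != expected;
});
Console.WriteLine($"{disagreements} disagreements"); // 0 disagreements
```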

danmoseley · 2026-02-21T20:53:14Z

Updated AnyNewLine Performance Analysis (with anchor optimization)

After the initial analysis above, I optimized the lowered anchor trees by merging redundant alternation branches:

$ (Eol): (?=\r\n|\r|[\u0085\u2028\u2029]|\z) → (?=[\r\u0085\u2028\u2029]|\z) — \r covers both \r\n and bare \r (lookahead only checks first char), so the 4-branch alternation collapses to 2.
^ (Bol): (?<=\A|\r\n|\n|[\u0085\u2028\u2029]) → (?<=[\n\u0085\u2028\u2029]|\A) — \n covers both \r\n and bare \n (lookbehind only checks last char), 4 branches → 2.
\Z (EndZ): Merged \r? and [\u0085\u2028\u2029] into [\r\u0085\u2028\u2029]?, 3 outer branches → 2.
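Lookahead only inspects the first character and lookbehind only the last, so each merge should preserve semantics. One way to check this on a released .NET is to compare the zero-width match positions of the before and after trees, each combined with the unchanged second branch from the lowering table. The input set below is illustrative and is not the PR's test data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Before/after trees for Multiline $ (Eol) and ^ (Bol).
var trees = new (string Name, string Original, string Merged)[]
{
    ("Eol", @"(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)",
            @"(?=[\r\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)"),
    ("Bol", @"(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)",
            @"(?<=[\n\u0085\u2028\u2029]|\A)|(?<=\r)(?!\n)"),
};

// Inputs exercising CRLF pairs, bare CR/LF, Unicode separators and empty lines.
string[] inputs = { "", "a\r\nb\rc\nd\u2028e", "\r\n\r\n", "\r\r\n", "\n\r", "x\u0085\u2029y" };

string Positions(string pattern, string input) =>
    string.Join(",", Regex.Matches(input, pattern).Select(m => m.Index));

int mismatches = 0;
foreach (var (name, original, merged) in trees)
{
    foreach (string input in inputs)
    {
        string before = Positions(original, input), after = Positions(merged, input);
        if (before != after)
        {
            mismatches++;
            Console.WriteLine($"{name} differs on \"{Regex.Escape(input)}\": {before} vs {after}");
        }
    }
}
Console.WriteLine($"{mismatches} mismatches"); // 0 mismatches
```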

Re-measured with BenchmarkDotNet MediumRun (15 target iterations and 10 warmup iterations per launch, which is more stable than the previous ShortRun). All scenarios use RegexOptions.Compiled, Release build, .NET 11.0. All match counts verified identical.

Section 1: Real-World Patterns on Windows `\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`^.+\r?$` (1K lines)	`^.+$`	44.6	49.9	1.12x
`^.+\r?$` (10K lines)	`^.+$`	1,749	1,797	1.03x
`\[assembly:...\]\s*$(\r?\n)?`	`\[assembly:...\]\s*$`	37.1	33.1	0.89x
`^([^\s:]+):\s*(.+?)\r?$`	`^([^\s:]+):\s*(.+?)$`	105.8	100.0	0.95x
`^# .+\r?$`	`^# .+$`	11.0	9.2	0.83x
`^.+\r?$` (CSV, 1K rows)	`^.+$`	44.4	47.5	1.07x
`[^\r\n]+`	`.+`	41.9	43.4	1.04x
`\w+\r?$`	`\w+$`	85.1	112.9	1.33x
`(?:^|\r\n)\w+`	`^\w+`	195.7	193.4	0.99x

Section 2: Unix `\n` Text (overhead of just enabling the flag)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`^.+$`	`^.+$`	42.3	46.6	1.10x
`[^\n]+`	`.+`	36.3	42.9	1.18x

Section 3: Mixed `\n`/`\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`[^\r\n\u0085\u2028\u2029]+`	`.+`	43.5	45.6	1.05x
`^.+\r?$` (1K lines)	`^.+$`	46.0	50.5	1.10x

Section 4: Non-anchor/dot Patterns (zero impact expected)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`\r\n|\r|\n`	`\r\n|\r|\n`	20.1	19.6	0.98x
`\w+`	`\w+`	286.5	285.9	1.00x

Section 5: Pathological Cases (unlikely in practice)

Old Pattern	New Pattern (+ AnyNewLine)	Old (µs)	New (µs)	Ratio
`$` (1K bare evals)	`$`	98.6	120.3	1.22x
`^` (1K bare evals)	`^`	133.3	108.3	0.81x
`\w+\r?\Z` (329K chars)	`\w+\Z`	480.3	931.5	1.94x

Summary

Real-world patterns in Compiled mode show 0.83x–1.12x, apart from \w+$ (1.33x, discussed below). That is essentially zero to modest cost, and sometimes the new version is faster because the AnyNewLine pattern is simpler (e.g., ^# .+$ vs ^# .+\r?$, where removing the \r? node saves more than the lowered $ costs).
Where regressions occur (1.1x–1.3x), the cause is the lowered anchor/dot trees. A native $ (Eol) is a single "is the next char \n?" check, but AnyNewLine lowers it to a lookahead with a character class [\r\u0085\u2028\u2029] plus \z, so even when the input contains only \r\n the engine must evaluate the character class. Similarly, . becomes [^\r\n\u0085\u2028\u2029], which moves it off the fast-path Notone node (a single ch != '\n' comparison) and onto a Set node (a bitmap/bitmask check: branchless, but roughly 3-5 IL ops per character). This overhead is proportionally more visible when the anchor or dot dominates the work (e.g., \w+$ where the \w+ match is short, or [^\n]+ → .+ on Unix text) and nearly invisible when other parts of the pattern dominate.
The anchor optimization improved the worst real-world case (\w+$) from 1.42x to 1.33x, and made bare ^ faster than the baseline (0.81x) by using a 2-branch merged character class instead of 4 separate alternation branches.
Patterns without anchors or dot are unaffected (0.98x–1.00x), since the flag only changes the behavior of ., ^, $, and \Z.
The only large regression is the pathological \w+\Z on very large input (329K chars), at 1.94x: the lowered \Z alternation tree is evaluated during backtracking at many positions. This is unlikely in practice.
In Compiled/source-generated mode, the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode would show larger gaps, but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.
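To make the lowering concrete, here is a sketch that writes out Multiline `^`, `$`, and `.` by hand as plain patterns, so it runs on current .NET without AnyNewLine. The `$` form follows the lowering table in the PR description, with the `\r\n` and `\r` lookahead branches merged into one class as the summary describes. Lookahead `(?=\r)` already covers `\r\n`, so the merge does not change behavior. The `^` form is an assumed shape (start of input, after `\n`/NEL/LS/PS, or after a `\r` not followed by `\n`) and may not match the parser's actual tree. `AnyNewLineSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: hand-written equivalents of what AnyNewLine lowering does to
// '.', '^' and '$' under Multiline, usable on runtimes without the flag.
// The '^' form is an assumed shape; RegexParser's actual tree may differ.
public static class AnyNewLineSketch
{
    // '.' (no Singleline): any char except \r, \n, NEL, LS, PS.
    public const string Dot = @"[^\r\n\u0085\u2028\u2029]";

    // '$' (Multiline): before \r (which covers \r\n), NEL, LS, PS, or end of input;
    // or before a \n that is not the second half of \r\n.
    public const string Eol = @"(?:(?=[\r\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";

    // '^' (Multiline): at start of input; after \n, NEL, LS, PS;
    // or after a \r that is not the first half of \r\n.
    public const string Bol = @"(?:\A|(?<=[\n\u0085\u2028\u2029])|(?<=\r)(?!\n))";

    // Every match of the hand-lowered equivalent of ^.*$ (Multiline + AnyNewLine).
    // No RegexOptions are needed: the line semantics are spelled out in the pattern.
    public static List<string> Lines(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, Bol + Dot + "*" + Eol))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

For `"foo\r\nbar\rbaz\u2028qux"`, `Lines` returns `foo`, `bar`, `baz`, `qux`. Plain `^.*$` with Multiline on the same input returns only two matches, `foo\r` and `bar\rbaz\u2028qux`.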

Benchmark source code (BenchmarkDotNet, MediumRun)

using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
        .AddJob(Job.MediumRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
    // RegexOptions.AnyNewLine, referenced by its numeric value (0x800) rather than by name.
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000,
        ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // Section 1: Real-world on Windows \r\n text
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // Section 2: Unix \n text (control)
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // Section 3: Mixed newline text
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // Section 4: Non-anchor patterns (zero impact)
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // Section 5: Pathological
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}
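Keep one thing in mind when reading the Section 1 ratios. The benchmarks compare only `Matches(...).Count`, and the Old `^.+\r?$` workaround agrees with the New pattern on counts but not on values. Because `.` already matches `\r`, greedy `.+` consumes the `\r`, so `\r?` matches empty and each Old match still ends in `\r`. Under AnyNewLine, `.` excludes `\r`, so New's values should not end in `\r`. The sketch below rebuilds the benchmark's Windows text with a hypothetical `OldPatternCheck` class and shows this on current .NET.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for checking the benchmark's Section 1 "Old" workaround.
public static class OldPatternCheck
{
    // Mirrors the benchmark's GenerateText(lineCount, ["\r\n"]).
    public static string WindowsText(int lineCount)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ").Append(i).Append("\r\n");
        }
        return sb.ToString();
    }

    // Same pattern and Multiline option as Old_1a; Compiled is omitted since it
    // does not change which matches are found.
    public static MatchCollection OldMatches(string text) =>
        Regex.Matches(text, @"^.+\r?$", RegexOptions.Multiline);
}
```

With three lines, `OldMatches` finds 3 matches, and the first is `"Lorem ipsum dolor sit amet 0\r"`, trailing `\r` included. This is one more way the `\r?$` workaround is fragile.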

danmoseley · 2026-02-25T02:36:12Z

Some real test failures...

AnyNewLine (0x800) is not a valid RegexOptions value on .NET Framework, so the Regex constructor throws ArgumentOutOfRangeException. Add [SkipOnTargetFramework] to all 10 AnyNewLine test methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
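Runtimes that predate the flag reject the 0x800 bit outright. Code that wants to opt into AnyNewLine only when it is available can therefore probe for it. This is a minimal sketch. It assumes, as the commit above says for .NET Framework, that an unknown option bit makes the `Regex` constructor throw `ArgumentOutOfRangeException`. `RegexOptionProbe` is a hypothetical helper, not part of the PR.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: checks whether the running Regex implementation accepts a
// given RegexOptions value by trying to construct a trivial pattern with it.
public static class RegexOptionProbe
{
    // Same numeric value the PR's benchmarks use for AnyNewLine.
    public const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    public static bool IsSupported(RegexOptions options)
    {
        try
        {
            _ = new Regex("a", options);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // Unknown bits and invalid combinations (e.g. ECMAScript | Singleline) land here.
            return false;
        }
    }
}
```

Example: `var options = RegexOptions.Multiline | (RegexOptionProbe.IsSupported(RegexOptionProbe.AnyNewLine) ? RegexOptionProbe.AnyNewLine : RegexOptions.None);`. Probing once and caching the result avoids paying for a construction (and possibly an exception) on every call.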

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

danmoseley and others added 14 commits February 20, 2026 21:59

Lower \Z with AnyNewLine

37317bb

When AnyNewLine is set, lower \Z using the same sub-tree as $ without Multiline. \Z is not affected by Multiline, so the same lowering applies regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Lower . with AnyNewLine

1141050

When AnyNewLine is set (without Singleline), lower . to [^\n\r] instead of [^\n], so dot does not match \r or \n. Add NotNewLineOrCarriageReturnClass constant to RegexCharClass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add AnyNewLine integration tests

283d49d

Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines, empty lines, \Z with trailing newlines, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add AnyNewLine test for UpgradeToGeneratedRegex analyzer

a959964

Verify the fixer correctly emits RegexOptions.Multiline | RegexOptions.AnyNewLine in enum value order when upgrading to GeneratedRegex. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Test NonBacktracking|AnyNewLine is rejected

681a3ba

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Style: single-line ternaries for $ and \Z lowering

c96689e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
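The `\Z` commit above (37317bb) reuses the non-Multiline `$` sub-tree. Written out as a plain pattern from the lowering table in the PR description, it can be tried on current .NET. `EndZSketch` is a hypothetical name, and the pattern is a hand transcription rather than the parser's output.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: \Z under AnyNewLine, transcribed from the PR's lowering table.
public static class EndZSketch
{
    // Matches at end of input, or before exactly one final line break
    // (\r\n, \r, \n, NEL, LS, PS), but never between the \r and \n of a final \r\n.
    public const string EndZ =
        @"(?:(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z))";

    // Equivalent of \w+\Z with AnyNewLine: the word at the end of the input, or null.
    public static string? LastWord(string input)
    {
        Match m = Regex.Match(input, @"\w+" + EndZ);
        return m.Success ? m.Value : null;
    }
}
```

`LastWord` returns `abc` for `"abc\r\n"`, `"abc\u2029"`, and `"abc"`, and null for `"abc\r\n\r\n"` (two trailing line breaks). Plain `\w+\Z` finds nothing in `"abc\r\n"` because native `\Z` tolerates only a final `\n`.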

Copilot AI review requested due to automatic review settings February 21, 2026 08:27

github-actions bot added the area-System.Text.RegularExpressions label Feb 21, 2026

dotnet-policy-service bot assigned danmoseley Feb 21, 2026

Copilot started reviewing on behalf of danmoseley February 21, 2026 08:28

Copilot AI reviewed Feb 21, 2026

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs

Add ECMAScript+AnyNewLine rejection test

e8c8fcb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley requested a review from Copilot February 21, 2026 08:36

Copilot started reviewing on behalf of danmoseley February 21, 2026 08:37

Copilot AI reviewed Feb 21, 2026

MihuBot mentioned this pull request Feb 21, 2026

[Benchmark X64] [danmoseley] Add RegexOptions.AnyNewLine via parser lowering MihuBot/runtime-utils#1775

Open

build-analysis bot mentioned this pull request Feb 21, 2026

[android][clr] No peer certificates when executing System.Net.Http.Functional.Tests on Android emulator #124526

Open

Copilot AI review requested due to automatic review settings February 25, 2026 02:53

Copilot started reviewing on behalf of danmoseley February 25, 2026 02:54

Copilot AI reviewed Feb 25, 2026

danmoseley wants to merge 17 commits into dotnet:main from danmoseley:anynewline-lower-v2
