Vectorize RegexInterpreter opcode loops for Oneloop, Onerep, Notonerep, and MatchString #124628
Replace the per-character loop in the Oneloop/Oneloopatomic opcode handler
with a vectorized IndexOfAnyExcept call for left-to-right matching. This
mirrors the existing optimization already applied to Notoneloop (which uses
IndexOf), enabling SIMD-accelerated scanning when matching repeated
occurrences of a single character (e.g. a+ or a{3,}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
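The replacement this commit describes can be sketched as follows. This is a minimal illustration and not the interpreter's actual code: the class `OneloopSketch`, both method names, and the `max` parameter are hypothetical, and the sketch assumes .NET 7 or later, where `MemoryExtensions.IndexOfAnyExcept` is available.

```csharp
using System;

public static class OneloopSketch
{
    // Scalar baseline: count how many leading chars equal ch, capped at max.
    public static int CountScalar(ReadOnlySpan<char> input, char ch, int max)
    {
        int len = Math.Min(max, input.Length);
        int i = 0;
        while (i < len && input[i] == ch)
        {
            i++;
        }
        return i;
    }

    // Vectorized variant: the index of the first char that is NOT ch
    // equals the length of the leading run of ch.
    public static int CountVectorized(ReadOnlySpan<char> input, char ch, int max)
    {
        ReadOnlySpan<char> window = input.Slice(0, Math.Min(max, input.Length));
        int i = window.IndexOfAnyExcept(ch);
        return i < 0 ? window.Length : i; // -1 means every char in the window was ch
    }

    public static void Main()
    {
        Console.WriteLine(CountVectorized("aaaab", 'a', int.MaxValue)); // prints 4
        Console.WriteLine(CountVectorized("aaaab", 'a', 2));            // prints 2
    }
}
```

Slicing to the cap first keeps the search from reading past the maximum repetition count, so both variants return the same value for any input.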
Replace the per-character loop in the Onerep opcode handler with a
vectorized ContainsAnyExcept call for left-to-right matching. This enables
SIMD-accelerated verification when matching a fixed number of occurrences
of a single character (e.g. the minimum repetitions of a{5,}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the per-character loop in the Notonerep opcode handler with a
vectorized Contains call for left-to-right matching. This enables
SIMD-accelerated verification when matching a fixed number of characters
that must not be a specific character (e.g. [^a]{5}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
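The two fixed-count checks described in the commits above (Onerep and Notonerep) can be sketched like this. The names `FixedCountSketch`, `MatchesOnerep`, and `MatchesNotonerep` are hypothetical, the caller is assumed to pass a `pos` within the input, and the sketch assumes .NET 8 or later for `ContainsAnyExcept`; it is an illustration, not the interpreter's code.

```csharp
using System;

public static class FixedCountSketch
{
    // Onerep-style check: the next `count` chars must all equal ch.
    public static bool MatchesOnerep(ReadOnlySpan<char> input, int pos, char ch, int count)
    {
        if (count > input.Length - pos)
        {
            return false; // not enough input left
        }
        return !input.Slice(pos, count).ContainsAnyExcept(ch);
    }

    // Notonerep-style check: none of the next `count` chars may equal ch.
    public static bool MatchesNotonerep(ReadOnlySpan<char> input, int pos, char ch, int count)
    {
        if (count > input.Length - pos)
        {
            return false; // not enough input left
        }
        return !input.Slice(pos, count).Contains(ch);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOnerep("xxaaaaay", 2, 'a', 5));  // prints True
        Console.WriteLine(MatchesNotonerep("bcaef", 0, 'a', 5));  // prints False
    }
}
```

Both checks only need a yes/no answer, which is why a `Contains`-style call fits here where the loop opcodes need an index.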
Replace the per-character backwards comparison loop in MatchString with a
vectorized SequenceEqual call for left-to-right matching. This enables
SIMD-accelerated string comparison when matching literal multi-character
strings within regex patterns.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR optimizes hot-path opcode handling in RegexInterpreter by replacing per-character loops with SIMD-accelerated span operations for left-to-right matching, extending the existing vectorization precedent in the interpreter.
Changes:
- Vectorize literal string matching (`Multi`/`MatchString`) using `ReadOnlySpan<char>.SequenceEqual`.
- Vectorize fixed-count opcodes `Onerep` and `Notonerep` using `ContainsAnyExcept`/`Contains` for left-to-right paths.
- Vectorize greedy single-char loops `Oneloop`/`Oneloopatomic` using `IndexOfAnyExcept` for left-to-right paths.

Real-world impact estimate: Analyzing the 15,817 unique patterns in the regex test corpus (assuming interpreter engine):
Follow-up PR #124630 adds SearchValues-based vectorization for

@MihuBot benchmark Regex

See benchmark results at https://gist.github.com/MihuBot/2004009ab9c5dbd509e407cc62b2d5a5

MihuBot Benchmark Analysis
Compiled and NonBacktracking paths are entirely unaffected (ratios 0.98–1.02 across all suites), as expected since the PR only modifies interpreter opcodes.
Interpreter regressions flagged by MihuBot
Investigation: do these hit modified opcodes?
I mapped each regressed benchmark's pattern to the interpreter opcodes it exercises:
5 of 8 regressions don't exercise any modified opcode. The 3 that marginally touch modified code are dominated by other costs (backtracking, alternation, cache behavior).
Root cause: JIT code layout effects
These regressions are interpreter-only, sub-microsecond in scale, measured on shared cloud VMs, and affect unmodified code paths: classic JIT layout noise.

build analysis is green - test failures are unrelated. ready for review?

MihuBot Results vs. Local Benchmarks
The MihuBot standard benchmark suites (Sherlock, Leipzig, BoostDocs, etc.) don't directly validate the 2x-7x local speedups because they use complex real-world patterns where the hot paths are mostly Setloop/Setrep (character classes) rather than the Oneloop/Onerep/Notonerep/MatchString opcodes modified here, and literal strings in the patterns are short (e.g.
What MihuBot does confirm:
The local microbenchmarks are the right tool for validating these specific codepaths since they isolate the modified opcodes with long enough inputs to show the SIMD gains.
...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs
…udeSubdirectories test (#125682)

## Description

`FileSystemWatcher_SymbolicLink_TargetsDirectory_Create_IncludeSubdirectories` was consistently flaky on macOS, failing with `AggregateException: (Expected Event occurred) × 3` from the `ExpectNoEvent` assertion.

**Root cause:** macOS FSEvents can deliver a late `Created` event for `subDir` (created during test setup, just before the stream starts at `kFSEventStreamEventIdSinceNow`). Since `subDir` is a direct child of the watched path, it correctly passes `CheckIfPathIsNested` even with `IncludeSubdirectories = false`. With no `expectedPath` filter on `ExpectNoEvent`, *any* `Created` event triggered the failure, including this unrelated one.

**Changes:**
- **`ExpectNoEvent` - add path filter:** Pass `expectedPath: Path.Combine(linkPath, subDir, subDirLv2)` so the assertion only fails if a `Created` event fires at the specific nested path under test. Spurious events at sibling paths (e.g. `subDir` itself) are ignored.
- **`[ActiveIssue]` - removed:** The `[ActiveIssue]` attribute has been removed entirely. The `expectedPath` fix makes the test robust enough to run on all platforms without skipping.
- **Comments - added disk-layout diagram and inline path annotations:** A layout comment explains the relationship between `tempDir`, `tempSubDir`, `linkPath`, and `subDirLv2Path`. Each path variable and `expectedPath` argument is annotated with its concrete resolved value (e.g. `// linkPath/subDir/subDirLv2`) to make the test easier to follow.

## Security

No security-relevant changes.
- Fixes #124677
- Fixes #124847

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Replace bounds-check + SequenceEqual with StartsWith for LTR path, and
per-char reverse loop with EndsWith for RTL path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
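The simplification in this commit can be illustrated with a small sketch. The names `MatchStringSketch`, `MatchLeftToRight`, and `MatchRightToLeft` are hypothetical, and treating `pos` as the current position (with right-to-left matching comparing the text that ends at `pos`) is an assumption about how the interpreter tracks its position, not something this page shows.

```csharp
using System;

public static class MatchStringSketch
{
    // Left-to-right: the literal must appear starting at pos.
    // StartsWith returns false when the literal is longer than the
    // remaining input, so no separate bounds check is needed.
    public static bool MatchLeftToRight(ReadOnlySpan<char> input, int pos, ReadOnlySpan<char> literal)
    {
        return input.Slice(pos).StartsWith(literal);
    }

    // Right-to-left: the literal must appear ending at pos.
    public static bool MatchRightToLeft(ReadOnlySpan<char> input, int pos, ReadOnlySpan<char> literal)
    {
        return input.Slice(0, pos).EndsWith(literal);
    }

    public static void Main()
    {
        Console.WriteLine(MatchLeftToRight("foobar", 3, "bar"));  // prints True
        Console.WriteLine(MatchRightToLeft("foobar", 3, "foo"));  // prints True
        Console.WriteLine(MatchLeftToRight("foobar", 4, "bar"));  // prints False
    }
}
```

Both calls are the generic ordinal overloads on `ReadOnlySpan<char>`, so the comparison is an exact char-by-char match with no culture handling.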
These single-char opcodes are hit by ~1% of real patterns and the
vectorized calls add code complexity with marginal real-world benefit.
Keep only the MatchString StartsWith/EndsWith simplification.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

I've reduced this PR to only the
Why drop Oneloop/Onerep/Notonerep: Stephen's right that these are hard to justify for real-world patterns. Analyzing the 15,817 real-world patterns: single-char quantifiers like
Unlike #124630 (SearchValues for Setloop/Setrep), there's no construction-time overhead here -- these are just match-time
Why keep MatchString:

/ba-g infra

The `RegexInterpreter` already had a precedent for vectorizing per-character loops: the `Notoneloop`/`Notoneloopatomic` opcode used `IndexOf` for left-to-right matching. This PR extends that pattern to four more opcodes:
- `Oneloop`/`Oneloopatomic` (`a+`, `a*`): Use `IndexOfAnyExcept(ch)` instead of a per-char loop
- `Onerep` (`a{N}`): Use `ContainsAnyExcept(ch)` instead of a per-char equality loop
- `Notonerep` (`[^x]{N}`): Use `Contains(ch)` instead of a per-char inequality loop
- `MatchString`: Use `SequenceEqual` instead of a per-char comparison loop

All optimizations apply only to left-to-right matching paths. Right-to-left paths (rare) are left unchanged as they can't benefit from forward-scanning vectorization.

These methods (`IndexOfAnyExcept`, `ContainsAnyExcept`, `Contains`, `SequenceEqual`) are SIMD-accelerated in .NET and process 16–32 chars at a time vs 1-at-a-time in the original loops.

Benchmark Results

Tested on Intel Core i9-14900K, .NET 11.0.0-dev, using BenchmarkDotNet with `--corerun` comparing before and after builds:
- `a+` (64 chars)
- `a+` (256 chars)
- `a+` (1024 chars)
- `a*` (256 chars)
- `a{64}`
- `a{256}`
- `[^x]{64}`
- `[^x]{256}`

Zero regressions. Zero allocation changes. Improvements scale with input length as expected from SIMD vectorization.