NonBacktracking inner matching loop optimizations#70217
NonBacktracking inner matching loop optimizations#70217stephentoub merged 6 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsThis PR addresses the regressions related to the inner matching loop performance in #70022 introduced by #69839 and includes further optimizations. The changes in the PR are roughly:
One optimization that I evaluated, but that didn't decisively help was pulling out the decision to handle the "last end of line" special minterm outside the inner matching loop. I'm guessing the I compared performance after implementing each optimization to check that each was helping. Here's a total before and after comparison for NonBacktracking on all the relevant benchmarks in dotnet/performance.
|
stephentoub
left a comment
There was a problem hiding this comment.
Those improvements look great. Do we know why the cited regressions regress?
There was a problem hiding this comment.
This looks like the only call site for the wrapper TryCreateNewTransition that was added. Any reason this can't instead just be:
if (nextStateId > 0 || builder.TryCreateNewTransition(state.DfaState!, mintermId, dfaOffset, checkThreshold: true, out DfaMatchingState<TSet> newState))
{
state.DfaStateId = newState.Id;
return true;
}rather than having that extra wrapper?
There was a problem hiding this comment.
You're right, this is cleaner.
There was a problem hiding this comment.
Ah, now I remember why I added it: having a wrapper for the state ID allowed a single if body here to handle both an existing transition and a new one. I'll still remove the wrapper, since expanding the code here makes the different cases a bit more apparent anyway.
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Outdated
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Outdated
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
What does "Specialized" mean? Specialized for or to do what?
There was a problem hiding this comment.
It's essentially "curry with preferred generic type parameter and invoke", but I saw your other suggestion on getting rid of these functions and that does look cleaner.
There was a problem hiding this comment.
There's only a single call site for this, these are both aggressively inlined, and the naming with both the "Specialized" prefix and "2" suffix is... suspect :) Can the single call site just be changed to:
int matchEnd = (_findOpts is null, _pattern._info.ContainsSomeAnchor) switch
{
(false, false) => FindEndPosition<NoOptimizationsInitialStateHandler, NoAnchorsNullabilityHandler>(input, pos, timeoutOccursAt, mode, out initialStatePos, out matchLength, perThreadData),
(true, false) => FindEndPosition<InitialStateFindOptimizationsHandler, NoAnchorsNullabilityHandler>(input, pos, timeoutOccursAt, mode, out initialStatePos, out matchLength, perThreadData),
(false, true) => FindEndPosition<NoOptimizationsInitialStateHandler, FullNullabilityHandler>(input, pos, timeoutOccursAt, mode, out initialStatePos, out matchLength, perThreadData),
(true, true) => FindEndPosition<InitialStateFindOptimizationsHandler, FullNullabilityHandler>(input, pos, timeoutOccursAt, mode, out initialStatePos, out matchLength, perThreadData),
};There was a problem hiding this comment.
Ditto here... can this just be manually inlined into the single call site?
matchStart =
matchEnd < startat ? startat :
_pattern._info.ContainsSomeAnchor ? FindStartPosition<FullNullabilityHandler>(input, i, matchStartBoundary, perThreadData) :
FindStartPosition<NoAnchorsNullabilityHandler>(input, i, matchStartBoundary, perThreadData);There was a problem hiding this comment.
Why add the else block? Seems cleaner to just remove the else and unindent all of its contents.
There was a problem hiding this comment.
| var (isInitial, isDeadend, isNullable, canBeNullable) = TStateHandler.GetStateInfo(builder, ref state); | |
| (bool isInitial, bool isDeadend, bool isNullable, bool canBeNullable) = TStateHandler.GetStateInfo(builder, ref state); |
There was a problem hiding this comment.
| var (isInitial, isDeadend, isNullable, canBeNullable) = TStateHandler.GetStateInfo(builder, ref state); | |
| (bool isInitial, bool isDeadend, bool isNullable, bool canBeNullable) = TStateHandler.GetStateInfo(builder, ref state); |
I don't think the regressions I cited actually do regress, but rather my laptop isn't stable enough as a benchmarking machine. I'll benchmark again before merging to make sure. |
a23b597 to
3920d3b
Compare
stephentoub
left a comment
There was a problem hiding this comment.
Thanks, @olsaarik. Looks good. Left a few style comments, but I'm going to merge this to unblock subsequent changes and to get the regressions addressed.
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexBuilder.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexBuilder.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Show resolved
Hide resolved
| { | ||
| return coreState.FixedLength(nextCharKind); | ||
| } | ||
| } |
|
Improvements on Arm64 Windows - dotnet/perf-autofiling-issues#6176 |
|
Improvements on x64: dotnet/perf-autofiling-issues#6037 and dotnet/perf-autofiling-issues#6130 |
This PR addresses the regressions related to the inner matching loop performance in #70022 introduced by #69839 and includes further optimizations. The changes in the PR are roughly:
intassignment. I've also implemented support for fixed length markers in NFA mode.RegexFindOptimizationsout of the inner matching loop by using aIInitialStateHandlerinterface and the "templating" technique with structs implementing the interface as generic type arguments that we already use for DFA and NFA mode.INullabilityHandlerinterface.(bool IsInitial, bool IsDeadend, bool IsNullable, bool CanBeNullable)in theSymbolicRegexBuilderthat can be accessed by state IDs. This reduces pointer chasing involved in looking these up from theDfaMatchingStateand its relatedSymbolicRegexNode. The existingIsInitialfield fromDfaMatchingStateis removed for some memory saving.CurrentStatehold the ID (anint) of the current DFA state instead of theDfaMatchingStateinstance. This allows cached transitions to be taken in DFA mode without ever dereferencing the currentDfaMatchingState. Coupled with the above optimization theDfaMatchingStateis rarely needed in typical patterns.One optimization that I evaluated, but that didn't decisively help was pulling out the decision to handle the "last end of line" special minterm outside the inner matching loop. I'm guessing the
char == '\n' && pos == input.Length - 1check is already cheap enough that it isn't really visible.I compared performance after implementing each optimization to check that each was helping. Here's a total before and after comparison for NonBacktracking on all the relevant benchmarks in dotnet/performance. The baseline here is the situation after the regressions in #70022.
summary:
better: 58, geomean: 1.360
worse: 5, geomean: 1.057
total diff: 63