Improve lint performance by optimizing Swift Regex usage#968
Improve lint performance by optimizing Swift Regex usage#968ahoppen merged 1 commit intoswiftlang:mainfrom
Conversation
|
I was able to confirm the difference in creation cost through testing. let pattern = #"^\s*\/\/\s*swift-format-ignore(?:\s*:\s*(?<ruleNames>.+))?$"#
func testSwiftRegex() { // 0.677 sec
measure {
for _ in 0..<10000 {
_ = try! Regex(pattern)
}
}
}
func testNSRegularExpression() { // 0.036 sec
measure {
for _ in 0..<10000 {
_ = try! NSRegularExpression(pattern: pattern, options: [])
}
}
} |
allevato
left a comment
There was a problem hiding this comment.
Nice! Wait for @ahoppen to sign off too, but those + static numbers look close enough to me that we could keep this and not have to go back to NSRegularExpression.
One other thing you might want to try to squeeze out more performance: by default, Regex will match based on Characters, which involves some semi-expensive Unicode grapheme clustering. I don't expect that we'd ever want rule names that depend on Unicode normalization, so I'd be curious to know if having IgnoreDirective do this would make it even a little bit faster when matching:
try! Regex(pattern).matchingSemantics(.unicodeScalar)
I don't think NSRegularExpression does grapheme-based matching (I could definitely be wrong though), so that might explain part of the difference.
Oh, I learned something new thanks to you. I appreciate your feedback 😃 I'll go ahead and incorporate this change into the code. |
242b29f to
242dfd1
Compare
From my investigation,
Swift'sRegexappears to be more resource-intensive to create compared toNSRegularExpression🤔When running the linter, a new
RuleMaskis initialized for each file. Even if theRegexis cached withinRuleMask, it gets recreated when moving to the next file becauseRuleMaskitself is newly initialized.By declaring the
Regexas static, it is created only once during the entire linting process, improving performance to a level similar to before.Interestingly, using
NSRegularExpressionwith static did not show a significant difference compared to creating a new instance each time.