I am working on replacing the handcrafted parser in ClosedXML (a library for manipulating xlsx files) with XLParser. Because an xlsx file can contain hundreds of thousands of formulas, I would like to improve the performance of XLParser.
I would like to add prefixes to the RegexBasedTerminals.
Irony uses the first character of these prefixes to build a table of char -> possible terminals (terminals without a prefix are always considered). This table is then used to speed up the lookup of the current terminal.
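To illustrate why prefixes help, here is a minimal, language-neutral sketch of a first-character lookup table (written in Python for brevity). It is not Irony's actual code: the `Terminal` class, the helper functions, and the three example terminals are hypothetical and only mirror the behaviour described above, where terminals without a prefix are always tried.

```python
import re
from collections import defaultdict


class Terminal:
    """Hypothetical stand-in for a regex-based terminal with optional prefixes."""

    def __init__(self, name, pattern, prefixes=()):
        self.name = name
        self.regex = re.compile(pattern)
        self.prefixes = tuple(prefixes)


def build_lookup(terminals):
    """Map the first character of every prefix to the terminals that declare it."""
    by_char = defaultdict(list)
    # Terminals without prefixes cannot be filtered, so they are always candidates.
    fallback = [t for t in terminals if not t.prefixes]
    for t in terminals:
        for prefix in t.prefixes:
            by_char[prefix[0]].append(t)
    return by_char, fallback


def candidates(by_char, fallback, ch):
    """Terminals worth trying when the input at the current position starts with ch."""
    return by_char.get(ch, []) + fallback


def match_at(text, pos, by_char, fallback):
    """Run only the candidate regexes at pos and keep the longest match."""
    best = None
    for t in candidates(by_char, fallback, text[pos]):
        m = t.regex.match(text, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (t, m)
    return (best[0].name, best[1].group()) if best else None


# Example terminals (assumptions for illustration, not XLParser's real grammar).
TERMINALS = [
    Terminal("ErrorToken", r"#(REF!|N/A|VALUE!)", prefixes=["#"]),
    Terminal("Number", r"[0-9]+", prefixes=list("0123456789")),
    Terminal("Name", r"[A-Za-z_][A-Za-z0-9_]*"),  # no prefix: always considered
]

by_char, fallback = build_lookup(TERMINALS)
```

In this sketch, an input position starting with `A` only runs the one terminal that has no prefix, while a position starting with `#` runs that terminal plus `ErrorToken`. The fewer terminals are left without prefixes, the fewer regexes have to run per token.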
I would also like to make the grammar case sensitive (a small but measurable improvement); the terminals already list both cases where necessary (e.g. a-zA-Z).
I have also tried changing the regex options of the terminals (through reflection): RegexOptions.ExplicitCapture (as recommended in best practices), RegexOptions.Compiled, and RegexOptions.CultureInvariant, but there was no significant improvement.
I ran a benchmark on EnronFormulasParseTest (the test was modified to be single-threaded). The parser version with prefixes takes about 44% less time (mean of 26.5 s vs. 47.3 s).
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
[Host] : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
IterationCount=3 LaunchCount=1 WarmupCount=1
With prefixes
| Method | Mean | Error | StdDev |
|---|---|---|---|
| EnronDataSet | 26.496 s | 3.6721 s | 0.2013 s |
| EusesFormulasParseTest | 2.852 s | 0.0582 s | 0.0032 s |
Without prefixes
| Method | Mean | Error | StdDev |
|---|---|---|---|
| EnronDataSet | 47.295 s | 2.5500 s | 0.1398 s |
| EusesFormulasParseTest | 4.738 s | 0.3636 s | 0.0199 s |
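For reference, the relative improvement can be recomputed from the mean times reported above. The snippet below is only arithmetic on those published means (Python used as a calculator); the variable and function names are made up for this sketch.

```python
# Mean times in seconds, copied from the benchmark results: (without, with prefixes).
MEANS = {
    "EnronDataSet": (47.295, 26.496),
    "EusesFormulasParseTest": (4.738, 2.852),
}


def improvement(without_prefixes, with_prefixes):
    """Return (time reduction in percent, speedup factor)."""
    reduction = (without_prefixes - with_prefixes) / without_prefixes * 100
    speedup = without_prefixes / with_prefixes
    return reduction, speedup


for name, (without, with_) in MEANS.items():
    reduction, speedup = improvement(without, with_)
    print(f"{name}: {reduction:.1f}% less time, {speedup:.2f}x speedup")
```

With only 3 iterations per job, the Error column is fairly wide, so these percentages should be read as approximate.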