Add prefixes to regex terminals

I am working on a [replacement](https://github.com/ClosedXML/ClosedXML/pull/1836) of the handcrafted parser in ClosedXML (a library for manipulating xlsx files) with XLParser. Because an xlsx file can contain hundreds of thousands of formulas, I would like to improve the performance of XLParser.

I would like to add prefixes to the `RegexBasedTerminal`s.

Irony uses the first character of each terminal's prefixes to [build a table](https://github.com/IronyProject/Irony/blob/a77ef332c3d6868ce5326b126efba9d5eeba53c2/Irony/Parsing/Data/Construction/ScannerDataBuilder.cs#L65) that maps a character to the terminals that can start with it (terminals without a prefix are always considered). This table is then used to speed up the [selection of the current terminal](https://github.com/IronyProject/Irony/blob/a77ef332c3d6868ce5326b126efba9d5eeba53c2/Irony/Parsing/Scanner/Scanner.cs#L200).
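To illustrate the idea, here is a minimal Python sketch of a first-character lookup table. It is not Irony's code: the `Terminal` class, the helper functions, and the three example terminals are hypothetical and only mirror the behaviour described above, where terminals without a prefix are candidates at every position.

```python
import re
from collections import defaultdict


class Terminal:
    """Hypothetical stand-in for a regex-based terminal (not Irony's class)."""

    def __init__(self, name, pattern, prefixes=()):
        self.name = name
        self.regex = re.compile(pattern)
        self.prefixes = tuple(prefixes)


def build_lookup(terminals):
    """Return (first char -> terminals with a matching prefix, terminals without any prefix)."""
    by_char = defaultdict(list)
    no_prefix = [t for t in terminals if not t.prefixes]
    for t in terminals:
        for prefix in t.prefixes:
            if t not in by_char[prefix[0]]:
                by_char[prefix[0]].append(t)
    return dict(by_char), no_prefix


def candidate_names(by_char, no_prefix, text, pos):
    """Names of the terminals whose regex has to be tried at text[pos]."""
    return [t.name for t in by_char.get(text[pos], []) + no_prefix]


# Made-up terminals: two declare a prefix, one does not.
terminals = [
    Terminal("Error", r"#(REF!|N/A|NULL!)", prefixes=["#"]),
    Terminal("Text", r'"[^"]*"', prefixes=['"']),
    Terminal("Number", r"[0-9]+"),  # no prefix, so it is always a candidate
]
by_char, no_prefix = build_lookup(terminals)

print(candidate_names(by_char, no_prefix, "#REF!+1", 0))  # ['Error', 'Number']
print(candidate_names(by_char, no_prefix, "#REF!+1", 6))  # ['Number']
```

In this sketch, every terminal that gains a prefix drops out of the always-tried list, so fewer regexes are evaluated per input position.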

I would also like to make the grammar case-sensitive (a small but measurable improvement). The terminals already match both cases where necessary (e.g. `a-zA-Z`).

I also tried changing the regex options of the terminals (through reflection): `RegexOptions.ExplicitCapture` (as recommended in [best practices](https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices#capture-only-when-necessary)), `RegexOptions.Compiled`, and `RegexOptions.CultureInvariant`, but there was no significant improvement.

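I ran a benchmark based on `EnronFormulasParseTest` (the test was modified to be single-threaded). The parser version with prefixes takes about 44% less time on the Enron data set (26.5 s vs. 47.3 s) and about 40% less time in `EusesFormulasParseTest` (2.85 s vs. 4.74 s).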

```ini
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1
```

With prefixes

|                 Method |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
|           EnronDataSet | 26.496 s | 3.6721 s | 0.2013 s |
| EusesFormulasParseTest |  2.852 s | 0.0582 s | 0.0032 s |

Without prefixes

|                 Method |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
|           EnronDataSet | 47.295 s | 2.5500 s | 0.1398 s |
| EusesFormulasParseTest |  4.738 s | 0.3636 s | 0.0199 s |
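
The relative improvement can be recomputed from the mean times in the two tables. The short Python sketch below does that; the helper names are only illustrative. Note that the runs used `IterationCount=3`, so the error margins are fairly wide and the percentages are approximate.

```python
def time_reduction(before_s, after_s):
    """Fraction of the run time saved, relative to the run without prefixes."""
    return (before_s - after_s) / before_s


def speedup(before_s, after_s):
    """How many times faster the run with prefixes is."""
    return before_s / after_s


# Mean times in seconds, copied from the tables (without prefixes, with prefixes).
enron = (47.295, 26.496)
euses = (4.738, 2.852)

print(f"Enron: {time_reduction(*enron):.0%} less time, {speedup(*enron):.2f}x")  # 44% less time, 1.78x
print(f"EUSES: {time_reduction(*euses):.0%} less time, {speedup(*euses):.2f}x")  # 40% less time, 1.66x
```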

