Description
Is your feature request related to a problem? Please describe.
OpenSearch relies on `java.time.format.DateTimeFormatter` to parse date-time strings. This gives the flexibility to support a wide range of use cases involving many different date-time formats.
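For reference, the general-purpose JDK path for a fixed pattern looks roughly like the following. It is a minimal sketch using only `java.time`. The class name `JdkBaselineParser` is hypothetical, and OpenSearch's own wrappers around the formatter are not shown.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical baseline: parse with the JDK's pattern-driven formatter.
final class JdkBaselineParser {
    // DateTimeFormatter is immutable and thread-safe, so one shared instance suffices.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private JdkBaselineParser() {}

    static LocalDateTime parse(String text) {
        // The formatter interprets the pattern letters, then resolves the parsed fields.
        return LocalDateTime.parse(text, FORMATTER);
    }
}
```

For example, `JdkBaselineParser.parse("2024-01-15 10:30:45")` yields the `LocalDateTime` 2024-01-15T10:30:45.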
In most use cases, however, users rely on a handful of common date-time formats, and for these the general-purpose JDK formatters can be slow. By using knowledge of the underlying format, we can parse it with code specialized for that format and improve latency and throughput for document and query parsing. Many logging libraries (e.g. log4j) define a set of common date-time formats, which would be a sensible set to support first.
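To illustrate what code specialized for a single format could look like, here is a minimal sketch of a hand-written parser for `yyyy-MM-dd HH:mm:ss`. It is not the POC code from the linked commit; the class `FastDateTimeParser` is hypothetical. It assumes fixed-width input with ASCII digits, no time zone and no fractional seconds, and reads each field at a fixed offset instead of interpreting a general pattern.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical parser specialized for the fixed layout "yyyy-MM-dd HH:mm:ss".
final class FastDateTimeParser {
    private static final int LENGTH = 19; // length of "yyyy-MM-dd HH:mm:ss"

    private FastDateTimeParser() {}

    static LocalDateTime parse(String s) {
        // Check the length and separators up front; every field sits at a fixed offset.
        if (s.length() != LENGTH
                || s.charAt(4) != '-' || s.charAt(7) != '-'
                || s.charAt(10) != ' '
                || s.charAt(13) != ':' || s.charAt(16) != ':') {
            throw new DateTimeParseException("Unexpected layout", s, 0);
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 2);
        int day = digits(s, 8, 2);
        int hour = digits(s, 11, 2);
        int minute = digits(s, 14, 2);
        int second = digits(s, 17, 2);
        // LocalDateTime.of still validates ranges and throws DateTimeException on e.g. month 13.
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    // Accumulates `count` ASCII digits starting at `offset` without allocating substrings.
    private static int digits(String s, int offset, int count) {
        int value = 0;
        for (int i = offset; i < offset + count; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new DateTimeParseException("Expected digit", s, i);
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```

One caveat for a drop-in replacement: `DateTimeFormatter.ofPattern` uses the SMART resolver by default, which clamps some invalid days (for example, `2024-02-30` resolves to `2024-02-29`), whereas this sketch throws for such input.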
Describe the solution you'd like
Faster parsing alternatives for known formats, which can reduce overall indexing time.
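One way to offer faster alternatives without changing behavior for other formats is to keep `DateTimeFormatter` as the default and consult a registry of specialized parsers keyed by pattern. The sketch below assumes such a registry; `FormatAwareParser` and its single `yyyy-MM-dd` fast path are hypothetical and chosen only to keep the example short. Any failure on the fast path falls back to the JDK formatter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: fast path for known patterns, JDK formatter otherwise.
final class FormatAwareParser {
    // Hypothetical registry of patterns that have a hand-written parser.
    private static final Map<String, Function<String, TemporalAccessor>> FAST_PATHS =
            Map.of("yyyy-MM-dd", FormatAwareParser::parseIsoDate);

    private final DateTimeFormatter fallback;
    private final Function<String, TemporalAccessor> fast; // null when no fast path exists

    FormatAwareParser(String pattern) {
        this.fallback = DateTimeFormatter.ofPattern(pattern);
        this.fast = FAST_PATHS.get(pattern);
    }

    boolean hasFastPath() {
        return fast != null;
    }

    TemporalAccessor parse(String text) {
        if (fast != null) {
            try {
                return fast.apply(text);
            } catch (RuntimeException e) {
                // Input broke the fast path's assumptions; let the JDK formatter decide.
            }
        }
        return fallback.parse(text);
    }

    // Fixed-layout parser for "yyyy-MM-dd".
    private static TemporalAccessor parseIsoDate(String s) {
        if (s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
            throw new IllegalArgumentException("not yyyy-MM-dd: " + s);
        }
        return LocalDate.of(num(s, 0, 4), num(s, 5, 2), num(s, 8, 2));
    }

    // Accumulates `len` ASCII digits starting at `from`.
    private static int num(String s, int from, int len) {
        int v = 0;
        for (int i = from; i < from + len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("digit expected at index " + i);
            }
            v = v * 10 + (c - '0');
        }
        return v;
    }
}
```

Falling back on any fast-path failure keeps results and error messages consistent with the JDK for inputs the fast path rejects, such as out-of-range days, at the cost of parsing malformed input twice.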
Describe alternatives you've considered
A barebones POC for the format `yyyy-MM-dd HH:mm:ss`: mgodwan@4345a75
JMH micro-benchmarks with this format, comparing the JDK pattern formatter (`baseline`) against the custom formatter (`candidate`), show the following results:
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| DocumentParsingBenchmark.baseline | avgt | 9 | 381.958 | +/-11.090 | ns/op |
| DocumentParsingBenchmark.candidate | avgt | 9 | 21.262 | +/-2.543 | ns/op |
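Read directly off the table (this is arithmetic on the reported numbers, not a new measurement), the custom formatter is about 18 times faster per operation. Treating the Error column as symmetric bounds gives a crude range:

```math
\begin{aligned}
\text{speedup} &= \frac{381.958}{21.262} \approx 17.96 \\
\text{low end} &= \frac{381.958 - 11.090}{21.262 + 2.543} = \frac{370.868}{23.805} \approx 15.58 \\
\text{high end} &= \frac{381.958 + 11.090}{21.262 - 2.543} = \frac{393.048}{18.719} \approx 21.00
\end{aligned}
```

This is a per-parse micro-benchmark figure; the effect on end-to-end indexing throughput would depend on how much of document parsing time is spent on dates.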