[Performance] Improve performance for date field parsing #8361

@mgodwan

Description

Is your feature request related to a problem? Please describe.
OpenSearch relies on java.time.format.DateTimeFormatter to parse date-time values. This gives it the flexibility to support many use cases with different date-time formats.
Most users, however, stick to a handful of common date-time formats, and the general-purpose JDK formatters can be slow for them. By using our knowledge of the underlying format, we can provide better latency and throughput for document and query parsing with code specialized for that format. Many logging libraries (e.g. log4j) ship common date-time formats, which would be a good initial set to support.

Describe the solution you'd like
Faster parsing alternatives for known formats, which can reduce overall indexing time.
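
To make the idea concrete, here is a minimal sketch of a parser specialized for a single fixed layout, yyyy-MM-dd HH:mm:ss. It is an illustration only, not the code from the POC commit: the class name `FixedFormatDateParser` and its methods are hypothetical, and it assumes ASCII digits, exactly 19 characters, and no time zone or fractional seconds. It reads the digits at known offsets instead of interpreting a pattern, and leaves field range checks to `LocalDateTime.of`.

```java
import java.time.LocalDateTime;

/**
 * Hypothetical fixed-layout parser for "yyyy-MM-dd HH:mm:ss".
 * Illustration only; assumes ASCII digits and no zone or fraction.
 */
final class FixedFormatDateParser {
    private static final int LENGTH = 19; // length of "yyyy-MM-dd HH:mm:ss"

    private FixedFormatDateParser() {}

    static LocalDateTime parse(CharSequence text) {
        if (text == null || text.length() != LENGTH) {
            throw new IllegalArgumentException("expected yyyy-MM-dd HH:mm:ss but got: " + text);
        }
        // Separators sit at fixed offsets, so they can be checked directly.
        expect(text, 4, '-');
        expect(text, 7, '-');
        expect(text, 10, ' ');
        expect(text, 13, ':');
        expect(text, 16, ':');
        // LocalDateTime.of range-checks every field and throws
        // DateTimeException for values such as month 13 or February 30.
        return LocalDateTime.of(
            digits(text, 0, 4), digits(text, 5, 7), digits(text, 8, 10),
            digits(text, 11, 13), digits(text, 14, 16), digits(text, 17, 19));
    }

    private static void expect(CharSequence text, int index, char separator) {
        if (text.charAt(index) != separator) {
            throw new IllegalArgumentException(
                "expected '" + separator + "' at index " + index + " in: " + text);
        }
    }

    // Converts the ASCII digits in [from, to) to an int without allocating.
    private static int digits(CharSequence text, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            int d = text.charAt(i) - '0';
            if (d < 0 || d > 9) {
                throw new IllegalArgumentException(
                    "expected a digit at index " + i + " in: " + text);
            }
            value = value * 10 + d;
        }
        return value;
    }
}
```

Calling `FixedFormatDateParser.parse("2023-07-01 12:34:56")` returns the same `LocalDateTime` as `LocalDateTime.of(2023, 7, 1, 12, 34, 56)`. A real integration would presumably fall back to the existing DateTimeFormatter path for any format without a specialized parser.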

Describe alternatives you've considered

A barebones POC for the format yyyy-MM-dd HH:mm:ss: mgodwan@4345a75

JMH micro-benchmarks for this format, comparing the JDK pattern formatter (baseline) with the custom formatter (candidate), show the following results:

Benchmark                           Mode  Cnt    Score       Error  Units
DocumentParsingBenchmark.baseline   avgt    9  381.958  +/- 11.090  ns/op
DocumentParsingBenchmark.candidate  avgt    9   21.262  +/-  2.543  ns/op

Metadata

Labels

Indexing: Indexing, Bulk Indexing and anything related to indexing
enhancement: Enhancement or improvement to existing feature or request
v2.12.0: Issues and PRs related to version 2.12.0
v3.0.0: Issues and PRs related to version 3.0.0
