Skip to content

Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern#91521

Merged
joegallo merged 6 commits intoelastic:mainfrom
joegallo:ingest-date-processor-speedup
Nov 14, 2022
Merged

Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern#91521
joegallo merged 6 commits intoelastic:mainfrom
joegallo:ingest-date-processor-speedup

Conversation

@joegallo
Copy link
Copy Markdown
Contributor

@joegallo joegallo commented Nov 10, 2022

  • Changes the return type of splitCombinedPatterns to avoid an array->collection conversion
  • Direct for-each iteration over a collection in forPattern rather than the streams approach

It's a relatively small change to the code, but since we run this each time a DateProcessor executes on an IngestDocument, it's a pretty hot code path. In my little microbenchmark it cuts ~20% of the time spent in a DateProcessor, or about 1% of the ingest time overall (but again for emphasis, that's just my little microbenchmark).

Here's the before/after flamegraph:

Screen Shot 2022-11-10 at 2 48 13 PM

Screen Shot 2022-11-10 at 2 49 20 PM

to match how it's done on the previous ternary for timezone
to keep with the return type from delimitedListToStringArray, rather
than wrapping it
This is still pretty simple as a for each, and the for each is faster.
@joegallo joegallo added >non-issue :Distributed/Ingest Node Execution or management of Ingest Pipelines v8.6.0 labels Nov 10, 2022
@joegallo joegallo requested a review from grcevski November 10, 2022 20:00
@elasticsearchmachine elasticsearchmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Nov 10, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I would just add a comment on line 116 in forPattern to say it's performance sensitive code ;), just so that we don't undo it by a refactor in the future.

@joegallo joegallo merged commit 0ee5c76 into elastic:main Nov 14, 2022
@joegallo joegallo deleted the ingest-date-processor-speedup branch November 14, 2022 17:58
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 15, 2022
* main: (163 commits)
  [DOCS] Edits frequent items aggregation (elastic#91564)
  Handle providers of optional services in ubermodule classloader (elastic#91217)
  Add `exportDockerImages` lifecycle task for exporting docker tarballs (elastic#91571)
  Fix CSV dependency report output file location in DRA CI job
  Fix variable placeholder for Strings.format calls (elastic#91531)
  Fix output dir creation in ConcatFileTask (elastic#91568)
  Fix declaration of dependencies in DRA snapshots CI job (elastic#91569)
  Upgrade Gradle Enterprise plugin to 3.11.4 (elastic#91435)
  Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern (elastic#91521)
  Fix inter project handling of generateDependenciesReport (elastic#91555)
  [Synthetics] Add synthetics-* read to fleet-server (elastic#91391)
  [ML] Copy more settings when creating DF analytics destination index (elastic#91546)
  Reduce CartesianCentroidIT flakiness (elastic#91553)
  Propagate last node to reinitialized routing tables (elastic#91549)
  Forecast write load during rollovers (elastic#91425)
  [DOCS] Warn about potential overhead of named queries (elastic#91512)
  Datastream unavailable exception metadata (elastic#91461)
  Generate docker images and dependency report in DRA ci job (elastic#91545)
  Support cartesian_bounds aggregation on point and shape (elastic#91298)
  Add support for EQL samples queries (elastic#91312)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines >non-issue Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants