Skip to content

Enable SourceLookup to leverage sequential stored fields reader#63035

Merged
cbuescher merged 6 commits intoelastic:masterfrom
cbuescher:issue-62621
Oct 6, 2020
Merged

Enable SourceLookup to leverage sequential stored fields reader#63035
cbuescher merged 6 commits intoelastic:masterfrom
cbuescher:issue-62621

Conversation

@cbuescher
Copy link
Copy Markdown
Member

This is a WIP draft of using similar optimizations as in #62509 for SourceLookup.
This PR is using the potentially better field reader from the fetch phase also in SourceLookup.
First rally exeriments show that this speeds up e.g. when runtime fields that are using "_source" are
added e.g. via "docvalue_fields" in the search hits. Using runtime fields that are using "_source" in
queries or aggs isn't affected so far.

@cbuescher cbuescher added the WIP label Sep 29, 2020
@cbuescher cbuescher changed the title Enable SourceLookup to leverage sequential stored fields reader WIP: Enable SourceLookup to leverage sequential stored fields reader Sep 29, 2020
@cbuescher
Copy link
Copy Markdown
Member Author

Using elastic/rally-tracks#133 the current state shows improvements when using runtime fields in "docvalue_fields" but not inside queries.

Metric Task Baseline Contender Diff Unit
Min Throughput runtime-field-source 1.00251 1.0048 0.00229 ops/s
Median Throughput runtime-field-source 1.00351 1.00634 0.00283 ops/s
Max Throughput runtime-field-source 1.00538 1.00953 0.00415 ops/s
50th percentile latency runtime-field-source 472.739 49.709 -423.03 ms
90th percentile latency runtime-field-source 502.269 53.5671 -448.702 ms
99th percentile latency runtime-field-source 547.761 58.8511 -488.91 ms
100th percentile latency runtime-field-source 600.303 63.7462 -536.556 ms
50th percentile service time runtime-field-source 468.671 47.5842 -421.087 ms
90th percentile service time runtime-field-source 497.901 51.1764 -446.725 ms
99th percentile service time runtime-field-source 544.124 57.8726 -486.251 ms
100th percentile service time runtime-field-source 595.761 61.0958 -534.665 ms
error rate runtime-field-source 0 0 0 %
Min Throughput search-term-runtime-field 1.00431 1.00408 -0.00023 ops/s
Median Throughput search-term-runtime-field 1.00574 1.00568 -6e-05 ops/s
Max Throughput search-term-runtime-field 1.00857 1.00856 -0 ops/s
50th percentile latency search-term-runtime-field 139.135 143.599 4.46463 ms
90th percentile latency search-term-runtime-field 145.089 151.245 6.15557 ms
99th percentile latency search-term-runtime-field 148.776 173.384 24.6078 ms
100th percentile latency search-term-runtime-field 149.473 211.474 62.0002 ms
50th percentile service time search-term-runtime-field 133.764 140.644 6.87971 ms
90th percentile service time search-term-runtime-field 138.313 147.066 8.75276 ms
99th percentile service time search-term-runtime-field 141.96 169.885 27.9255 ms
100th percentile service time search-term-runtime-field 142.458 206.577 64.1192 ms
error rate search-term-runtime-field 0 0 0 %

@cbuescher cbuescher force-pushed the issue-62621 branch 2 times, most recently from c49cd2c to 9d31ccf Compare September 30, 2020 09:27
@cbuescher
Copy link
Copy Markdown
Member Author

cbuescher commented Sep 30, 2020

Plugging the optimization in SourceLookup#setSegmentAndDocument like in 6924c53 now also shows significant speed up when using runtime fields in a query:

Baseline: master
Contender: 6924c53

Metric Task Baseline Contender Diff Unit
Min Throughput runtime-field-source 1.00226 1.00949 0.00723 ops/s
Median Throughput runtime-field-source 1.00571 1.01268 0.00696 ops/s
Max Throughput runtime-field-source 1.00995 1.019 0.00905 ops/s
50th percentile latency runtime-field-source 521.659 61.0327 -460.626 ms
90th percentile latency runtime-field-source 664.78 75.757 -589.023 ms
100th percentile latency runtime-field-source 837.594 107.107 -730.487 ms
50th percentile service time runtime-field-source 516.112 56.3236 -459.789 ms
90th percentile service time runtime-field-source 663.034 71.0063 -592.027 ms
100th percentile service time runtime-field-source 837.071 103.393 -733.678 ms
error rate runtime-field-source 0 0 0 %
Min Throughput search-term-runtime-field 1.00856 1.00955 0.00099 ops/s
Median Throughput search-term-runtime-field 1.01139 1.01265 0.00126 ops/s
Max Throughput search-term-runtime-field 1.01696 1.01882 0.00186 ops/s
50th percentile latency search-term-runtime-field 146.986 56.7459 -90.24 ms
90th percentile latency search-term-runtime-field 150.739 60.6424 -90.0961 ms
100th percentile latency search-term-runtime-field 161.373 77.2458 -84.1271 ms
50th percentile service time search-term-runtime-field 143.307 53.8385 -89.4683 ms
90th percentile service time search-term-runtime-field 147.191 57.467 -89.7237 ms
100th percentile service time search-term-runtime-field 156.786 72.6448 -84.1409 ms

@cbuescher
Copy link
Copy Markdown
Member Author

I also tested terms aggregation on a runtime field over a time slice of the http_log docs (elastic/rally-tracks@ee9e11b), this PR also shows noticable speedups:

Metric Task Baseline Contender Diff Unit
50th percentile latency agg-term-runtime-field 94.8504 17.3417 -77.5088 ms
90th percentile latency agg-term-runtime-field 100.651 19.2203 -81.4303 ms
100th percentile latency agg-term-runtime-field 181.195 34.0516 -147.143 ms
50th percentile service time agg-term-runtime-field 91.4234 13.6569 -77.7665 ms
90th percentile service time agg-term-runtime-field 97.3426 14.2393 -83.1033 ms
100th percentile service time agg-term-runtime-field 178.288 29.836 -148.452 ms

@cbuescher
Copy link
Copy Markdown
Member Author

@elasticmachine run elasticsearch-ci/packaging-sample-windows

@cbuescher cbuescher changed the title WIP: Enable SourceLookup to leverage sequential stored fields reader Enable SourceLookup to leverage sequential stored fields reader Oct 1, 2020
@cbuescher cbuescher marked this pull request as ready for review October 1, 2020 11:00
@cbuescher cbuescher added :Search/Search Search-related issues that do not fall into other categories >enhancement labels Oct 1, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 1, 2020
@cbuescher cbuescher removed the WIP label Oct 1, 2020
Copy link
Copy Markdown
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see that the pr improves queries and aggregations significantly. I left one comment regarding some changes that could be removed.

LeafReaderContext context,
int docId,
SourceLookup sourceLookup,
CheckedBiConsumer<Integer, FieldsVisitor, IOException> fieldReader,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this change now that SourceLookup leverages the sequential reader automatically ?

Christoph Büscher added 2 commits October 1, 2020 18:56
@cbuescher
Copy link
Copy Markdown
Member Author

@jimczi thanks for the review, I removed the left-overs from previous iterations

Copy link
Copy Markdown
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
It would be nice to know the effect of the change when using the runtime field in conjunction with a sparse indexed field but we can tackle this in the rally challenge that you're iterating on. So +1 to merge as is.

@cbuescher cbuescher merged commit 49f83cc into elastic:master Oct 6, 2020
@cbuescher
Copy link
Copy Markdown
Member Author

@jimczi thanks, my understanding is this should go to 7.10 as well?

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Oct 6, 2020
…tic#63035)

In elastic#62509 we already plugged faster sequential access for stored fields in the fetch phase.
This PR now adds using the potentially better field reader also in SourceLookup.
Rally exeriments are showing that this speeds up e.g. when runtime fields that are using
"_source" are added e.g. via "docvalue_fields" or are used in queries or aggs.

Closes elastic#62621
cbuescher pushed a commit that referenced this pull request Oct 6, 2020
…) (#63316)

In #62509 we already plugged faster sequential access for stored fields in the fetch phase.
This PR now adds using the potentially better field reader also in SourceLookup.
Rally exeriments are showing that this speeds up e.g. when runtime fields that are using
"_source" are added e.g. via "docvalue_fields" or are used in queries or aggs.

Closes #62621
@cbuescher
Copy link
Copy Markdown
Member Author

I went ahead mergin to 7.x with #63316, let me know if there are any concerns with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants