Faster sequential access for stored fields by jimczi · Pull Request #62509 · elastic/elasticsearch

jimczi · 2020-09-16T22:33:40Z

Faster sequential access for stored fields

Spinoff of #61806

Today, retrieving stored fields at search time is optimized for random access.
We keep no state between lookups, so when two documents sit in the same
compressed block, that block is decompressed once per document.
This strategy is acceptable when retrieving a top N sorted by score, since
there is no guarantee that the documents will be in the same block.
However, we have some use cases where the documents to retrieve might be
completely sequential:

* Scrolls or normal searches sorted by document id.
* Queries on runtime fields that extract values from _source.

This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
This lets us leverage the merge instances of stored fields readers, which
are optimized for sequential access.
This change focuses on the fetch phase for now and uses the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow-up.
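
To illustrate the idea, here is a toy sketch in plain Java. It is not the code in this PR and uses no Lucene or Elasticsearch types: `BlockStore`, `StoredFields`, `allAdjacent`, `fetch`, and the block size of 4 are all hypothetical names and values chosen for the example. It models a store where documents live in compressed blocks, a stateless random-access reader, a sequential reader that remembers the last decompressed block, and a fetch step that picks the sequential reader only when the sorted doc ids are adjacent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: documents are stored in fixed-size "compressed" blocks. */
final class SequentialFetchSketch {
    static final int DOCS_PER_BLOCK = 4; // hypothetical block size

    /** Fake block store that counts how often a block is "decompressed". */
    static final class BlockStore {
        int decompressions = 0;

        String[] decompress(int block) {
            decompressions++;
            String[] docs = new String[DOCS_PER_BLOCK];
            for (int i = 0; i < DOCS_PER_BLOCK; i++) {
                docs[i] = "doc-" + (block * DOCS_PER_BLOCK + i);
            }
            return docs;
        }
    }

    interface StoredFields {
        String document(int docId);
    }

    /** Random access: keeps no state, so every lookup decompresses a block. */
    static StoredFields randomAccess(BlockStore store) {
        return docId -> store.decompress(docId / DOCS_PER_BLOCK)[docId % DOCS_PER_BLOCK];
    }

    /** Sequential access: reuses the last decompressed block when possible. */
    static StoredFields sequential(BlockStore store) {
        return new StoredFields() {
            private int cachedBlock = -1;
            private String[] cachedDocs;

            @Override
            public String document(int docId) {
                int block = docId / DOCS_PER_BLOCK;
                if (block != cachedBlock) {
                    cachedDocs = store.decompress(block);
                    cachedBlock = block;
                }
                return cachedDocs[docId % DOCS_PER_BLOCK];
            }
        };
    }

    /** True if the sorted doc ids form a run of at least two ids with no gaps. */
    static boolean allAdjacent(int[] sortedDocIds) {
        for (int i = 1; i < sortedDocIds.length; i++) {
            if (sortedDocIds[i] != sortedDocIds[i - 1] + 1) {
                return false;
            }
        }
        return sortedDocIds.length > 1;
    }

    /** Picks the sequential reader only when all requested docs are adjacent. */
    static List<String> fetch(BlockStore store, int[] sortedDocIds) {
        StoredFields reader = allAdjacent(sortedDocIds) ? sequential(store) : randomAccess(store);
        List<String> hits = new ArrayList<>();
        for (int docId : sortedDocIds) {
            hits.add(reader.document(docId));
        }
        return hits;
    }
}
```

In this model, fetching doc ids 0 to 7 touches two blocks and decompresses each once, whereas the stateless reader would decompress a block for each of the eight lookups. The adjacency rule here is an assumption made for the sketch; the actual condition used in the fetch phase may differ.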

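The speedup on queries sorted by doc id is significant: scroll latency roughly halves at every reported percentile.
I ran the scroll task of the [http_logs rally track](https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d)
on my laptop and got the following results: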

|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
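
To put the latency rows in relative terms, a small helper like the following can be used. It is only a sketch: `reductionPercent` is a hypothetical name, and the inputs are the Baseline and Contender values copied from the table above.

```java
/** Computes the relative latency reduction between a baseline and a contender run. */
final class LatencyReductionSketch {
    static double reductionPercent(double baseline, double contender) {
        return 100.0 * (baseline - contender) / baseline;
    }

    public static void main(String[] args) {
        // Scroll latency values (ms) taken from the benchmark table.
        System.out.printf("p50:  %.1f%%%n", reductionPercent(524.966, 251.396));
        System.out.printf("p90:  %.1f%%%n", reductionPercent(577.593, 271.066));
        System.out.printf("p100: %.1f%%%n", reductionPercent(664.73, 272.734));
    }
}
```

For these numbers the reduction is a little over 50% at the 50th and 90th percentiles and close to 59% at the 100th percentile.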

Closes #62024

elasticmachine · 2020-09-16T22:33:42Z

Pinging @elastic/es-search (:Search/Search)

jpountz

Woohoo! The change makes sense to me overall although I'm not completely happy with cutting over entirely to CodecReader. I wonder if we should instead introduce a new subclass of LeafReader that introduces a new method such as getSequentialStoredFieldsReader and make sure all our security/exitable leaf readers implement it (I haven't fully thought through the implications).

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java


jimczi · 2020-09-17T13:49:07Z

@jpountz, I modified this PR following your idea of an abstract filter leaf reader that exposes getSequentialStoredFieldsReader. The change is much smaller now; can you take another look?
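
As a rough sketch of that design (plain Java with toy types, not the Lucene or Elasticsearch classes): an abstract reader declares `getSequentialStoredFieldsReader`, and every wrapper must implement it so that its own filtering also applies to the sequential reader. Apart from the method name, everything here (`FieldsReader`, `Segment`, `RedactingReader`, the `"secret:"` prefix) is hypothetical and only stands in for the real security and exitable wrappers.

```java
/** Toy model of wrapper readers that must also expose a sequential reader. */
final class SequentialWrapperSketch {
    interface FieldsReader {
        String document(int docId);
    }

    /** Stand-in for a leaf reader that only offers random access. */
    interface LeafReader {
        FieldsReader storedFields();
    }

    /** Base class: any reader used at search time must provide a sequential reader. */
    static abstract class SequentialLeafReader implements LeafReader {
        abstract FieldsReader getSequentialStoredFieldsReader();
    }

    /** Innermost reader; counts how often the sequential reader is requested. */
    static final class Segment extends SequentialLeafReader {
        int sequentialRequests = 0;

        @Override
        public FieldsReader storedFields() {
            return id -> "secret:doc-" + id;
        }

        @Override
        FieldsReader getSequentialStoredFieldsReader() {
            sequentialRequests++;
            return id -> "secret:doc-" + id;
        }
    }

    /** Wrapper that hides part of each document on both access paths. */
    static final class RedactingReader extends SequentialLeafReader {
        private final SequentialLeafReader in;

        RedactingReader(SequentialLeafReader in) {
            this.in = in;
        }

        private static FieldsReader redact(FieldsReader reader) {
            return id -> reader.document(id).replace("secret:", "");
        }

        @Override
        public FieldsReader storedFields() {
            return redact(in.storedFields());
        }

        @Override
        FieldsReader getSequentialStoredFieldsReader() {
            // Delegate to the wrapped reader, then re-apply this wrapper's filtering.
            return redact(in.getSequentialStoredFieldsReader());
        }
    }
}
```

Because the method is abstract, a wrapper cannot silently fall back to the unfiltered sequential reader of the segment it wraps.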

jpountz

I like it, I find it simpler now.

...er/src/main/java/org/elasticsearch/common/lucene/index/SequentialStoredFieldsLeafReader.java

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

...ava/org/elasticsearch/xpack/core/security/authz/accesscontrol/DocumentSubsetReaderTests.java


In #62509 we already plugged faster sequential access for stored fields into the fetch phase. This PR now also uses the potentially faster stored fields reader in SourceLookup. Rally experiments show that this gives a speedup, e.g. when runtime fields that use "_source" are added via "docvalue_fields" or are used in queries or aggs. Closes #62621


jimczi added the >enhancement, :Search/Search, v8.0.0, and v7.10.0 labels Sep 16, 2020

elasticmachine added the Team:Search label Sep 16, 2020

jimczi force-pushed the enhancements/search_codec_reader branch from b174460 to 7ce055b on September 17, 2020 00:04

jpountz reviewed Sep 17, 2020


jimczi force-pushed the enhancements/search_codec_reader branch from 3377e71 to 6ed45fa on September 17, 2020 13:46

fix SeqIdGeneratingFilterReader

c06020d

jpountz approved these changes Sep 17, 2020


jimczi added 2 commits September 17, 2020 17:04

feedback

9d3f33f

unused import

2bee5fc

jimczi merged commit 6784c4d into elastic:master Sep 17, 2020

jimczi deleted the enhancements/search_codec_reader branch September 17, 2020 16:46

jimczi mentioned this pull request Sep 17, 2020

Faster sequential access for stored fields (#62509) #62573

Merged

This was referenced Sep 18, 2020

Faster _source lookups. #61806

Closed

SourceLookup should leverage sequential stored fields reader #62621

Closed

cbuescher mentioned this pull request Sep 29, 2020

Enable SourceLookup to leverage sequential stored fields reader #63035

Merged

cbuescher mentioned this pull request Oct 6, 2020

Enable SourceLookup to leverage sequential stored fields reader (#63035) #63316

Merged

mkleen mentioned this pull request Dec 4, 2020

Performance regression for normal select with order by crate/crate#10816

Closed

mkleen mentioned this pull request Dec 15, 2020

Optimize fetch performance for sequential value retrieval crate/crate#10862

Merged

5 tasks

dnhatn mentioned this pull request Jan 14, 2021

Use SequentialStoredFieldsLeafReader in reading Lucene changes #67190

Merged

nik9000 mentioned this pull request May 5, 2021

More debugging info for significant_text #72727

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

bowenlan-amzn mentioned this pull request Mar 19, 2026

[BUG] madvise from stored field query path could cause mmap lock contention on kernel 5.10 opensearch-project/OpenSearch#20933

Closed

jimczi merged 4 commits into elastic:master from jimczi:enhancements/search_codec_reader
