Faster _source lookups. #61806

Closed
jpountz wants to merge 1 commit into elastic:master from jpountz:faster_source_lookups

Conversation

@jpountz
Contributor

@jpountz jpountz commented Sep 1, 2020

As we're making it easier to create fields that read information
dynamically from `_source`, I've been trying to think about ways that we
could make stored field access less terrible. Today, Lucene's stored
fields optimize for random access: if your use case is to fetch stored
fields for your top hits, it's very unlikely that two of them will come
from the same block, so Lucene makes no effort to keep state that would
avoid decompressing the same data multiple times when two documents fall
in the same compressed block. It only does this for merges, which is the
only time when stored fields are expected to be accessed sequentially.
So this PR introduces a little hack that uses the merging logic for
source lookups.

Note that the optimization in this PR doesn't apply if DLS or FLS are
enabled, as these features introduce LeafReader wrappers that would hide
the CodecReader.

FYI the speedup is not trivial. I played with a `nginx.access` dataset,
and fetching 10M docs sequentially goes down from 8+ minutes to 1 minute
and 40 seconds.

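The effect described above can be illustrated outside of Lucene. A stateless reader pays one block decompression per document, while a merge-style reader that remembers the last decompressed block pays one per block when documents are fetched in order. This is a minimal Python sketch of that idea, not Lucene's actual implementation; all names and the block layout are invented for the example.

```python
import zlib

# A toy "stored fields" file: 16 docs grouped into compressed blocks of 4.
BLOCK_SIZE = 4
docs = [("doc %d source" % i).encode() for i in range(16)]
blocks = [
    zlib.compress(b"|".join(docs[i:i + BLOCK_SIZE]))
    for i in range(0, len(docs), BLOCK_SIZE)
]

decompressions = 0

def read_block(block_id):
    """Decompress one block, counting how often we pay that cost."""
    global decompressions
    decompressions += 1
    return zlib.decompress(blocks[block_id]).split(b"|")

def random_access_lookup(doc_id):
    # Stateless: every lookup decompresses its whole block from scratch.
    return read_block(doc_id // BLOCK_SIZE)[doc_id % BLOCK_SIZE]

class SequentialReader:
    """Merge-style reader: caches the last decompressed block."""

    def __init__(self):
        self.block_id = -1
        self.block = None

    def visit(self, doc_id):
        block_id = doc_id // BLOCK_SIZE
        if block_id != self.block_id:
            self.block_id = block_id
            self.block = read_block(block_id)
        return self.block[doc_id % BLOCK_SIZE]

# Fetch all 16 docs in order with each strategy.
decompressions = 0
for i in range(16):
    random_access_lookup(i)
random_cost = decompressions      # one decompression per doc

decompressions = 0
seq = SequentialReader()
for i in range(16):
    seq.visit(i)
sequential_cost = decompressions  # one decompression per block
print(random_cost, sequential_cost)  # 16 4
```

With 10M sequential docs the same ratio (docs per block) is what the PR's benchmark numbers reflect.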
// little hack here and pretend we're going to do merges in order to
// get better sequential access.
try {
    this.storedFieldsReader = SlowCodecReaderWrapper.wrap(reader).getFieldsReader().getMergeInstance();
Member

It looks like we do our best to always have a CodecReader here now, so maybe we should make that the return type? Or assert that it is one and then hard cast? It'd be a shame to fall back to the "slow" implementation of this at run time accidentally.

Contributor Author

Getting there requires more work. DLS and FLS currently use reader wrappers that would hide the codec reader.

Member

Could you leave a comment then about how we're not using the "slow part" of SlowCodecReaderWrapper? I checked the code path a little more closely and I feel much better about it. The name is just scary. Really this just wraps the reader so you can call getMergeInstance, which still returns this.
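As this thread notes, the wrapper is only used to reach getMergeInstance, which hands back a stateful view over the same underlying data rather than copying anything. A toy sketch of that API shape (hypothetical names, not Lucene's classes):

```python
class StoredFieldsReader:
    """Default instance: optimized for random access, keeps no state."""

    def __init__(self, blocks, decompress):
        self.blocks = blocks          # compressed blocks of documents
        self.decompress = decompress  # the function that pays the real cost

    def document(self, doc_id, docs_per_block=2):
        # Decompress the whole block for every single lookup.
        block = self.decompress(self.blocks[doc_id // docs_per_block])
        return block[doc_id % docs_per_block]

    def get_merge_instance(self, docs_per_block=2):
        # Like Lucene's getMergeInstance(): a cheap, stateful view over
        # the SAME underlying data, meant for one thread reading docs in
        # increasing order.
        return _MergingStoredFieldsReader(self, docs_per_block)


class _MergingStoredFieldsReader:
    """Merge instance: caches the last decompressed block."""

    def __init__(self, parent, docs_per_block):
        self.parent = parent
        self.docs_per_block = docs_per_block
        self.cached_id = -1
        self.cached_block = None

    def document(self, doc_id):
        block_id = doc_id // self.docs_per_block
        if block_id != self.cached_id:
            self.cached_id = block_id
            self.cached_block = self.parent.decompress(
                self.parent.blocks[block_id])
        return self.cached_block[doc_id % self.docs_per_block]


# Demo with a fake decompressor that just counts invocations.
calls = []
def fake_decompress(block):
    calls.append(block)
    return block

reader = StoredFieldsReader([["a", "b"], ["c", "d"]], fake_decompress)
merge_reader = reader.get_merge_instance()
fetched = [merge_reader.document(i) for i in range(4)]
print(fetched, len(calls))  # ['a', 'b', 'c', 'd'] 2
```

Note that the merge instance shares the parent's blocks; wrapping costs nothing, which is why the scary-sounding wrapper is harmless on this path.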

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Sep 17, 2020
Spinoff of elastic#61806
Today retrieving stored fields at search time is optimized for random access.
So we make no effort to keep state that would avoid decompressing the same data
multiple times when two documents fall in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the documents to retrieve might be
completely sequential:
* Scrolls or normal search sorted by document id.
* Queries on Runtime fields that extract from _source.

This commit exposes all the custom readers that we use at search time
as codec readers, in order to leverage the merge instances of
stored fields readers, which are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.

The speedup on queries sorted by doc id is significant.
I played with the scroll task of the [http_logs rally track](https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d)
on my laptop and had the following result:
```
|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
```

Closes elastic#62024
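The "only if all documents to retrieve are adjacent" condition above amounts to checking that the sorted batch of doc IDs is a dense, gap-free range. A minimal sketch of that decision (hypothetical helpers, not the actual Elasticsearch code):

```python
def is_adjacent(doc_ids):
    """True when the sorted doc IDs form a dense, gap-free range.

    Only then is it worth switching to the merge instance of the
    stored fields reader, which assumes sequential access.
    """
    if not doc_ids:
        return False
    ids = sorted(doc_ids)
    return ids[-1] - ids[0] == len(ids) - 1

def pick_reader(doc_ids, random_reader, sequential_reader):
    # Fall back to the random-access reader when docs are scattered,
    # e.g. the top N of a query sorted by score.
    return sequential_reader if is_adjacent(doc_ids) else random_reader
```

A scroll sorted by doc id yields batches like [1000, 1001, ..., 1999] and takes the sequential path, while score-sorted hits almost never do.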
jimczi added a commit that referenced this pull request Sep 17, 2020
Faster sequential access for stored fields

Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
So we make no effort to keep state that would avoid decompressing the same data
multiple times when two documents fall in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the documents to retrieve might be
completely sequential:

* Scrolls or normal search sorted by document id.
* Queries on Runtime fields that extract from _source.

This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows us to leverage the merge instances of stored fields readers that
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.

The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:

|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
Closes #62024
@jimczi
Contributor

jimczi commented Sep 18, 2020

Superseded by #62509, hence closing.

@jimczi jimczi closed this Sep 18, 2020
@jpountz
Contributor Author

jpountz commented Sep 18, 2020

@jimczi I think we still need to make changes to SourceLookup?

@jimczi
Contributor

jimczi commented Sep 18, 2020

Yep, but I'd prefer that we open a new issue or PR since this one is outdated.

@jpountz
Contributor Author

jpountz commented Sep 18, 2020

Sounds good, I mostly wanted to make sure we knew there was still some work to do to improve _source lookups from scripts.

@jimczi
Contributor

jimczi commented Sep 18, 2020

I opened #62621 for the source lookup in scripts.
