Use merging fieldsreader when restoring versionmap during recovery by romseygeek · Pull Request #66944 · elastic/elasticsearch

romseygeek · 2021-01-04T17:21:02Z

When reading sequential data from compressed stored fields it is
more efficient to use a CodecReader's merge instance, as this holds
decompressed blocks in memory and avoids re-doing the
decompression for every document. When restoring the version map
during a recovery, we read id fields from all documents beyond the global
checkpoint, and this is precisely the situation where this performance
enhancement should help.

This commit updates the IdOnlyFieldsVisitor to use a merge instance,
and in the process tidies up the API a bit; we now have a package-private
IdStoredFieldLoader that handles fetching merge instances and loading
ids for each document.
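The saving described above can be illustrated with a toy model. This is only a sketch: `ToyStoredFields`, its `SequentialReader`, and the block size of 16 are invented for illustration and are not Lucene or Elasticsearch classes. It assumes that ids are stored in compressed blocks and that a merge-style reader keeps the most recently decompressed block in memory.

```java
/** Toy model of block-compressed stored fields. Hypothetical; NOT the Lucene API. */
final class ToyStoredFields {
    final int docCount;
    final int docsPerBlock;
    int decompressions = 0; // counts how often a block had to be "decompressed"

    ToyStoredFields(int docCount, int docsPerBlock) {
        this.docCount = docCount;
        this.docsPerBlock = docsPerBlock;
    }

    /** Stand-in for real decompression: rebuilds the ids held in one block. */
    String[] decompressBlock(int block) {
        decompressions++;
        int start = block * docsPerBlock;
        int end = Math.min(start + docsPerBlock, docCount);
        String[] ids = new String[end - start];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = "id-" + (start + i);
        }
        return ids;
    }

    /** Per-document access: decompresses the containing block on every call. */
    String readId(int doc) {
        return decompressBlock(doc / docsPerBlock)[doc % docsPerBlock];
    }

    /** Returns a reader that keeps the last decompressed block, in the spirit of a merge instance. */
    SequentialReader sequentialReader() {
        return new SequentialReader();
    }

    final class SequentialReader {
        private int cachedBlock = -1;
        private String[] cached;

        String readId(int doc) {
            int block = doc / docsPerBlock;
            if (block != cachedBlock) { // only decompress when moving to a new block
                cached = decompressBlock(block);
                cachedBlock = block;
            }
            return cached[doc % docsPerBlock];
        }
    }
}

final class MergeInstanceDemo {
    public static void main(String[] args) {
        ToyStoredFields perDoc = new ToyStoredFields(1000, 16);
        for (int doc = 0; doc < perDoc.docCount; doc++) {
            perDoc.readId(doc);
        }

        ToyStoredFields sequential = new ToyStoredFields(1000, 16);
        ToyStoredFields.SequentialReader reader = sequential.sequentialReader();
        for (int doc = 0; doc < sequential.docCount; doc++) {
            reader.readId(doc);
        }

        System.out.println("per-document decompressions: " + perDoc.decompressions);
        System.out.println("sequential decompressions:   " + sequential.decompressions);
    }
}
```

In this model the per-document path decompresses once per document, while the sequential path decompresses once per block. Lucene's actual merge instances are more involved; the sketch only shows why reading ids in doc order benefits from holding the decompressed block.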


elasticmachine · 2021-01-04T17:21:05Z

Pinging @elastic/es-distributed (Team:Distributed)

dnhatn

LGTM. The change makes sense to me. Thanks, Alan.

jimczi

I left one comment: the logic used to access the merge instance seems wrong.

jimczi · 2021-01-05T08:52:46Z

+    }
+
+    private static CheckedBiConsumer<Integer, StoredFieldVisitor, IOException> getStoredFieldsReader(LeafReader in) {
+        if (in instanceof CodecReader) {


We wrap the directory reader in

elasticsearch/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

Line 273 in fa7bce3

restoreVersionMapAndCheckpointTracker(Lucene.wrapAllDocsLive(searcher.getDirectoryReader()));

so this is never true. We don't use codec readers; instead, we introduced a SequentialStoredFieldsLeafReader that exposes the merge instance in a lightweight way (see SourceLookup).
That avoids rewriting all filter leaf readers as codec readers. In this case, the LeafReaderWithLiveDocs should extend SequentialStoredFieldsLeafReader to expose this functionality.

Good catch! Thanks Jim!
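The problem raised in this review thread can be sketched with stand-in types. Everything below is hypothetical: `ToyLeafReader`, `ToyCodecReader`, `ToyFilterReader`, `ToySequentialFilterReader`, and `ToyIdLoader` are invented for illustration and do not reproduce the Lucene or Elasticsearch classes named above. The sketch assumes only what the comment states: once a reader is wrapped, an `instanceof` check against the codec-level type fails, so the wrapper itself has to expose the sequential reader. Throwing `IllegalStateException` when none is available is this sketch's own choice.

```java
/** Hypothetical stand-in for a leaf reader; not a Lucene type. */
interface ToyLeafReader {
    String readId(int doc);
}

/** Plays the role of a codec-level reader that can hand out a sequential instance. */
final class ToyCodecReader implements ToyLeafReader {
    @Override
    public String readId(int doc) {
        return "id-" + doc;
    }

    ToyLeafReader sequentialInstance() {
        return doc -> "id-" + doc;
    }
}

/** A plain filter wrapper: it delegates reads but hides the concrete type it wraps. */
class ToyFilterReader implements ToyLeafReader {
    protected final ToyLeafReader in;

    ToyFilterReader(ToyLeafReader in) {
        this.in = in;
    }

    @Override
    public String readId(int doc) {
        return in.readId(doc);
    }
}

/** A wrapper that explicitly exposes the sequential reader of whatever it wraps. */
class ToySequentialFilterReader extends ToyFilterReader {
    ToySequentialFilterReader(ToyLeafReader in) {
        super(in);
    }

    ToyLeafReader sequentialReader() {
        return ToyIdLoader.sequentialReaderFor(in);
    }
}

final class ToyIdLoader {
    /** Finds a sequential reader, or fails loudly instead of silently using the slow path. */
    static ToyLeafReader sequentialReaderFor(ToyLeafReader reader) {
        if (reader instanceof ToyCodecReader) {
            return ((ToyCodecReader) reader).sequentialInstance();
        }
        if (reader instanceof ToySequentialFilterReader) {
            return ((ToySequentialFilterReader) reader).sequentialReader();
        }
        throw new IllegalStateException(
            "no sequential reader available for " + reader.getClass().getSimpleName());
    }
}

final class WrappedReaderDemo {
    public static void main(String[] args) {
        ToyLeafReader plain = new ToyFilterReader(new ToyCodecReader());
        // The wrapper is not a ToyCodecReader, so a type check on it never succeeds.
        System.out.println("plain wrapper is a codec reader: " + (plain instanceof ToyCodecReader));

        ToyLeafReader exposing = new ToySequentialFilterReader(new ToyCodecReader());
        System.out.println(ToyIdLoader.sequentialReaderFor(exposing).readId(3));

        try {
            ToyIdLoader.sequentialReaderFor(plain);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design point is that the loader asks the wrapper for a sequential reader rather than inspecting the wrapped type, so existing filter readers do not need to be rewritten as codec readers.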

romseygeek · 2021-01-05T11:32:08Z

@elasticmachine update branch

romseygeek · 2021-01-05T11:53:46Z

@elasticmachine run elasticsearch-ci/1

jimczi

LGTM, thanks

jimczi · 2021-01-05T11:58:00Z

An IllegalStateException seems more appropriate?


romseygeek · 2021-01-06T09:23:39Z

@elasticmachine update branch


Use merging fields reader instance to load ids when restoring version…

e390979

… map during recovery

romseygeek added the >enhancement, :Distributed/Engine, v8.0.0, and v7.12.0 labels Jan 4, 2021

romseygeek requested review from dnhatn and jimczi January 4, 2021 17:21

romseygeek self-assigned this Jan 4, 2021

elasticmachine added the Team:Distributed label Jan 4, 2021

dnhatn approved these changes Jan 4, 2021


jimczi requested changes Jan 5, 2021


Ensure we're actually using a sequential reader

b4367e9

jimczi approved these changes Jan 5, 2021


romseygeek added 2 commits January 5, 2021 14:19

Merge remote-tracking branch 'origin/master' into engine/checkpoint-f…

e302d6f

…ieldsvisitor

use IllegalStateException

a87e67b

romseygeek force-pushed the engine/checkpoint-fieldsvisitor branch from 41d4ec8 to a87e67b Compare January 5, 2021 14:20

romseygeek added 2 commits January 5, 2021 14:48

Merge remote-tracking branch 'origin/master' into engine/checkpoint-f…

e8d059d

…ieldsvisitor

Merge remote-tracking branch 'origin/master' into engine/checkpoint-f…

06ce3af

…ieldsvisitor

Merge branch 'master' into engine/checkpoint-fieldsvisitor

f72833f

romseygeek merged commit c5afa9a into elastic:master Jan 6, 2021

romseygeek deleted the engine/checkpoint-fieldsvisitor branch January 6, 2021 10:20

dnhatn mentioned this pull request Jan 7, 2021

Use SequentialStoredFieldsLeafReader in reading Lucene changes #67190

Merged

dnhatn added a commit that referenced this pull request Jan 12, 2021

Use SequentialStoredFieldsLeafReader to read Lucene changes (#67190)

6cbaaed

Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944

dnhatn added a commit that referenced this pull request Jan 12, 2021

Use SequentialStoredFieldsLeafReader to read Lucene changes (#67190)

5fe0d67

Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944

dnhatn mentioned this pull request Jan 12, 2021

Use SequentialStoredFieldsLeafReader to read Lucene changes #67372

Merged

dnhatn added a commit that referenced this pull request Jan 12, 2021

Use SequentialStoredFieldsLeafReader to read Lucene changes (#67190)

dafe801

Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

romseygeek merged 7 commits into elastic:master from romseygeek:engine/checkpoint-fieldsvisitor
