Use merging fieldsreader when restoring versionmap during recovery#66944
Merged
romseygeek merged 7 commits intoJan 6, 2021
Conversation
… map during recovery
Collaborator
|
Pinging @elastic/es-distributed (Team:Distributed) |
dnhatn
approved these changes
Jan 4, 2021
Member
dnhatn
left a comment
There was a problem hiding this comment.
LGTM. The change makes sense to me. Thanks, Alan.
jimczi
requested changes
Jan 5, 2021
Contributor
jimczi
left a comment
There was a problem hiding this comment.
I left one comment, the logic to access the merge instance seems wrong.
| } | ||
|
|
||
| private static CheckedBiConsumer<Integer, StoredFieldVisitor, IOException> getStoredFieldsReader(LeafReader in) { | ||
| if (in instanceof CodecReader) { |
Contributor
There was a problem hiding this comment.
We wrap the directory reader in
SequentialStoredFieldsLeafReader that exposes the merge instance lightly (see SourceLookup).That avoids rewriting all filter leaf readers as codec readers. And so in this case the LeafReaderWithLiveDocs should extend
SequentialStoredFieldsLeafReader to expose this functionality.
Contributor
Author
|
@elasticmachine update branch |
Contributor
Author
|
@elasticmachine run elasticsearch-ci/1 |
jimczi
approved these changes
Jan 5, 2021
Contributor
There was a problem hiding this comment.
An IllegalStateException seems more appropriate ?
41d4ec8 to
a87e67b
Compare
Contributor
Author
|
@elasticmachine update branch |
romseygeek
added a commit
that referenced
this pull request
Jan 6, 2021
…66944) When reading sequential data from compressed stored fields it is more efficient to use a CodecReader's merge instance, as this holds decompressed blocks in memory and avoids re-doing the decompression for every document. When restoring the version map during a recovery, we read id fields from all documents beyond the global checkpoint, and this is precisely the situation where this performance enhancement should help. This commit updates the IdOnlyFieldsVisitor to use a merge instance, and in the process tidies up the API a bit; we now have a package-private IdStoredFieldLoader that handles fetching merge instances and loading ids for each document.
dnhatn
added a commit
that referenced
this pull request
Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944
dnhatn
added a commit
that referenced
this pull request
Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944
dnhatn
added a commit
to dnhatn/elasticsearch
that referenced
this pull request
Jan 12, 2021
…67190) Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates elastic#66944
dnhatn
added a commit
that referenced
this pull request
Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more efficient with SequentialStoredFieldsLeafReader. Relates #66944
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When reading sequential data from compressed stored fields it is
more efficient to use a CodecReader's merge instance, as this holds
decompressed blocks in memory and avoids re-doing the
decompression for every document. When restoring the version map
during a recovery, we read id fields from all documents beyond the global
checkpoint, and this is precisely the situation where this performance
enhancement should help.
This commit updates the IdOnlyFieldsVisitor to use a merge instance,
and in the process tidies up the API a bit; we now have a package-private
IdStoredFieldLoader that handles fetching merge instances and loading
ids for each document.