Skip to content

Use merging fieldsreader when restoring versionmap during recovery#66944

Merged
romseygeek merged 7 commits into
elastic:masterfrom
romseygeek:engine/checkpoint-fieldsvisitor
Jan 6, 2021
Merged

Use merging fieldsreader when restoring versionmap during recovery#66944
romseygeek merged 7 commits into
elastic:masterfrom
romseygeek:engine/checkpoint-fieldsvisitor

Conversation

@romseygeek
Copy link
Copy Markdown
Contributor

When reading sequential data from compressed stored fields it is
more efficient to use a CodecReader's merge instance, as this holds
decompressed blocks in memory and avoids re-doing the
decompression for every document. When restoring the version map
during a recovery, we read id fields from all documents beyond the global
checkpoint, and this is precisely the situation where this performance
enhancement should help.

This commit updates the IdOnlyFieldsVisitor to use a merge instance,
and in the process tidies up the API a bit; we now have a package-private
IdStoredFieldLoader that handles fetching merge instances and loading
ids for each document.

@romseygeek romseygeek added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.12.0 labels Jan 4, 2021
@romseygeek romseygeek requested review from dnhatn and jimczi January 4, 2021 17:21
@romseygeek romseygeek self-assigned this Jan 4, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Jan 4, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The change makes sense to me. Thanks, Alan.

Copy link
Copy Markdown
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment, the logic to access the merge instance seems wrong.

}

private static CheckedBiConsumer<Integer, StoredFieldVisitor, IOException> getStoredFieldsReader(LeafReader in) {
if (in instanceof CodecReader) {
Copy link
Copy Markdown
Contributor

@jimczi jimczi Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wrap the directory reader in

restoreVersionMapAndCheckpointTracker(Lucene.wrapAllDocsLive(searcher.getDirectoryReader()));
so this is never true. We don't use codec readers, instead we introduced a SequentialStoredFieldsLeafReader that exposes the merge instance lightly (see SourceLookup).
That avoids rewriting all filter leaf readers as codec readers. And so in this case the LeafReaderWithLiveDocs should extend SequentialStoredFieldsLeafReader to expose this functionality.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks Jim!

@romseygeek
Copy link
Copy Markdown
Contributor Author

@elasticmachine update branch

@romseygeek
Copy link
Copy Markdown
Contributor Author

@elasticmachine run elasticsearch-ci/1

Copy link
Copy Markdown
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An IllegalStateException seems more appropriate ?

@romseygeek romseygeek force-pushed the engine/checkpoint-fieldsvisitor branch from 41d4ec8 to a87e67b Compare January 5, 2021 14:20
@romseygeek
Copy link
Copy Markdown
Contributor Author

@elasticmachine update branch

@romseygeek romseygeek merged commit c5afa9a into elastic:master Jan 6, 2021
@romseygeek romseygeek deleted the engine/checkpoint-fieldsvisitor branch January 6, 2021 10:20
romseygeek added a commit that referenced this pull request Jan 6, 2021
…66944)

When reading sequential data from compressed stored fields it is
more efficient to use a CodecReader's merge instance, as this holds
decompressed blocks in memory and avoids re-doing the
decompression for every document. When restoring the version map
during a recovery, we read id fields from all documents beyond the global
checkpoint, and this is precisely the situation where this performance
enhancement should help.

This commit updates the IdOnlyFieldsVisitor to use a merge instance,
and in the process tidies up the API a bit; we now have a package-private
IdStoredFieldLoader that handles fetching merge instances and loading
ids for each document.
dnhatn added a commit that referenced this pull request Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more 
efficient with SequentialStoredFieldsLeafReader.

Relates #66944
dnhatn added a commit that referenced this pull request Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more
efficient with SequentialStoredFieldsLeafReader.

Relates #66944
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 12, 2021
…67190)

Reading operations in Lucene changes is likely sequential and more
efficient with SequentialStoredFieldsLeafReader.

Relates elastic#66944
dnhatn added a commit that referenced this pull request Jan 12, 2021
Reading operations in Lucene changes is likely sequential and more
efficient with SequentialStoredFieldsLeafReader.

Relates #66944
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement Team:Distributed Meta label for distributed team. v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants