Skip to content

Add the ability to fetch the latest successful shard snapshot#75080

Merged
fcofdez merged 18 commits intoelastic:masterfrom
fcofdez:fetch-info-recovery
Jul 22, 2021
Merged

Add the ability to fetch the latest successful shard snapshot#75080
fcofdez merged 18 commits intoelastic:masterfrom
fcofdez:fetch-info-recovery

Conversation

@fcofdez
Copy link
Copy Markdown
Contributor

@fcofdez fcofdez commented Jul 7, 2021

This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.

Relates #73496

@fcofdez fcofdez force-pushed the fetch-info-recovery branch 2 times, most recently from 33fb9db to e39267f Compare July 8, 2021 08:47
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could take into account PARTIAL snapshots too, the implementation would be a little less straightforward but we could consider it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to defer this to a follow-up after we have done other tasks, this refinement can be done while benchmarking.

This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.
@fcofdez fcofdez force-pushed the fetch-info-recovery branch from e39267f to c244436 Compare July 8, 2021 09:33
@fcofdez fcofdez added v7.15.0 :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team. labels Jul 8, 2021
@fcofdez fcofdez marked this pull request as ready for review July 8, 2021 10:42
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick look only, added a couple of comments.

I would like to see added wire serialization tests too for the new serializable classes. See: AbstractWireSerializingTestCase

);

for (String repository : repositories) {
indexSnapshotsService.getLatestSuccessfulSnapshotForShard(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could run in parallel on all the repos. I am guessing we will only need one repo in reality so it might not matter. But I would prefer to go serially through the repos and early terminate when a good snapshot is found. Or to just remove the multi-repo support, we can always add that if another use comes along.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the multi-repo support is to avoid needing to implement a notion of a "default repo", we can simply look at all available repos. Moreover I don't think we can early-terminate since we're searching for the latest snapshot.

However I do agree that we shouldn't fan out to all the repos at once like this, working one-at-a-time would be better.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to defer this to a follow-up after we have done other tasks, this refinement can be done while benchmarking.

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left mostly small comments.

* Information about snapshotted file
*/
public static class FileInfo {
public static class FileInfo implements Writeable {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we really going to need the details of the files in the response? I think it would make more sense for the master to find a good snapshot and pass its ID onto the data node is, but then have the data node read these details itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d0e1ce0. I was wondering if we should parse the latest commit in the snapshot to get the max seqNo and local checkpoint and include that in the response.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but I think we can leave that to the data nodes.

);

for (String repository : repositories) {
indexSnapshotsService.getLatestSuccessfulSnapshotForShard(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the multi-repo support is to avoid needing to implement a notion of a "default repo", we can simply look at all available repos. Moreover I don't think we can early-terminate since we're searching for the latest snapshot.

However I do agree that we shouldn't fan out to all the repos at once like this, working one-at-a-time would be better.

@fcofdez fcofdez requested a review from DaveCTurner July 12, 2021 17:51
@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 12, 2021

Thanks for the review @DaveCTurner ! I've tackled your comments. Let me know if you think we should change something else 👍

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few random points, mainly around testing. Looking just fine otherwise :)


this.indexId = indexId;
this.shardId = shardId;
this.snapshotInfo = snapshotInfo;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this is the complete SnapshotInfo, including all indices and shard failures (including stack traces) and so on. That might be pretty sizeable, and mostly not relevant to the shard in question. Can we just pick out the bits we need?

* Information about snapshotted file
*/
public static class FileInfo {
public static class FileInfo implements Writeable {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but I think we can leave that to the data nodes.

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left three pretty small comments; apart from that this all looks good.

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to David's open points, LGTM otherwise no need for another round from me I think :)

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 21, 2021

@elasticmachine run elasticsearch-ci/part-1
(Unrelated failure)

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 21, 2021

@elasticmachine run elasticsearch-ci/part-1
(Unrelated failure)

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (one more tiny suggestion). Great work @fcofdez 👍

GetShardSnapshotRequest(List<String> repositories, ShardId shardId) {
assert repositories.isEmpty() == false;
assert repositories.stream().noneMatch(Objects::isNull);
assert repositories.size() == 1 && repositories.get(0).equals(ALL_REPOSITORIES)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
assert repositories.size() == 1 && repositories.get(0).equals(ALL_REPOSITORIES)
assert repositories.size() == 1

(if not, I'd prefer having clarifying parentheses, I find it trappy to rely on && binding more tightly than ||)

@fcofdez fcofdez merged commit 90148fe into elastic:master Jul 22, 2021
@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 22, 2021

Thanks all for the review!

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jul 22, 2021
This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.

Relates elastic#73496
Backport of elastic#75080
fcofdez added a commit that referenced this pull request Jul 22, 2021
This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.

Relates #73496
Backport of #75080
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 30, 2021
…c#75080)

This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.

Relates elastic#73496
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team. v7.15.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants