Ignore shard started requests when primary term does not match #37899
Merged
tlrx merged 5 commits into elastic:master on Jan 29, 2019
Conversation
Collaborator
Pinging @elastic/es-distributed
Member
Author
All tests green (except oss-distro-docs, but it fails because of another change) but I'd like more CI runs so: @elasticmachine test this please
Member
Author
Green again except oss-distro-docs, let's run it one more time: @elasticmachine test this please
ywelsch suggested changes Jan 28, 2019
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java
server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java
server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java
Member
Author
Thanks @ywelsch, I updated the PR. Let me know what you think.
ywelsch approved these changes Jan 28, 2019
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Contributor
should we have the same assertion directly in IndexShard?
Member
Author
It's already present under this form so I think we're good.
Force-pushed from 4f779c0 to 2d39807
Member
Author
@elasticmachine run elasticsearch-ci/2
Member
Author
@elasticmachine run elasticsearch-ci/default-distro
Member
Author
Thanks @ywelsch
tlrx added a commit that referenced this pull request Jan 31, 2019
This pull request disables BWC tests while backporting #37899 to 6.x.
tlrx added a commit that referenced this pull request Jan 31, 2019
This commit changes the StartedShardEntry so that it also contains the primary term of the shard to start. This way the master node can also check that the primary term from the start request is equal to the shard's current primary term in the cluster state, and it can ignore any shard started request that concerns a previous instance of the shard that was allocated to the same node.

Such situations are likely to happen with frozen (or restored) indices and the replication of closed indices, because with replicated closed indices the shards are initialized again after the index is closed and can potentially be reinitialized once more if the index is reopened as a frozen index. In such cases the lifecycle of the shards would be something like:

* shard is STARTED
* index is closed
* shards are INITIALIZING (index state is CLOSED, primary term is X)
* index is reopened
* shards are INITIALIZING again (index state is OPENED, potentially frozen, primary term is X+1)

Adding the primary term to the shard started request allows the master node to discard any StartedShardEntry request that concerns the shard with primary term X, because the shard has been moved or reinitialized in the meantime under primary term X+1.

Relates to #33888
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 31, 2019
This commit adapts the version used in StartedShardEntry serialization after the backport of elastic#37899 and reenables bwc tests. Related to elastic#37899 Related to elastic#38074
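For readers unfamiliar with why the serialization version has to be adapted after a backport, here is a minimal sketch of version-gated serialization. It is not the Elasticsearch code: the class name, the integer wire versions, the `PRIMARY_TERM_ADDED_IN` constant and the fallback value are all assumptions made for illustration, and plain `java.io` streams stand in for Elasticsearch's own stream classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical entry whose new field is only written to peers that understand it. */
final class VersionedStartedShardEntry {
    // Assumption: first wire version that carries the primary term.
    static final int PRIMARY_TERM_ADDED_IN = 2;
    // Assumption: placeholder used when an older peer did not send a term.
    static final long UNASSIGNED_PRIMARY_TERM = 0L;

    final String allocationId;
    final long primaryTerm;

    VersionedStartedShardEntry(String allocationId, long primaryTerm) {
        this.allocationId = allocationId;
        this.primaryTerm = primaryTerm;
    }

    /** Serializes the entry for a peer that speaks the given wire version. */
    byte[] writeTo(int wireVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(allocationId);
            if (wireVersion >= PRIMARY_TERM_ADDED_IN) {
                out.writeLong(primaryTerm); // older peers would not expect these bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes an entry that was written with the given wire version. */
    static VersionedStartedShardEntry readFrom(byte[] data, int wireVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String allocationId = in.readUTF();
            long term = wireVersion >= PRIMARY_TERM_ADDED_IN
                ? in.readLong()
                : UNASSIGNED_PRIMARY_TERM;
            return new VersionedStartedShardEntry(allocationId, term);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, once the field also exists on the older branch, lowering `PRIMARY_TERM_ADDED_IN` is the analogue of adapting the serialization version, and mixed-version tests can be turned back on because both sides then agree on the bytes exchanged.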
Member
Author
Backported to 6.x in 255015d
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 31, 2019
…ersion

* elastic/master:
  Do not set up NodeAndClusterIdStateListener in test (elastic#38110)
  ML: better handle task state race condition (elastic#38040)
  Soft-deletes policy should always fetch latest leases (elastic#37940)
  Handle scheduler exceptions (elastic#38014)
  Minor logging improvements (elastic#38084)
  Fix Painless void return bug (elastic#38046)
  Update PutFollowAction serialization post-backport (elastic#37989)
  fix a few versionAdded values in ElasticsearchExceptions (elastic#37877)
  Reenable BWC tests after backport of elastic#37899 (elastic#38093)
  Mute failing test
  Mute failing test
  Fail start on obsolete indices documentation (elastic#37786)
  SQL: Implement FIRST/LAST aggregate functions (elastic#37936)
  Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock
  remove unused parser fields in RemoteResponseParsers
When a shard on a data node has finished recovering, the data node sends a StartedShardEntry request to the master node. This request contains the shard id, an allocation id and the id of the node that holds the shard to start. The master node then processes the request and checks that the shard exists in the cluster state with the same allocation id as in the request.

This pull request changes the StartedShardEntry so that it also contains the primary term of the shard to start. This way the master node can also check that the primary term from the start request is equal to the shard's current primary term in the cluster state, and it can ignore any shard started request that concerns a previous instance of the shard that was allocated to the same node.

Such situations are likely to happen with frozen (or restored) indices and the replication of closed indices, because with replicated closed indices the shards are initialized again after the index is closed and can potentially be reinitialized once more if the index is reopened as a frozen index. In such cases the lifecycle of the shards would be something like:

* shard is STARTED
* index is closed
* shards are INITIALIZING (index state is CLOSED, primary term is X)
* index is reopened
* shards are INITIALIZING again (index state is OPENED, potentially frozen, primary term is X+1)

Adding the primary term to the shard started request allows the master node to discard any StartedShardEntry request that concerns the shard with primary term X, because the shard has been moved or reinitialized in the meantime under primary term X+1.

Relates to #33888
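The check described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the actual ShardStateAction code: `StartedShardRequest`, `ShardCopy`, `ShardStartedCheck` and the `Decision` values are hypothetical names, and the cluster state is reduced to a map from shard id to the allocation id and primary term the master currently knows.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the shard started request sent to the master. */
final class StartedShardRequest {
    final String shardId;
    final String allocationId;
    final long primaryTerm;

    StartedShardRequest(String shardId, String allocationId, long primaryTerm) {
        this.shardId = shardId;
        this.allocationId = allocationId;
        this.primaryTerm = primaryTerm;
    }
}

/** Hypothetical view of one shard copy as recorded in the master's cluster state. */
final class ShardCopy {
    final String allocationId;
    final long primaryTerm;

    ShardCopy(String allocationId, long primaryTerm) {
        this.allocationId = allocationId;
        this.primaryTerm = primaryTerm;
    }
}

/** Toy model of the master-side decision; names and structure are assumptions. */
final class ShardStartedCheck {
    enum Decision { APPLY, IGNORE_UNKNOWN_SHARD, IGNORE_ALLOCATION_ID_MISMATCH, IGNORE_PRIMARY_TERM_MISMATCH }

    private final Map<String, ShardCopy> clusterState = new HashMap<>();

    /** Records (or replaces) what the master currently knows about a shard. */
    void setShard(String shardId, ShardCopy copy) {
        clusterState.put(shardId, copy);
    }

    /** Decides whether a shard started request still refers to the current shard instance. */
    Decision evaluate(StartedShardRequest request) {
        ShardCopy current = clusterState.get(request.shardId);
        if (current == null) {
            return Decision.IGNORE_UNKNOWN_SHARD;
        }
        if (!current.allocationId.equals(request.allocationId)) {
            return Decision.IGNORE_ALLOCATION_ID_MISMATCH;
        }
        // The additional check: same allocation id, but the request is from an earlier term.
        if (current.primaryTerm != request.primaryTerm) {
            return Decision.IGNORE_PRIMARY_TERM_MISMATCH;
        }
        return Decision.APPLY;
    }
}
```

In the close-then-reopen lifecycle, a request sent under term X can reach the master after the shard has been reinitialized under term X+1 with the same allocation id. In this model the allocation id comparison alone would let that request through, and the primary term comparison is what rejects it.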