Recover closed indices after a full cluster restart by tlrx · Pull Request #39249 · elastic/elasticsearch

tlrx · 2019-02-21T15:07:25Z

Note: this pull request is intended to be merged into the replicated-closed-indices feature branch.

Closing an index is a process that can be broken down into several steps:

1. First, the cluster state is updated to add a write block on the index to be closed.
2. Then, a transport replication action is executed on all shards of the index. This action checks that the maximum sequence number and the global checkpoint have identical values, indicating that all in-flight write operations have completed on the shard.
3. Finally, if the previous steps were successful, the cluster state is updated again to change the state of the index from `OPEN` to `CLOSE`.
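The check performed in the second step can be sketched as follows. This is only an illustration of the condition described above, not the Elasticsearch implementation: `CloseVerificationSketch` and `ShardStats` are hypothetical names, and the real check runs inside a transport replication action on each shard rather than on a plain list.

```java
import java.util.Arrays;
import java.util.List;

final class CloseVerificationSketch {

    /** Hypothetical stand-in for the sequence number statistics of one shard. */
    static final class ShardStats {
        final long maxSeqNo;
        final long globalCheckpoint;

        ShardStats(long maxSeqNo, long globalCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.globalCheckpoint = globalCheckpoint;
        }
    }

    /** A shard has no in-flight writes when both values are identical. */
    static boolean isReadyToClose(ShardStats stats) {
        return stats.maxSeqNo == stats.globalCheckpoint;
    }

    /** The index may only move from OPEN to CLOSE if every shard passes the check. */
    static boolean allShardsReadyToClose(List<ShardStats> shards) {
        return shards.stream().allMatch(CloseVerificationSketch::isReadyToClose);
    }

    public static void main(String[] args) {
        List<ShardStats> shards = Arrays.asList(
            new ShardStats(41L, 41L),  // all operations are covered by the global checkpoint
            new ShardStats(17L, 15L)); // two operations are still in flight
        System.out.println(allShardsReadyToClose(shards));
    }
}
```

In this sketch a single lagging shard is enough to reject the close, which mirrors the "if the previous steps were successful" condition of the last step.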

During the last step, the master node retrieves the minimum node version among all the nodes that compose the cluster:

* If any node is on a pre-8.0 version, the index is closed and the index routing table is removed from the cluster state. This is the "old" way of closing indices, and closed indices with no routing table are not replicated.
* If all nodes are on version 8.0 or higher, the index is closed and its routing table is reinitialized in the cluster state. This is the new way of closing indices, and such closed indices will be replicated in the cluster.
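The version-based decision can be illustrated with a small sketch. It assumes node versions are reduced to their major number; `CloseRoutingDecisionSketch`, `RoutingAction` and `FIRST_REPLICATING_MAJOR` are hypothetical names introduced for this example and do not appear in the Elasticsearch code base.

```java
import java.util.Arrays;
import java.util.List;

final class CloseRoutingDecisionSketch {

    enum RoutingAction { REMOVE_ROUTING_TABLE, REINITIALIZE_ROUTING_TABLE }

    /** Hypothetical constant: the first major version that replicates closed indices. */
    static final int FIRST_REPLICATING_MAJOR = 8;

    /** Decides what happens to the index routing table, based on the oldest node in the cluster. */
    static RoutingAction onClose(List<Integer> nodeMajorVersions) {
        int minMajor = nodeMajorVersions.stream()
            .mapToInt(Integer::intValue)
            .min()
            .orElseThrow(() -> new IllegalArgumentException("cluster has no nodes"));
        return minMajor >= FIRST_REPLICATING_MAJOR
            ? RoutingAction.REINITIALIZE_ROUTING_TABLE
            : RoutingAction.REMOVE_ROUTING_TABLE;
    }

    public static void main(String[] args) {
        System.out.println(onClose(Arrays.asList(8, 8, 8))); // homogeneous 8.x cluster
        System.out.println(onClose(Arrays.asList(7, 8, 8))); // mixed-version cluster
    }
}
```

Using the minimum rather than the master's own version means a single older node is enough to fall back to the non-replicated behaviour.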

But routing tables are not persisted in the cluster state, so after a full cluster restart there is no way to make the distinction between an index closed in 7.x and an index closed and replicated on 8.0.

This pull request introduces a new private index setting named `index.verified_before_close`, which is added at closing time to closed indices that are replicated. The setting serves as a marker indicating that the index was closed using the new Close Index API on a cluster that supports replication of closed indices.

This way, after a full cluster restart, the Gateway service can automatically recover those closed indices as if they were open indices. Closed indices that don't have this setting (because they were closed on a pre-8.0 cluster or on a mixed-version cluster) won't be recovered and will need to be reopened and closed again on an 8.0 cluster.

Note that reopening the index removes the private setting. This pull request also adds a full cluster restart test and mixed cluster test.
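The role of the marker setting during recovery can be sketched as below. Index settings are modelled as a plain string map, and the value `"true"` for the marker is an assumption made for the example; `GatewayRecoverySketch`, `shouldRecoverShards` and `reopen` are hypothetical names, not the actual Gateway service or `RoutingTable` API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class GatewayRecoverySketch {

    enum IndexState { OPEN, CLOSE }

    /** Setting name taken from the pull request description. */
    static final String VERIFIED_BEFORE_CLOSE = "index.verified_before_close";

    /** Open indices are always recovered; closed ones only if they carry the marker. */
    static boolean shouldRecoverShards(IndexState state, Map<String, String> settings) {
        if (state == IndexState.OPEN) {
            return true;
        }
        // parseBoolean(null) is false, so a missing marker means "do not recover".
        return Boolean.parseBoolean(settings.get(VERIFIED_BEFORE_CLOSE));
    }

    /** Reopening drops the marker, so it never outlives the closed state. */
    static Map<String, String> reopen(Map<String, String> settings) {
        Map<String, String> copy = new HashMap<>(settings);
        copy.remove(VERIFIED_BEFORE_CLOSE);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> replicated = Collections.singletonMap(VERIFIED_BEFORE_CLOSE, "true");
        Map<String, String> legacy = Collections.emptyMap();
        System.out.println(shouldRecoverShards(IndexState.CLOSE, replicated));
        System.out.println(shouldRecoverShards(IndexState.CLOSE, legacy));
        System.out.println(reopen(replicated).containsKey(VERIFIED_BEFORE_CLOSE));
    }
}
```

Because routing tables are not persisted, a persisted setting is the only information left after a restart that can separate the two kinds of closed index, which is why the decision in the sketch depends on the settings alone.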

Relates to #33888

elasticmachine · 2019-02-21T15:07:27Z

Pinging @elastic/es-distributed

tlrx · 2019-02-21T15:09:44Z

server/src/main/java/org/elasticsearch/cluster/routing/RoutingTable.java


        public Builder addAsRecovery(IndexMetaData indexMetaData) {
-            if (indexMetaData.getState() == IndexMetaData.State.OPEN) {
+            if (indexMetaData.getState() == IndexMetaData.State.OPEN || MetaDataIndexStateService.isIndexMetaDataClosed(indexMetaData)) {


This is where the distinction is made between a closed index and an index closed on an 8.0 version (with the new index.closed setting).

tlrx · 2019-02-21T15:10:56Z

server/src/test/java/org/elasticsearch/cluster/allocation/ClusterRerouteIT.java

                .execute().actionGet();

+        final boolean closed = randomBoolean();
+        if (closed) {


This test executes a full cluster restart, so I adapted it to randomly run on closed indices in the same pull request.

tlrx · 2019-02-21T15:11:48Z

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/IndexFollowingIT.java


        assertBusy(() -> {
-            assertThat(getFollowTaskSettingsVersion("follower"), equalTo(2L));
+            assertThat(getFollowTaskSettingsVersion("follower"), equalTo(4L));


The Open/Close APIs now always increase the index settings version.


tlrx · 2019-02-25T15:58:34Z

@ywelsch as we discussed via another channel, the index.closed setting has been renamed to index.verified_before_close.

ywelsch

LGTM

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

qa/full-cluster-restart/src/test/java/org/elasticsearch/upgrades/FullClusterRestartIT.java

tlrx · 2019-02-26T08:30:35Z

Thanks @ywelsch

Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again.

This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed.

This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health.

Some APIs have been adapted to make them work with closed indices:

- Cluster Health API
- Cluster Reroute API
- Cluster Allocation Explain API
- Recovery API
- Cat Indices
- Cat Shards
- Cat Health
- Cat Recovery

This commit contains all the following changes (most recent first):

* c6c42a1 Adapt NoOpEngineTests after #39006
* 3f9993d Wait for shards to be active after closing indices (#38854)
* 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
* 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
* 71f5c34 Recover closed indices after a full cluster restart (#39249)
* 4db7fd9 Adapt the Recovery API for closed indices (#38421)
* 4fd1bb2 Adapt more tests suites to closed indices (#39186)
* 0519016 Add replica to primary promotion test for closed indices (#39110)
* b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
* c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
* 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
* e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
* cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
* b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
* 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
* e53a9be Fix compilation error in IndexShardIT after merge with master
* cae4155 Relax NoOpEngine constraints (#37413)
* 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
* c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)

Relates to #33888


Backport support for replicating closed indices (#39499)

Adapt Gateway service to initialize routing table for closed indices

6743573

tlrx added the >enhancement and :Distributed/Distributed labels Feb 21, 2019

tlrx requested a review from ywelsch February 21, 2019 15:07


tlrx mentioned this pull request Feb 21, 2019

Replicate closed indices #33888

Closed

50 tasks

tlrx added 3 commits February 22, 2019 09:36

Merge branch 'replicated-closed-indices' into rci-gateway-service-inject-routing-table-for-closed-shards

32a2afb

Fix testCloseIndexDuringRollingUpgrade, replica cannot be allocated to node in lower version than primary

555a529

Rename setting

4380fe9

Fix RecoveryIT

1501bc1

ywelsch approved these changes Feb 25, 2019

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java (outdated, resolved)

ywelsch reviewed Feb 25, 2019

qa/full-cluster-restart/src/test/java/org/elasticsearch/upgrades/FullClusterRestartIT.java (outdated, resolved)

Apply feedback

158939a

tlrx merged commit 71f5c34 into elastic:replicated-closed-indices Feb 26, 2019

tlrx deleted the rci-gateway-service-inject-routing-table-for-closed-shards branch February 26, 2019 08:30

tlrx mentioned this pull request Feb 28, 2019

Add support for replicating closed indices #39499

Merged

tlrx merged 6 commits into elastic:replicated-closed-indices from tlrx:rci-gateway-service-inject-routing-table-for-closed-shards
