[RCI] Keep index routing table for closed indices by tlrx · Pull Request #34108 · elastic/elasticsearch

tlrx · 2018-09-27T08:58:05Z

Note: this pull request is against the replicated-closed-indices branch

This pull request changes the MetaDataIndexStateService class so that it no longer removes the routing table of closed indices. Instead, it reinitializes the shard routings with an INITIALIZING state (and an unassigned INDEX_CLOSED reason). This way the primary shard allocator takes care of reallocating the shards to the nodes that already hold valid copies of the unassigned primaries, forcing the recreation of the shards on the data nodes. Thanks to #33903 the shards will be recreated using a NoOpEngine.

This pull request also adds a removeIndices() method to the IndicesClusterStateService that detects when the state of an index has changed (CLOSE <-> OPEN) and closes the associated IndexService, forcing its recreation.
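
The state-change detection described above can be illustrated with a minimal, self-contained sketch. It does not use the real Elasticsearch classes: `StateChangeSketch`, its nested `IndexState` enum and `indicesWithChangedState` are hypothetical names, and each cluster state is modeled as a plain map from index name to state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StateChangeSketch {

    // Hypothetical stand-in for the index state; not the real Elasticsearch type.
    enum IndexState { OPEN, CLOSE }

    /**
     * Returns the names of indices that exist in both cluster states and whose
     * state flipped (OPEN <-> CLOSE). In the PR, these would be the indices whose
     * IndexService is closed so that it gets recreated later.
     */
    static List<String> indicesWithChangedState(Map<String, IndexState> previous,
                                                Map<String, IndexState> current) {
        List<String> changed = new ArrayList<>();
        // Iterate over a sorted copy so the result order is deterministic.
        for (Map.Entry<String, IndexState> entry : new TreeMap<>(current).entrySet()) {
            IndexState before = previous.get(entry.getKey());
            // Newly created indices (before == null) are not state changes.
            if (before != null && before != entry.getValue()) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }
}
```

Indices that are new or deleted are deliberately ignored here, since only a flip between the two states requires tearing down an existing service.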

I created this pull request to get feedback on these two points. The PR also adds some necessary tests and adapts an important test, IndicesClusterStateServiceRandomUpdatesTests.

It also mutes a lot of tests (more than I expected...) that sometimes fail because:

the test expects that closing an index releases all shard resources (like the shard lock), and this is not true anymore
the test expects that closing an index removes the routing table for the index, and this is not true anymore
the test expects that an index can be closed at any time, even if the shards of the opened index are still initializing
the test fails because the translog is not empty when the shard is recreated using a NoOpEngine

For the first two cases, the tests need to be adapted and I'd like to do this on a per-test basis. I haven't done it in this PR to reduce the noise.

For the last two cases, the Close Index API needs to be reworked so that it only closes an index when its shards are active and fully in sync. This will be addressed in another PR.
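
As a rough illustration of the precondition mentioned above, the following sketch refuses to close an index unless every shard copy is started. It is a simplification under stated assumptions: `CloseGuardSketch`, `ShardState` and `canClose` are hypothetical names, only the STARTED state is treated as closable, and the "fully in sync" part of the requirement is not modeled at all.

```java
import java.util.List;

final class CloseGuardSketch {

    // Hypothetical, simplified shard states; the real routing states carry more detail.
    enum ShardState { UNASSIGNED, INITIALIZING, STARTED }

    /** True only if the index has at least one shard copy and every copy is STARTED. */
    static boolean canClose(List<ShardState> shardCopies) {
        return !shardCopies.isEmpty()
            && shardCopies.stream().allMatch(state -> state == ShardState.STARTED);
    }
}
```

With this guard, an index whose shards are still initializing after being opened would be rejected rather than closed midway through recovery.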

Relates #33888

This commit adds a new NoOpEngine implementation based on the current ReadOnlyEngine. This new implementation uses an empty DirectoryReader with no segment readers and always returns 0 docs. The NoOpEngine is the default Engine created for IndexShards of closed indices. It expects an empty translog when it is instantiated. Relates to elastic#33888
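
The contract stated in this commit message (no documents, empty translog required at construction) can be shown with a toy model. This is not the actual NoOpEngine: `NoOpEngineSketch`, its constructor parameter and `docCount` are hypothetical, and no Lucene or Elasticsearch types are involved.

```java
final class NoOpEngineSketch {

    /**
     * @param translogOperations number of operations found in the translog
     *                           (hypothetical parameter for illustration)
     * @throws IllegalStateException if the translog is not empty
     */
    NoOpEngineSketch(int translogOperations) {
        if (translogOperations != 0) {
            throw new IllegalStateException(
                "expected an empty translog but found " + translogOperations + " operations");
        }
    }

    /** A no-op engine exposes no segments, so it always reports zero documents. */
    int docCount() {
        return 0;
    }
}
```

The constructor check mirrors the failure mode listed among the muted tests: recreating a shard on top of a non-empty translog fails immediately instead of silently dropping operations.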

elasticmachine · 2018-09-27T08:58:07Z

Pinging @elastic/es-distributed

ywelsch

I've left a few questions and suggestions. Core logic looks very good already.

ywelsch · 2018-09-28T12:04:17Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

+     * Removes the {@link IndexService} of indices whose state has changed.
+     * Closing the index services here will force them to be recreated later along with their shards.
+     */
+    private void removeIndices(final ClusterChangedEvent event) {


Can we fold this into updateIndices? There we use the same iteration order and the same lookup patterns, so I think we can save a lot of boilerplate and avoid iterating over all indices yet another time.

ywelsch · 2018-09-28T12:10:48Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

-                final AllocatedIndices.IndexRemovalReason reason =
-                    indexMetaData != null && indexMetaData.getState() == IndexMetaData.State.CLOSE ? CLOSED : NO_LONGER_ASSIGNED;
+                AllocatedIndices.IndexRemovalReason reason = NO_LONGER_ASSIGNED;
+                if (indexMetaData != null) {


I'm not sure if it's worth the complexity of this code here just to provide a better message as to why an index service got removed. If you think it's useful, maybe factor the logic of determining the AllocatedIndices.IndexRemovalReason based on currentState and newState into a helper method so it can be reused by removeIndices
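
One possible shape for such a helper, as a minimal sketch only. The enums below are hypothetical stand-ins rather than the real IndexMetaData.State and AllocatedIndices.IndexRemovalReason, and the `REOPENED` value and the `reasonFor` name are assumptions made for illustration, not something this diff is known to contain.

```java
final class RemovalReasonSketch {

    // Hypothetical stand-ins for the real Elasticsearch enums.
    enum IndexState { OPEN, CLOSE }
    enum RemovalReason { NO_LONGER_ASSIGNED, CLOSED, REOPENED }

    /**
     * Derives a removal reason from the index state before and after the
     * cluster state update. Either argument may be null when the index is
     * absent from the corresponding cluster state.
     */
    static RemovalReason reasonFor(IndexState previous, IndexState current) {
        if (previous == IndexState.OPEN && current == IndexState.CLOSE) {
            return RemovalReason.CLOSED;
        }
        if (previous == IndexState.CLOSE && current == IndexState.OPEN) {
            return RemovalReason.REOPENED;
        }
        // Unchanged state or missing metadata: fall back to the generic reason.
        return RemovalReason.NO_LONGER_ASSIGNED;
    }
}
```

Keeping the decision in one pure function means both call sites can share it and it can be unit tested without building a cluster state.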

ywelsch · 2018-09-28T12:12:36Z

server/src/test/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateServiceTests.java

+
+    @Before
+    public void setUpService() {
+        AllocationService allocationService = createAllocationService(Settings.EMPTY);


you can also use this method without needing to subclass ESAllocationTestCase.

ywelsch · 2018-09-28T12:23:48Z

...est/java/org/elasticsearch/indices/cluster/IndicesClusterStateServiceRandomUpdatesTests.java

-            CloseIndexRequest closeIndexRequest = new CloseIndexRequest(state.metaData().index(index).getIndex().getName());
-            state = cluster.closeIndices(state, closeIndexRequest);
+            IndexMetaData indexMetaData = state.metaData().index(index);
+            if (state.routingTable().allShards(index).stream().allMatch(ShardRouting::started)) {


why restrict the test like this?

ywelsch · 2018-09-28T12:24:14Z

...est/java/org/elasticsearch/indices/cluster/IndicesClusterStateServiceRandomUpdatesTests.java

-            OpenIndexRequest openIndexRequest = new OpenIndexRequest(state.metaData().index(index).getIndex().getName());
-            state = cluster.openIndices(state, openIndexRequest);
+            IndexMetaData indexMetaData = state.metaData().index(index);
+            // Do not reopen an index that was just closed


ywelsch · 2018-09-28T12:25:44Z

...est/java/org/elasticsearch/indices/cluster/IndicesClusterStateServiceRandomUpdatesTests.java

+                final Index index = indexService.index();
+                // do not start or fail shards of indices that were just closed or reopened, because
+                // they are still initializing and we must wait for the cluster state to be applied
+                // on node before starting or failing them


I'm confused. The cluster state was already applied on the node, no? I don't understand the extra restriction here.

ywelsch · 2018-09-28T12:26:33Z

server/src/test/java/org/elasticsearch/plugins/PluginsServiceTests.java

        assertThat(e, hasToString(containsString(expected)));
    }

+    @AwaitsFix(bugUrl = "")


what's wrong with this test?

tlrx · 2019-01-30T11:15:27Z

Closed in favor of #38024

tlrx added 2 commits September 26, 2018 14:14

Keep index routing table for closed indices

a066f7f

tlrx added the >enhancement and :Distributed/Distributed labels Sep 27, 2018

tlrx requested review from bleskes and ywelsch September 27, 2018 08:58

tlrx mentioned this pull request Sep 27, 2018

Replicate closed indices #33888

Closed

50 tasks

ywelsch suggested changes Sep 28, 2018

tlrx force-pushed the replicated-closed-indices branch from d2be022 to 8e93bc4 Compare October 26, 2018 08:00

tlrx force-pushed the replicated-closed-indices branch 2 times, most recently from 95cfc1e to c2a3ec2 Compare November 8, 2018 10:59

tlrx force-pushed the replicated-closed-indices branch from c2a3ec2 to 71fe1ac Compare December 4, 2018 14:21

tlrx force-pushed the replicated-closed-indices branch 5 times, most recently from 49ce196 to fbbeff4 Compare January 15, 2019 11:14

tlrx force-pushed the replicated-closed-indices branch from fbbeff4 to b00b323 Compare January 22, 2019 08:45

tlrx force-pushed the replicated-closed-indices branch 5 times, most recently from a0296fc to e53a9be Compare January 30, 2019 10:36

tlrx closed this Jan 30, 2019

tlrx deleted the keep-index-routing-table-for-closed-indices branch July 1, 2019 13:08

tlrx wants to merge 2 commits into elastic:replicated-closed-indices from tlrx:keep-index-routing-table-for-closed-indices
