[RCI] Keep index routing table for closed indices#34108
[RCI] Keep index routing table for closed indices#34108tlrx wants to merge 2 commits intoelastic:replicated-closed-indicesfrom
Conversation
This commit adds a new NoOpEngine implementation based on the current ReadOnlyEngine. This new implementation uses an empty DirectoryReader with no segments readers and will always returns 0 docs. The NoOpEngine is the default Engine created for IndexShards of closed indices. It expects an empty translog when it is instantiated. Relates to elastic#33888
|
Pinging @elastic/es-distributed |
ywelsch
left a comment
There was a problem hiding this comment.
I've left a few questions and suggestions. Core logic looks very good already.
| * Removes the {@link IndexService} of indices whose state has changed. | ||
| * Closing the index services here will force them to be recreated later along with their shards. | ||
| */ | ||
| private void removeIndices(final ClusterChangedEvent event) { |
There was a problem hiding this comment.
can we fold this into updateIndices? There we use the same iteration order and have the same lookup patterns, so I think we can save a lot on the boiler plate and can avoid iterating yet another time over all indices
| final AllocatedIndices.IndexRemovalReason reason = | ||
| indexMetaData != null && indexMetaData.getState() == IndexMetaData.State.CLOSE ? CLOSED : NO_LONGER_ASSIGNED; | ||
| AllocatedIndices.IndexRemovalReason reason = NO_LONGER_ASSIGNED; | ||
| if (indexMetaData != null) { |
There was a problem hiding this comment.
I'm not sure if it's worth the complexity of this code here just to provide a better message as to why an index service got removed. If you think it's useful, maybe factor the logic of determining the AllocatedIndices.IndexRemovalReason based on currentState and newState into a helper method so it can be reused by removeIndices
|
|
||
| @Before | ||
| public void setUpService() { | ||
| AllocationService allocationService = createAllocationService(Settings.EMPTY); |
There was a problem hiding this comment.
you can also use this method without needing to subclass ESAllocationTestCase.
| CloseIndexRequest closeIndexRequest = new CloseIndexRequest(state.metaData().index(index).getIndex().getName()); | ||
| state = cluster.closeIndices(state, closeIndexRequest); | ||
| IndexMetaData indexMetaData = state.metaData().index(index); | ||
| if (state.routingTable().allShards(index).stream().allMatch(ShardRouting::started)) { |
There was a problem hiding this comment.
why restrict the test like this?
| OpenIndexRequest openIndexRequest = new OpenIndexRequest(state.metaData().index(index).getIndex().getName()); | ||
| state = cluster.openIndices(state, openIndexRequest); | ||
| IndexMetaData indexMetaData = state.metaData().index(index); | ||
| // Do not reopen an index that was just closed |
| final Index index = indexService.index(); | ||
| // do not start or fail shards of indices that were just closed or reopened, because | ||
| // they are still initializing and we must wait for the cluster state to be applied | ||
| // on node before starting or failing them |
There was a problem hiding this comment.
I'm confused. The cluster state was already applied on the node, no? I don't understand the extra restriction here.
| assertThat(e, hasToString(containsString(expected))); | ||
| } | ||
|
|
||
| @AwaitsFix(bugUrl = "") |
There was a problem hiding this comment.
what's wrong with this test?
d2be022 to
8e93bc4
Compare
95cfc1e to
c2a3ec2
Compare
c2a3ec2 to
71fe1ac
Compare
49ce196 to
fbbeff4
Compare
fbbeff4 to
b00b323
Compare
a0296fc to
e53a9be
Compare
|
Closed in favor of #38024 |
Note: this pull request is against the replicated-closed-indices branch
This pull request changes the
MetaDataIndexStateServiceclass so that it does not remove the routing table of closed indices anymore but instead reinitializes the shards routing with anINITIALIZINGstate (and an unassignedINDEX_CLOSEDreason). This way the primary shard allocator will take care of reallocating the shards to the nodes that already hold valid copies of the unassigned primaries, forcing the recreation of the shards on data node. Thanks to #33903 the shards will be recreated using a NoOpEngine.This pull request also adds a
removeIndices()theIndicesClusterStateServicethat detects when the state of an index is changed (CLOSE <-> OPEN) and close the associatedIndexService, forcing its recreation.I created this pull request to have feedback on these two points. The PR also adds some necessary tests and also adapts an important test
IndicesClusterStateServiceRandomUpdatesTests.It also mutes a lot of tests (more than I expected...) that sometimes fails because:
For the first two cases, the test need to be adapted and I'd like to do this on a per-test basis. I haven't done it in this PR to reduce the noise.
For the two last cases, the Close Index API needs to be reworked so that it only closes an index when shards are active and fully in sync. This will address in another PR.
Relates #33888