[Remote Store] Update index settings once all shard copies of an index move to remote enabled nodes #13253
Conversation
❌ Gradle check result for 498ec5c: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
    // Let allocation service mark the incoming `RELOCATING` shard copies as `STARTED`
    maybeUpdatedState = allocationService.applyStartedShards(currentState, shardRoutingsToBeApplied);
    // Run remote store migration based tasks
    if (ongoingDocrepToRemoteMigration(currentState.metadata().settings())) {
Can reuse `RemoteStoreNodeService#isMigratingToRemoteStore` here.
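For context, a self-contained model of what such a migration check looks at: the cluster's compatibility mode must be `mixed` and the migration direction must point at the remote store. The setting keys follow the OpenSearch remote-store migration docs, but the class, method, and map-based signature here are illustrative assumptions, not the actual `RemoteStoreNodeService` code.

```java
import java.util.Map;

// Hypothetical sketch of a "migrating to remote store" check. Keys mirror the
// documented cluster settings but should be treated as assumptions here.
class MigrationCheck {
    static boolean isMigratingToRemoteStore(Map<String, String> clusterSettings) {
        // Both conditions must hold: mixed-mode cluster AND remote_store direction
        return "mixed".equals(clusterSettings.get("remote_store.compatibility_mode"))
            && "remote_store".equals(clusterSettings.get("migration.direction"));
    }
}
```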
    && (indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started) == indexRoutingTable.shardsMatchingPredicateCount(
        shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode()
    ));
    return allStartedShardsOnRemote && IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false;
We can short-circuit on `IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false` at the start of this function.
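A minimal sketch of that refactor, with invented names: evaluate the cheap "remote store already enabled" condition first and return before any routing-table work runs. The `Supplier` stands in for the expensive started-shards check.

```java
import java.util.function.Supplier;

// Hypothetical illustration of the suggested short-circuit; not the actual
// OpenSearch method.
class ShortCircuitCheck {
    static boolean needsRemoteSettingsUpdate(boolean remoteStoreAlreadyEnabled,
                                             Supplier<Boolean> allStartedShardsOnRemote) {
        if (remoteStoreAlreadyEnabled) {
            return false; // short circuit: settings are already applied, skip routing work
        }
        return allStartedShardsOnRemote.get();
    }
}
```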
    && (indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started) == indexRoutingTable.shardsMatchingPredicateCount(
        shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode()
    ));
    return allStartedShardsOnRemote && IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false;
The check comparing `indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started)` against `indexRoutingTable.shardsMatchingPredicateCount(shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode())` is not accurate, because the remote-node predicate does not restrict itself to started shards. We should instead check that every started shard is on a remote node.
We also need to make sure that all the primary shards are assigned. There can be a case where an index with one shard and zero replicas has that shard unassigned, while its copy lives on a docrep node that is temporarily down.
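The two review points above can be sketched as a small self-contained model: the index is ready for remote settings only when every primary is assigned and every started shard sits on a remote-store node. The `ShardCopy` class and all names here are illustrative stand-ins for the real routing-table types, not the actual OpenSearch code.

```java
import java.util.List;

// Hypothetical flattened view of one shard copy from the routing table.
class ShardCopy {
    final boolean primary;
    final boolean started;
    final boolean assigned;
    final boolean onRemoteNode;

    ShardCopy(boolean primary, boolean started, boolean assigned, boolean onRemoteNode) {
        this.primary = primary;
        this.started = started;
        this.assigned = assigned;
        this.onRemoteNode = onRemoteNode;
    }
}

class RemoteMigrationCheck {
    // All primaries must be assigned (guards against the "copy on a
    // temporarily-down docrep node" case), and every STARTED shard must be
    // on a remote node -- shards that are not started are not counted.
    static boolean readyForRemoteSettings(List<ShardCopy> shards) {
        boolean allPrimariesAssigned = shards.stream()
            .filter(s -> s.primary)
            .allMatch(s -> s.assigned);
        boolean startedShardsOnRemote = shards.stream()
            .filter(s -> s.started)
            .allMatch(s -> s.onRemoteNode);
        return allPrimariesAssigned && startedShardsOnRemote;
    }
}
```

Note that filtering to started shards before the `allMatch` is what fixes the count-comparison flaw: the two counts in the original code could coincide even when a started shard is still on a docrep node.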
     * @param discoveryNodes Current set of {@link DiscoveryNodes} in the cluster
     * @return {@link Tuple} with segment repository name as first element and translog repository name as second element
     */
    public static Tuple<String, String> getRemoteStoreRepositoryNames(DiscoveryNodes discoveryNodes) {
The function doesn't fit well here, and `Tuple` doesn't look like the right data structure for this. I would prefer two functions, one retrieving the translog repo and one retrieving the segment repo.
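A sketch of the suggested split, replacing the `Tuple`-returning helper with two single-purpose accessors. The attribute keys, method names, and map-based signature are illustrative assumptions, not the real OpenSearch identifiers.

```java
import java.util.Map;

// Hypothetical illustration: two narrow accessors instead of one method
// returning a Tuple whose element order the caller must remember.
class RemoteStoreRepoNames {
    static String getSegmentRepoName(Map<String, String> nodeAttributes) {
        return nodeAttributes.get("remote_store.segment.repository");
    }

    static String getTranslogRepoName(Map<String, String> nodeAttributes) {
        return nodeAttributes.get("remote_store.translog.repository");
    }
}
```

The design point is readability at the call site: `getSegmentRepoName(...)` is self-describing, whereas `tuple.v1()` versus `tuple.v2()` encodes meaning only by position.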
❌ Gradle check result for da6b91a: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for c11bf32: ABORTED. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 4e15cf9: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed 4e15cf9 to da6b91a
Force-pushed da6b91a to 8d9c389
❌ Gradle check result for da6b91a: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 8d9c389: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
Added logic within the `ShardStartedClusterStateTaskExecutor` to apply remote store based index settings and add remote path based index metadata once all shard copies of an index move over to remote store enabled nodes. Currently this logic only executes when the cluster is in `mixed` mode and there are remote enabled nodes present in the cluster.

The `execute` logic of this cluster state executor fetches the index names whose shards have been marked as `STARTED`, then references the current `RoutingTable` from the incoming `ClusterState` to figure out whether all shards of the index are in the `STARTED` state and all those shard copies are on remote store enabled nodes. If so, it mutates the incoming metadata by adding the remote store based settings, which is then persisted and published to the data nodes from the cluster manager.

Related Issues
Resolves: #13252
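The metadata mutation step from the description can be sketched as a small self-contained model: copy the existing index settings and layer the remote store settings on top before the new state is published. The `index.remote_store.*` keys mirror OpenSearch's documented remote-store index settings, but the class and method names, and the plain-map signature, are assumptions of this sketch rather than the actual executor code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the settings-mutation step; not the real
// ShardStartedClusterStateTaskExecutor logic.
class RemoteSettingsMutator {
    static Map<String, String> withRemoteStoreSettings(Map<String, String> current,
                                                       String segmentRepo,
                                                       String translogRepo) {
        Map<String, String> updated = new HashMap<>(current); // leave the input untouched
        updated.put("index.remote_store.enabled", "true");
        updated.put("index.remote_store.segment.repository", segmentRepo);
        updated.put("index.remote_store.translog.repository", translogRepo);
        return updated;
    }
}
```

Copying rather than mutating in place matches how cluster-state updates generally work: the incoming state is treated as immutable and a modified copy is what gets published.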
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.