Propagate last node to reinitialized routing tables #91549
When closing or opening an index, or restoring a snapshot over a closed index, we reinitialize its routing table from scratch and expect the gateway allocators to select the appropriate node for each shard copy. With this commit we also keep track of the last-allocated node ID for each copy, which makes it more likely that the desired balance of these shards remains unchanged too. Closes elastic#91472
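The idea in the description can be illustrated with a minimal sketch. It does not use the real Elasticsearch classes: `AssignedCopy`, `UnassignedCopy`, and `reinitialize` are hypothetical names invented for this example, and the ordering (primary's node first, then the other copies) is an assumption based on the diff under review.

```java
import java.util.ArrayList;
import java.util.List;

public class LastNodeSketch {
    /** Hypothetical stand-in for an assigned shard copy (not Elasticsearch's ShardRouting). */
    public record AssignedCopy(String nodeId, boolean primary) {}

    /** Hypothetical stand-in for an unassigned copy that remembers its previous node, if any. */
    public record UnassignedCopy(boolean primary, String lastAllocatedNodeId) {}

    /**
     * Rebuilds the copies of one shard from scratch, carrying over the node IDs
     * of the previous copies so an allocator could prefer those nodes again.
     */
    public static List<UnassignedCopy> reinitialize(List<AssignedCopy> previous, int newCopyCount) {
        final var previousNodes = new ArrayList<String>(previous.size());
        // Assumption: the primary's node goes first so the new primary inherits it.
        for (final var copy : previous) {
            if (copy.primary()) {
                previousNodes.add(copy.nodeId());
            }
        }
        for (final var copy : previous) {
            if (copy.primary() == false) {
                previousNodes.add(copy.nodeId());
            }
        }
        final var reinitialized = new ArrayList<UnassignedCopy>(newCopyCount);
        for (int i = 0; i < newCopyCount; i++) {
            // Extra copies (e.g. after a replica count increase) have no previous node.
            final String lastNode = i < previousNodes.size() ? previousNodes.get(i) : null;
            reinitialized.add(new UnassignedCopy(i == 0, lastNode));
        }
        return reinitialized;
    }
}
```

In this sketch a copy with no predecessor simply gets a `null` last-allocated node ID, which is why the tests discussed further down cannot assume the field is always populated.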
Pinging @elastic/es-distributed (Team:Distributed)
…ount, expect all the node IDs to be filled in
henningandersen left a comment
One smaller concern, otherwise this looks good.
}
final var previousNodes = new ArrayList<String>(previousShardRoutingTable.size());
previousNodes.add(primaryNode);
for (final var assignedShard : previousShardRoutingTable.assignedShards()) {
This also includes the targets of relocations. I wonder if we should only look at active shards, since the gateway allocator will not consider anything less good enough anyway?
The problem I see is that if a relocation is ongoing, we risk a copy getting a last-allocated node ID that is much worse than it could be (i.e., a node that has only just started the recovery).
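To make the concern concrete, here is a small sketch of the difference between "all assigned copies" and "active copies only" during a relocation. The `State` enum, the `Copy` record, and both helper methods are hypothetical and only model the distinction; they are not the Elasticsearch routing-table API. The sketch assumes that a relocation source counts as active while its target is still initializing.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ActiveCopiesSketch {
    /** Hypothetical, simplified shard states. */
    public enum State { INITIALIZING, STARTED, RELOCATING }

    /** Hypothetical stand-in for a shard copy assigned to a node. */
    public record Copy(String nodeId, State state) {
        /** Assumption: started copies and relocation sources count as active. */
        public boolean active() {
            return state == State.STARTED || state == State.RELOCATING;
        }
    }

    /** Node IDs of every assigned copy, including relocation targets still recovering. */
    public static List<String> assignedNodes(List<Copy> copies) {
        return copies.stream().map(Copy::nodeId).collect(Collectors.toList());
    }

    /** Node IDs of active copies only, skipping targets that have barely started recovery. */
    public static List<String> activeNodes(List<Copy> copies) {
        return copies.stream().filter(Copy::active).map(Copy::nodeId).collect(Collectors.toList());
    }
}
```

With a copy relocating from one node to another, `assignedNodes` returns both nodes while `activeNodes` returns only the source, so only the node holding a complete copy would be remembered.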
assertThat(shard.unassignedInfo().getReason(), equalTo(expectedUnassignedReason));
final var lastAllocatedNodeId = shard.unassignedInfo().getLastAllocatedNodeId();
if (lastAllocatedNodeId == null) {
// restoring an index may change the number of shards/replicas so no guarantee that lastAllocatedNodeId is populated
I think only the number of replicas can be changed, not the number of shards? That is probably what you meant by "shards/replicas", but removing "shards/" would be better, I think.
- // restoring an index may change the number of shards/replicas so no guarantee that lastAllocatedNodeId is populated
+ // restoring an index may change the number of replicas so no guarantee that lastAllocatedNodeId is populated
On the contrary, I don't think there's anything requiring the snapshot to have the same number of shards as the index on top of which it's being restored.
Ahh, right, thanks.
// both original and restored index must have at least one shard tho
assertTrue(foundAnyNodeIds);
Can this not go one line up, i.e., so that we check this for every shard ID?
Not if the shard count can change in a restore (which AFAIK it can)
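The reasoning for keeping the assertion outside the per-shard check can be sketched as follows. This is a simplified illustration, not the test from the PR: `anyLastAllocatedNode` is a hypothetical helper that takes the last-allocated node IDs of all copies of an index, with `null` standing for a copy that has no predecessor.

```java
import java.util.List;

public class RestoreAssertionSketch {
    /**
     * Returns true if at least one copy remembers a previous node.
     * Individual entries may be null because a restore can change the
     * shard or replica count, leaving some copies without a predecessor.
     */
    public static boolean anyLastAllocatedNode(List<String> lastAllocatedNodeIds) {
        boolean foundAnyNodeIds = false;
        for (final String lastAllocatedNodeId : lastAllocatedNodeIds) {
            if (lastAllocatedNodeId != null) {
                foundAnyNodeIds = true;
            }
        }
        return foundAnyNodeIds;
    }
}
```

Requiring a non-null ID for every shard would fail whenever the restored index has more shards than the original, whereas the index-wide check only relies on both indices having at least one shard.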
* main: (163 commits)
  [DOCS] Edits frequent items aggregation (elastic#91564)
  Handle providers of optional services in ubermodule classloader (elastic#91217)
  Add `exportDockerImages` lifecycle task for exporting docker tarballs (elastic#91571)
  Fix CSV dependency report output file location in DRA CI job
  Fix variable placeholder for Strings.format calls (elastic#91531)
  Fix output dir creation in ConcatFileTask (elastic#91568)
  Fix declaration of dependencies in DRA snapshots CI job (elastic#91569)
  Upgrade Gradle Enterprise plugin to 3.11.4 (elastic#91435)
  Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern (elastic#91521)
  Fix inter project handling of generateDependenciesReport (elastic#91555)
  [Synthetics] Add synthetics-* read to fleet-server (elastic#91391)
  [ML] Copy more settings when creating DF analytics destination index (elastic#91546)
  Reduce CartesianCentroidIT flakiness (elastic#91553)
  Propagate last node to reinitialized routing tables (elastic#91549)
  Forecast write load during rollovers (elastic#91425)
  [DOCS] Warn about potential overhead of named queries (elastic#91512)
  Datastream unavailable exception metadata (elastic#91461)
  Generate docker images and dependency report in DRA ci job (elastic#91545)
  Support cartesian_bounds aggregation on point and shape (elastic#91298)
  Add support for EQL samples queries (elastic#91312)
  ...
# Conflicts:
# x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java