Keep snapshot restore state and routing table in sync #20836
ywelsch merged 3 commits into elastic:master
Conversation
The snapshot restore state tracks information about shards being restored from a snapshot in the cluster state. For example, it records whether a shard has been successfully restored, or whether restoring it was not possible due to a corruption of the snapshot. Recording these events is usually based on changes to the shard routing table, i.e., when a shard is started after a successful restore or failed after an unsuccessful one. Until now, there were two separate communication channels to transmit recovery failure / success: one to update the routing table and one to update the restore state. This led to issues where a shard was failed but the restore state was not updated due to connection issues between the data and master node. In some rare situations, the restore state could then no longer be properly cleaned up by the master, making it impossible to start new restore operations. The following change updates the routing table and the restore state in the same cluster state update so that both always stay in sync. It also eliminates the extra communication channel for restore operations and uses the standard cluster state listener mechanism to notify restore listeners upon successful completion of a snapshot restore.
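The core idea can be sketched in a few lines. This is a simplified, hypothetical model (the class and enum names below are invented for illustration, not the actual Elasticsearch types): because one function derives both the new routing table and the matching restore entry from the same shard event, the two can never diverge, regardless of message delivery between nodes.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinedUpdateSketch {

    enum ShardState { RESTORING, STARTED, FAILED }
    enum RestoreShardState { IN_PROGRESS, SUCCESS, FAILURE }

    // Immutable-style "cluster state" holding both views of a shard restore.
    static final class ClusterState {
        final Map<String, ShardState> routingTable;
        final Map<String, RestoreShardState> restoreInProgress;
        ClusterState(Map<String, ShardState> routing, Map<String, RestoreShardState> restore) {
            this.routingTable = routing;
            this.restoreInProgress = restore;
        }
    }

    // A single update produces both the new routing table entry and the
    // matching restore entry, keeping them in sync by construction.
    static ClusterState applyShardResult(ClusterState old, String shardId, boolean succeeded) {
        Map<String, ShardState> routing = new HashMap<>(old.routingTable);
        Map<String, RestoreShardState> restore = new HashMap<>(old.restoreInProgress);
        routing.put(shardId, succeeded ? ShardState.STARTED : ShardState.FAILED);
        restore.put(shardId, succeeded ? RestoreShardState.SUCCESS : RestoreShardState.FAILURE);
        return new ClusterState(routing, restore);
    }

    public static void main(String[] args) {
        ClusterState state = new ClusterState(
            Map.of("shard-0", ShardState.RESTORING),
            Map.of("shard-0", RestoreShardState.IN_PROGRESS));
        state = applyShardResult(state, "shard-0", true);
        System.out.println(state.routingTable.get("shard-0") + " " + state.restoreInProgress.get("shard-0"));
        // prints: STARTED SUCCESS
    }
}
```

With the pre-existing two-channel design, the routing-table update and the restore-state update were two independent messages, and losing one of them left the two structures disagreeing; folding both into one cluster state update removes that failure mode entirely.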
imotov
left a comment
Left a couple of minor comments. Otherwise, LGTM.
    assert newRoutingTable.validate(newMetaData); // validates the routing table is coherent with the cluster state metadata
    final ClusterState newState = ClusterState.builder(oldState).routingTable(newRoutingTable).metaData(newMetaData).build();
    final RestoreInProgress restoreInProgress = allocation.custom(RestoreInProgress.TYPE);
    RestoreInProgress updatedRestoreInProgress = allocation.updateRestoreInfoWithRoutingChanges(restoreInProgress);
That doesn't seem to cause any issues, but I think moving this into the if statement below might help clarify the logic. This method can be called when no restore takes place, and we plunge head-on into updating the restore info without even checking whether a restore is actually in progress. We do check it inside updateRestoreInfoWithRoutingChanges, but I think the logic would be clearer if we checked it here.
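The shape of the suggested refactor can be sketched as follows. This is a hypothetical simplification (the stand-in types and method names below are invented, not the real Elasticsearch classes): the caller guards on whether a restore is in progress before invoking the update, making the precondition visible at the call site instead of burying it inside the callee.

```java
import java.util.Optional;

public class GuardedUpdateSketch {

    // Stand-in for RestoreInProgress; absent (null) when no restore is running.
    static final class RestoreInProgress {
        final String snapshot;
        RestoreInProgress(String snapshot) { this.snapshot = snapshot; }
    }

    // Guard first, then update: the update logic only runs when a restore
    // actually exists, so a reader sees the precondition immediately.
    static Optional<RestoreInProgress> maybeUpdate(RestoreInProgress current) {
        if (current != null) {
            return Optional.of(updateWithRoutingChanges(current));
        }
        return Optional.empty();
    }

    // Placeholder for the real reconciliation with routing table changes.
    static RestoreInProgress updateWithRoutingChanges(RestoreInProgress r) {
        return r;
    }

    public static void main(String[] args) {
        System.out.println(maybeUpdate(null).isPresent());                            // prints: false
        System.out.println(maybeUpdate(new RestoreInProgress("snap-1")).isPresent()); // prints: true
    }
}
```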
    RecoverySource recoverySource = failedShard.recoverySource();
    if (recoverySource.getType() == RecoverySource.Type.SNAPSHOT) {
        Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
        if (Lucene.isCorruptionException(unassignedInfo.getFailure().getCause())) {
Could you add a comment explaining why we only fail in case of Lucene corruption?
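The distinction being made in the diff above can be sketched as follows. This is a hypothetical simplification (the helper and exception names are invented stand-ins, e.g. `isCorruptionException` mimics `Lucene.isCorruptionException` by walking the cause chain): only a corruption of the snapshot data permanently marks the shard's restore entry as failed, while other failures (e.g. transient network issues) leave the shard eligible for retry on another node.

```java
public class CorruptionCheckSketch {

    // Stand-in marker exception for corrupted snapshot data.
    static class CorruptSnapshotException extends RuntimeException {}

    // Walk the cause chain looking for a corruption marker, in the spirit
    // of Lucene.isCorruptionException (simplified stand-in).
    static boolean isCorruptionException(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof CorruptSnapshotException) {
                return true;
            }
        }
        return false;
    }

    // Corruption is unrecoverable, so the restore entry becomes FAILURE;
    // any other failure keeps the restore IN_PROGRESS so the shard can be
    // retried elsewhere.
    static String restoreStateAfterFailure(Throwable failure) {
        return isCorruptionException(failure) ? "FAILURE" : "IN_PROGRESS";
    }

    public static void main(String[] args) {
        System.out.println(restoreStateAfterFailure(new RuntimeException(new CorruptSnapshotException())));
        // prints: FAILURE
        System.out.println(restoreStateAfterFailure(new RuntimeException("connection lost")));
        // prints: IN_PROGRESS
    }
}
```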
thanks for the review @imotov
Is there any hope of this being backported to version 1.7?
No, sorry, 1.7 is end-of-life.
Closes #19774