[BUG] Race in node-left and node-join can prevent node from joining the cluster indefinitely #4874

@Bukhtawar

Description
Describe the bug
A node in an OpenSearch cluster can fail for many reasons (health-check failure, lagging, etc.), which triggers a node-left cluster state update on the leader; applying that new (node-left) cluster state is what removes the node's connections. However, it is possible for the data node to trigger a node-join quickly, before the leader has removed the connection as part of the cluster state apply, so the join reuses the stale connection to the leader. The leader then starts processing the node-join request, updates its followers, and schedules a follower checker on the newly joined node. But before the follower checker reaches that connection, the in-flight node-left cluster state apply cleans it up, so the checker finds the node disconnected, the follower checks fail, and another node-left is triggered. The leader also never sends the node-join cluster state to the data node, since it believes the node is not connected; as a result, the data node's peer finder keeps sending join requests to the leader and the loop repeats indefinitely.
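The interleaving above can be reduced to three steps on the leader. The following is a minimal sketch of that sequence; the class and method names (`NodeLeftJoinRace`, `handleJoin`, `applyNodeLeft`, `followerCheck`) are hypothetical stand-ins for the real `ClusterConnectionManager` / `FollowersChecker` code paths, not the actual OpenSearch API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race: the connection map stands in for
// ClusterConnectionManager's per-node connections.
class NodeLeftJoinRace {
    static final Map<String, String> connections = new ConcurrentHashMap<>();

    // node-join: reuses an existing connection if one is present (the bug's precondition)
    static void handleJoin(String node) {
        connections.putIfAbsent(node, "conn-" + node);
    }

    // node-left cluster state apply: tears down the node's connection
    static void applyNodeLeft(String node) {
        connections.remove(node);
    }

    // follower checker: can only check a node whose connection is still mapped
    static boolean followerCheck(String node) {
        return connections.containsKey(node);
    }

    public static void main(String[] args) {
        String node = "data-node-1";
        connections.put(node, "conn-" + node); // connection from the earlier membership

        handleJoin(node);      // 1. node-join reuses the stale connection
        applyNodeLeft(node);   // 2. in-flight node-left apply removes it
        // 3. the follower checker now finds the node disconnected,
        //    analogous to the NodeNotConnectedException in the logs below
        System.out.println(followerCheck(node) ? "connected" : "disconnected");
        // prints "disconnected"
    }
}
```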

Node join

[2022-10-18T17:00:04,691][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-join[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} join existing leader

Follower checker scheduled post node-join, which fails to find a connection

[2022-10-18T17:00:04,788][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
NodeNotConnectedException[[5b7033ca454040458aab223e1090e5f1][172.xx.xx.xx:9300] Node not connected]
        at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:189)
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:682)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:602)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.handleWakeUp(FollowersChecker.java:326)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.start(FollowersChecker.java:304)
        at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$setCurrentNodes$3(FollowersChecker.java:155)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at org.elasticsearch.cluster.coordination.FollowersChecker.setCurrentNodes(FollowersChecker.java:148)
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1115)
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:288)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:270)
        at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:155)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:693)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-18T17:00:04,790][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty

Data node gets added without the connection mapped (not verifiable through logs)

[2022-10-18T17:00:21,139][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] added {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},{5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}

Data node gets removed

[2022-10-18T17:00:26,326][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-left[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} reason: disconnected
[2022-10-18T17:00:56,363][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] removed {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},

Expected behavior
Node joins and leaves shouldn't interfere with each other; membership transitions should complete cleanly without getting stuck in this loop.
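One possible shape of a mitigation (a hypothetical sketch, not the actual OpenSearch fix) is for the leader to track nodes whose node-left removal is still being applied and defer joins for them, so the joining node's peer finder simply retries after the connection teardown completes:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical join guard: the class name, field, and method are illustrative
// assumptions, not existing OpenSearch identifiers.
class JoinValidator {
    // Nodes with an in-flight node-left cluster state apply.
    static final Set<String> pendingRemovals = ConcurrentHashMap.newKeySet();

    // Reject (defer) a join while the old connection is still being torn down,
    // so the join never reuses a connection that is about to be removed.
    static boolean canAcceptJoin(String nodeId) {
        return !pendingRemovals.contains(nodeId);
    }

    public static void main(String[] args) {
        pendingRemovals.add("data-node-1");
        System.out.println(canAcceptJoin("data-node-1")); // false: join deferred
        pendingRemovals.remove("data-node-1");            // node-left apply finished
        System.out.println(canAcceptJoin("data-node-1")); // true: join can proceed
    }
}
```

With this ordering the join always starts from a clean slate, so the follower checker only ever sees connections that no in-flight node-left can remove out from under it.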
