[BUG] Race in node-left and node-join can prevent node from joining the cluster indefinitely #4874

@Bukhtawar

Description
Describe the bug
A node in an OpenSearch cluster can fail for many reasons (health-check failure, lagging, etc.), which triggers a node-left cluster state update on the leader; applying that new (node-left) cluster state is what removes the node's connections. However, it is possible for the data node to trigger a node-join quickly, before the leader has removed the connection as part of the cluster state apply, so the join reuses the stale connection to the leader. The leader then starts processing the node-join request, updates its followers, and schedules a follower checker on the newly joined node. But before the follower checker reaches that connection, the in-flight node-left cluster state apply cleans it up, so the checker finds the node disconnected, the follower checks fail, and another node-left is triggered. The leader also never sends the node-join cluster state to the data node, since it believes the node is not connected; as a result, the data node's peer finder keeps sending join requests to the leader and the loop repeats indefinitely.
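The interleaving above can be reduced to three steps on the leader. The following is a minimal sketch of that sequence; the class and method names (`NodeLeftJoinRace`, `handleJoin`, `applyNodeLeft`, `followerCheck`) are hypothetical stand-ins for the real `ClusterConnectionManager` / `FollowersChecker` code paths, not the actual OpenSearch API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race: the connection map stands in for
// ClusterConnectionManager's per-node connections.
class NodeLeftJoinRace {
    static final Map<String, String> connections = new ConcurrentHashMap<>();

    // node-join: reuses an existing connection if one is present (the bug's precondition)
    static void handleJoin(String node) {
        connections.putIfAbsent(node, "conn-" + node);
    }

    // node-left cluster state apply: tears down the node's connection
    static void applyNodeLeft(String node) {
        connections.remove(node);
    }

    // follower checker: can only check a node whose connection is still mapped
    static boolean followerCheck(String node) {
        return connections.containsKey(node);
    }

    public static void main(String[] args) {
        String node = "data-node-1";
        connections.put(node, "conn-" + node); // connection from the earlier membership

        handleJoin(node);      // 1. node-join reuses the stale connection
        applyNodeLeft(node);   // 2. in-flight node-left apply removes it
        // 3. the follower checker now finds the node disconnected,
        //    analogous to the NodeNotConnectedException in the logs below
        System.out.println(followerCheck(node) ? "connected" : "disconnected");
        // prints "disconnected"
    }
}
```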

Node join

[2022-10-18T17:00:04,691][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-join[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} join existing leader

Follower checker scheduled post node-join, which fails to find a connection

[2022-10-18T17:00:04,788][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
NodeNotConnectedException[[5b7033ca454040458aab223e1090e5f1][172.xx.xx.xx:9300] Node not connected]
        at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:189)
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:682)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:602)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.handleWakeUp(FollowersChecker.java:326)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.start(FollowersChecker.java:304)
        at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$setCurrentNodes$3(FollowersChecker.java:155)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at org.elasticsearch.cluster.coordination.FollowersChecker.setCurrentNodes(FollowersChecker.java:148)
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1115)
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:288)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:270)
        at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:155)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:693)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-18T17:00:04,790][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty

Data node gets added without the connection mapped (not verifiable through logs)

[2022-10-18T17:00:21,139][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] added {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},{5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}

Data node gets removed

[2022-10-18T17:00:26,326][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-left[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} reason: disconnected
[2022-10-18T17:00:56,363][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] removed {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},

Expected behavior
Node joins and leaves shouldn't interfere with each other; membership transitions should complete cleanly without getting stuck in this loop.
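One possible shape of a mitigation (a hypothetical sketch, not the actual OpenSearch fix) is for the leader to track nodes whose node-left removal is still being applied and defer joins for them, so the joining node's peer finder simply retries after the connection teardown completes:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical join guard: the class name, field, and method are illustrative
// assumptions, not existing OpenSearch identifiers.
class JoinValidator {
    // Nodes with an in-flight node-left cluster state apply.
    static final Set<String> pendingRemovals = ConcurrentHashMap.newKeySet();

    // Reject (defer) a join while the old connection is still being torn down,
    // so the join never reuses a connection that is about to be removed.
    static boolean canAcceptJoin(String nodeId) {
        return !pendingRemovals.contains(nodeId);
    }

    public static void main(String[] args) {
        pendingRemovals.add("data-node-1");
        System.out.println(canAcceptJoin("data-node-1")); // false: join deferred
        pendingRemovals.remove("data-node-1");            // node-left apply finished
        System.out.println(canAcceptJoin("data-node-1")); // true: join can proceed
    }
}
```

With this ordering the join always starts from a clean slate, so the follower checker only ever sees connections that no in-flight node-left can remove out from under it.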
