-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
Labels
backwards-compatibilitybugSomething isn't workingSomething isn't workingdistributed frameworkv3.0.0Issues and PRs related to version 3.0.0Issues and PRs related to version 3.0.0
Description
Describe the bug
Nodes cannot join back to the cluster after upgrading from 2.0.1 to 3.0.0.
To Reproduce
Failing GHA: https://github.com/cliu123/security/runs/6926965970?check_suite_focus=true
Expected behavior
A clear and concise description of what you expected to happen.
Plugins
Please list all plugins currently enabled.
Error logs
WARN ][o.o.c.NodeConnectionsService] [securityBwcCluster0-2] failed to connect to {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} (tried [1] times)
» org.opensearch.transport.ConnectTransportException: [securityBwcCluster0-1][127.0.0.1:43131] connect_exception
» at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1076) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-2.0.1.jar:2.0.1]
» at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
» at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
» at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
» at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
» at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-2.0.1.jar:2.0.1]
» at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
» at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
» at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
» Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:43131
» Caused by: java.net.ConnectException: Connection refused
» at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
» at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
» at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
» at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
» ... 7 more
» ↓ last 40 non error or warning messages from /home/runner/work/security/security/bwc-test/build/testclusters/securityBwcCluster0-2/logs/opensearch.stdout.log ↓
» [2022-06-16T22:10:14,761][INFO ][o.o.n.Node ] [securityBwcCluster0-2] node name [securityBwcCluster0-2], node ID [jjX5m3WkSLe8eAwhuQhn7A], cluster name [securityBwcCluster0], roles [cluster_manager, remote_cluster_client, data, ingest]
» [2022-06-16T22:10:19,058][INFO ][o.o.t.NettyAllocator ] [securityBwcCluster0-2] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
» [2022-06-16T22:10:19,129][INFO ][o.o.d.DiscoveryModule ] [securityBwcCluster0-2] using discovery type [zen] and seed hosts providers [settings, file]
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] initialized
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] starting ...
» [2022-06-16T22:10:19,727][INFO ][o.o.t.TransportService ] [securityBwcCluster0-2] publish_address {127.0.0.1:42529}, bound_addresses {[::1]:37963}, {127.0.0.1:42529}
» [2022-06-16T22:10:19,932][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] startInitialJoin: coordinator becoming CANDIDATE in term 0 (was null, lastKnownLeader was [Optional.empty])
» [2022-06-16T22:10:19,960][INFO ][o.o.h.AbstractHttpServerTransport] [securityBwcCluster0-2] publish_address {127.0.0.1:46681}, bound_addresses {[::1]:41935}, {127.0.0.1:46681}
» [2022-06-16T22:10:19,962][INFO ][o.o.n.Node ] [securityBwcCluster0-2] started
» [2022-06-16T22:10:19,962][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] Node started
» [2022-06-16T22:10:19,968][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] 0 OpenSearch Security modules loaded so far: []
» [2022-06-16T22:10:20,876][INFO ][o.o.c.c.Coordinator ] [securityBwcCluster0-2] setting initial configuration to VotingConfiguration{jjX5m3WkSLe8eAwhuQhn7A,Us2JKk3aSU-lIJVCnUQ02w,{bootstrap-placeholder}-securityBwcCluster0-0}
» [2022-06-16T22:10:21,093][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,103][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] joinLeaderInTerm: for [{securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] with term 1
» [2022-06-16T22:10:21,104][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-2] handleStartJoin: leaving term [0] due to StartJoinRequest{term=1,node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
» [2022-06-16T22:10:21,129][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} starting election
» [2022-06-16T22:10:21,131][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=645, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,134][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} requesting pre-votes from [{securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
» [2022-06-16T22:10:21,157][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,159][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,150][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] TransportResponseHandler{PreVoteCollector{state=Tuple [v1=null, v2=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}]}, node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} failed
» org.opensearch.transport.RemoteTransportException: [securityBwcCluster0-1][127.0.0.1:43131][internal:cluster/request_pre_vote]
» Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: rejecting PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1} as there is already a leader
» at org.opensearch.cluster.coordination.PreVoteCollector.handlePreVoteRequest(PreVoteCollector.java:162) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.cluster.coordination.PreVoteCollector.lambda$new$0(PreVoteCollector.java:100) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:798) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.0.1.jar:2.0.1]
» at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
» at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
backwards-compatibilitybugSomething isn't workingSomething isn't workingdistributed frameworkv3.0.0Issues and PRs related to version 3.0.0Issues and PRs related to version 3.0.0