Describe the bug
Issue described in #3557 is still there also with opensearch 1.3.4.
In some circumstances, when running a search operation, if the client TCP connection is closed before the search operation is completed, opensearch crashes with:
[2022-07-20T17:51:36,145][ERROR][org.opensearch.bootstrap.OpenSearchUncaughtExceptionHandler] fatal error in thread [opensearch[mycluster-elasticsearch-3][search][T#8]], exiting
java.lang.AssertionError: unexpected higher total ops [15] compared to expected [14]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:465) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$2(AbstractSearchAsyncAction.java:303) ~[opensearch-1.3.4.jar:1.3.4]
The problem was supposedly solved with #3626 but we are still affected by it. The fix addresses the problem when totalOps == expectedTotalOps but not when totalOps > expectedTotalOps, when the exception is still unhandled and lead to a crash.
To Reproduce
Steps to reproduce the behaviour:
We can consistently reproduce this behaviour in several environments.
- run the long running search query, e.g. curl -s "http://localhost:9200/myindex*/_search (this is the same with any client/library)
- kill the client connection after a couple of seconds, e.g stop the curl command with ctrl+c (same as a connection that is closed by the client after a timeout, before the reply from the OS node)
- the opensearch node crashes
Expected behavior
No crashes.
Plugins
- repository-s3
- repository-gcs
Screenshots
no applicable
Host/Environment (please complete the following information):
- OS: Red Hat Enterprise Linux 8 (in a container, UBI minimal)
- Version: Opensearch 1.3.4
Additional context
more logs:
[2022-07-20T17:51:36,145][ERROR][org.opensearch.bootstrap.OpenSearchUncaughtExceptionHandler] fatal error in thread [opensearch[mycluster-elasticsearch-3][search][T#8]], exiting
java.lang.AssertionError: unexpected higher total ops [15] compared to expected [14]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:465) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$2(AbstractSearchAsyncAction.java:303) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:338) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.3.4.jar:1.3.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.opensearch.action.search.SearchPhaseExecutionException: Shard failures
... 10 more
Caused by: org.opensearch.tasks.TaskCancelledException: The parent task was cancelled, shouldn't start any child tasks
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:731) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:826) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:196) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:124) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:781) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:310) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:249) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:215) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.TransportSearchAction$5.run(TransportSearchAction.java:1131) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:409) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:685) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:458) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:82) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:291) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:772) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.SearchTransportService.sendCanMatch(SearchTransportService.java:149) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.CanMatchPreFilterSearchPhase.executePhaseOnShard(CanMatchPreFilterSearchPhase.java:128) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:312) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:469) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:82) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:291) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TransportService$6.handleException(TransportService.java:742) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:308) ~[?:?]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1357) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:415) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:296) ~[opensearch-1.3.4.jar:1.3.4]
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226)
at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:70)
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:89)
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:108)
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:95)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:433)
at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:569)
at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:175)
at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:198)
at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:183)
at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:112)
at org.opensearch.action.ActionListener$5.onFailure(ActionListener.java:273)
at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:676)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:650)
at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:126)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:413) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:405) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:153) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:109) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:759) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:170) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:145) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110) ~[opensearch-1.3.4.jar:1.3.4]
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373) ~[?:?]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236) ~[?:?]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
... 1 more
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226)
at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:70)
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:89)
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:108)
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:95)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:433)
at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:569)
at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:175)
at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:198)
at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:183)
at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:112)
at org.opensearch.action.ActionListener$5.onFailure(ActionListener.java:273)
at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:676)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:650)
at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:126)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226)
at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:70)
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:89)
fatal error in thread [opensearch[mycluster-elasticsearch-3][search][T#11]], exiting
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:108)
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:95)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:433)
at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:569)
at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:175)
at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:198)
at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:183)
at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:112)
at org.opensearch.action.ActionListener$5.onFailure(ActionListener.java:273)
at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:676)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:650)
at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:126)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
java.lang.AssertionError: unexpected higher total ops [15] compared to expected [14]
fatal error in thread [opensearch[mycluster-elasticsearch-3][search][T#8]], exiting
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:465)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$2(AbstractSearchAsyncAction.java:303)
at org.opensearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:338)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: Failed to execute phase [query], Shard failures; shardFailures {[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.20][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[PqhBStI8SFy7PQORdpuyYw][myindex-2022.07.19][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RtTqNO27TSitOPJX41HW5g][myindex-2022.07.18][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RtTqNO27TSitOPJX41HW5g][myindex-2022.07.17][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.16][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[wSteezs3T3qBVWaXUJQwTg][myindex-2022.07.15][0]: NodeNotConnectedException[[mycluster-elasticsearch-2][172.16.19.33:9300] Node not connected]}{[wSteezs3T3qBVWaXUJQwTg][myindex-2022.07.14][0]: NodeNotConnectedException[[mycluster-elasticsearch-2][172.16.19.33:9300] Node not connected]}{[wSteezs3T3qBVWaXUJQwTg][myindex-2022.07.13][0]: NodeNotConnectedException[[mycluster-elasticsearch-2][172.16.19.33:9300] Node not connected]}{[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.12][0]: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]}
... 10 more
Caused by: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226)
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:731)
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838)
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:826)
at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:196)
at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:124)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278)
at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:781)
at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:310)
at org.opensearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:249)
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415)
at org.opensearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:215)
at org.opensearch.action.search.TransportSearchAction$5.run(TransportSearchAction.java:1131)
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415)
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:409)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:685)
at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:567)
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:554)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$8(AbstractSearchAsyncAction.java:537)
at org.opensearch.action.search.CanMatchPreFilterSearchPhase$CanMatchSearchPhaseResults.consumeResult(CanMatchPreFilterSearchPhase.java:221)
at org.opensearch.action.search.CanMatchPreFilterSearchPhase$CanMatchSearchPhaseResults.consumeResult(CanMatchPreFilterSearchPhase.java:205)
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:537)
at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:282)
at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:57)
at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:42)
at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67)
at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:736)
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:302)
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1347)
at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:388)
at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:380)
at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:155)
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:109)
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:759)
at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:170)
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:145)
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110)
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
... 1 more
java.lang.AssertionError: unexpected higher total ops [15] compared to expected [14]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:465)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$2(AbstractSearchAsyncAction.java:303)
at org.opensearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:338)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: Failed to execute phase [query], Shard failures; shardFailures {[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.20][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[PqhBStI8SFy7PQORdpuyYw][myindex-2022.07.19][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RtTqNO27TSitOPJX41HW5g][myindex-2022.07.18][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RtTqNO27TSitOPJX41HW5g][myindex-2022.07.17][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.15][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[wSteezs3T3qBVWaXUJQwTg][myindex-2022.07.14][0]: NodeNotConnectedException[[mycluster-elasticsearch-2][172.16.19.33:9300] Node not connected]}{[PqhBStI8SFy7PQORdpuyYw][myindex-2022.07.13][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.12][0]: TransportException[failure to send]; nested: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]; }{[RbCmgI_7QdeNxfoPnd-lZQ][myindex-2022.07.16][0]: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]}
... 10 more
Caused by: TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:534)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:226)
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:731)
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838)
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:826)
at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:196)
at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:124)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278)
at org.opensearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:781)
at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:310)
at org.opensearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:249)
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415)
at org.opensearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:215)
at org.opensearch.action.search.TransportSearchAction$5.run(TransportSearchAction.java:1131)
at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415)
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:409)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:685)
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:458)
at org.opensearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:82)
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:291)
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72)
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:772)
at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838)
at org.opensearch.action.search.SearchTransportService.sendCanMatch(SearchTransportService.java:149)
at org.opensearch.action.search.CanMatchPreFilterSearchPhase.executePhaseOnShard(CanMatchPreFilterSearchPhase.java:128)
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278)
at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:312)
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:469)
at org.opensearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:82)
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:291)
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72)
at org.opensearch.transport.TransportService$6.handleException(TransportService.java:742)
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:308)
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1357)
at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:415)
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:296)
at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:413)
at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:405)
at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:153)
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:109)
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:759)
at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:170)
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:145)
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110)
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
... 1 more
Describe the bug
Issue described in #3557 is still there also with opensearch 1.3.4.
In some circumstances, when running a search operation, if the client TCP connection is closed before the search operation is completed, opensearch crashes with:
The problem was supposedly solved with #3626 but we are still affected by it. The fix addresses the problem when
totalOps == expectedTotalOpsbut not whentotalOps > expectedTotalOps, when the exception is still unhandled and lead to a crash.To Reproduce
Steps to reproduce the behaviour:
We can consistently reproduce this behaviour in several environments.
Expected behavior
No crashes.
Plugins
Screenshots
no applicable
Host/Environment (please complete the following information):
Additional context
more logs: