[BUG] SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation flaky test failure #10025
Closed
Labels
Indexing:Replication (Issues and PRs related to core replication framework, e.g. segrep), Storage (Issues and PRs relating to data and metadata storage), bug (Something isn't working), flaky-test (Random test failure that succeeds on second run), untriaged
Description
Coming from #8279 (comment), SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation is flaky.
Gradle report: https://build.ci.opensearch.org/job/gradle-check/25111/testReport/
Build with test failures: (24228,24358,24612,24686,25111)
Assertion failure:
java.lang.AssertionError: Expected search hits on node: node_t3 to be at least 1 but was: 0
at __randomizedtesting.SeedInfo.seed([8477D22ECF559973:53F3D3D66092887]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.opensearch.indices.replication.SegmentReplicationBaseIT.lambda$waitForSearchableDocs$0(SegmentReplicationBaseIT.java:124)
at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1086)
at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:119)
at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:114)
at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:131)
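The failure comes out of `waitForSearchableDocs`, which wraps the search-hit check in `OpenSearchTestCase.assertBusy`, i.e. it retries the assertion until it passes or a timeout elapses. As a minimal sketch of that polling pattern (a hypothetical standalone helper, not the actual OpenSearch implementation), the idea is:

```java
import java.util.function.BooleanSupplier;

// Hypothetical minimal polling helper mirroring the assertBusy idea:
// retry a condition at a fixed interval until it holds or a timeout elapses.
public class BusyWait {
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true; // condition satisfied before the timeout
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // give up if the waiting thread is interrupted
            }
        }
        return condition.getAsBoolean(); // one final check at the deadline
    }
}
```

So the flaky failure means node_t3 never reported at least 1 search hit within the entire retry window, not just on a single attempt.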
Gradle run logs show an AlreadyClosedException on engine refresh:
[2023-09-08T17:43:03,934][ERROR][o.o.i.s.RemoteStoreRefreshListener] [node_t2] [test-idx-1][0] Exception in runAfterRefreshExactlyOnce() method
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.opensearch.index.shard.IndexShard.getEngine(IndexShard.java:3452) ~[main/:?]
at org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:4944) ~[main/:?]
at org.opensearch.index.shard.RemoteStoreRefreshListener.runAfterRefreshExactlyOnce(RemoteStoreRefreshListener.java:133) [main/:?]
at org.opensearch.index.shard.CloseableRetryableRefreshListener.afterRefresh(CloseableRetryableRefreshListener.java:62) [main/:?]
at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1769) [main/:?]
at org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1884) [main/:?]
at org.opensearch.index.engine.Engine.flush(Engine.java:1198) [main/:?]
at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1973) [main/:?]
at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1938) [main/:?]
at org.opensearch.index.IndexService.closeShard(IndexService.java:630) [main/:?]
at org.opensearch.index.IndexService.removeShard(IndexService.java:606) [main/:?]
at org.opensearch.index.IndexService.close(IndexService.java:380) [main/:?]
at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1019) [main/:?]
at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:442) [main/:?]
at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:283) [main/:?]
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606) [main/:?]
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593) [main/:?]
at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561) [main/:?]
at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) [main/:?]
at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) [main/:?]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [main/:?]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [main/:?]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [main/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.lang.Thread.run(Thread.java:1623) [?:?]
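The trace shows a race: the cluster applier is closing the shard (`IndexShard.close` → `flushAndClose` → refresh), and `RemoteStoreRefreshListener.runAfterRefreshExactlyOnce` then calls `getEngine()` on an engine that is already closed. One way such a race is commonly tolerated is for the listener to check a closed flag and swallow the closed-engine error as a no-op. The following is a hypothetical standalone sketch of that pattern, not the actual OpenSearch code (class and method names here are invented for illustration):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a refresh listener that tolerates a concurrently
// closing engine, modeling the race seen in the log above.
public class GuardedRefreshListener {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int uploads = 0; // count of successful post-refresh actions

    // Stand-in for IndexShard.getEngine(): fails once the shard is closed
    // (models Lucene's AlreadyClosedException with a stdlib exception).
    private void checkEngineOpen() {
        if (closed.get()) {
            throw new IllegalStateException("engine is closed");
        }
    }

    public boolean runAfterRefreshExactlyOnce() {
        if (closed.get()) {
            return false; // listener already closed: skip instead of failing
        }
        try {
            checkEngineOpen();
            uploads++; // e.g. upload the refreshed segments to the remote store
            return true;
        } catch (IllegalStateException e) {
            return false; // engine raced with close; treat as a no-op
        }
    }

    public void close() {
        closed.set(true);
    }

    public int uploads() {
        return uploads;
    }
}
```

Under this sketch, a refresh that races with shard close degrades to a skipped listener invocation rather than an error in `runAfterRefreshExactlyOnce()`.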