[BUG] SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation flaky test failure #10025

@dreamer-89

Description

Coming from #8279 (comment), SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation is flaky.

Gradle report: https://build.ci.opensearch.org/job/gradle-check/25111/testReport/
Builds with test failures: 24228, 24358, 24612, 24686, 25111

Assertion trip

java.lang.AssertionError: Expected search hits on node: node_t3 to be at least 1 but was: 0
	at __randomizedtesting.SeedInfo.seed([8477D22ECF559973:53F3D3D66092887]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.lambda$waitForSearchableDocs$0(SegmentReplicationBaseIT.java:124)
	at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1086)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:119)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:114)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:131)
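
For readers unfamiliar with the pattern in this trace: the failure surfaces through `assertBusy`, which keeps re-running an assertion until it passes or a timeout elapses, and then rethrows the last failure. The sketch below is a simplified stand-in for that polling pattern, not the actual `OpenSearchTestCase.assertBusy` implementation; the class name, backoff values, and simulated hit count are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AssertBusySketch {
    /** Re-runs the assertion until it passes or the deadline is reached, then rethrows the last failure. */
    static void assertBusy(Runnable assertion, long maxWait, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                assertion.run();
                return; // the assertion passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // timed out: report the most recent failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential backoff (hypothetical values)
        }
    }

    public static void main(String[] args) {
        // Hypothetical hit count: the document becomes searchable on the third poll.
        AtomicInteger polls = new AtomicInteger();
        assertBusy(() -> {
            int hits = polls.incrementAndGet() >= 3 ? 1 : 0;
            if (hits < 1) {
                throw new AssertionError("Expected search hits to be at least 1 but was: " + hits);
            }
        }, 5, TimeUnit.SECONDS);
        System.out.println("searchable after " + polls.get() + " polls");
    }
}
```

Under this pattern, a message such as "at least 1 but was: 0" means the condition was still false on the final attempt before the timeout, not merely on the first check.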

The Gradle run logs show an AlreadyClosedException on engine refresh.

[2023-09-08T17:43:03,934][ERROR][o.o.i.s.RemoteStoreRefreshListener] [node_t2] [test-idx-1][0] Exception in runAfterRefreshExactlyOnce() method
org.apache.lucene.store.AlreadyClosedException: engine is closed
	at org.opensearch.index.shard.IndexShard.getEngine(IndexShard.java:3452) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:4944) ~[main/:?]
	at org.opensearch.index.shard.RemoteStoreRefreshListener.runAfterRefreshExactlyOnce(RemoteStoreRefreshListener.java:133) [main/:?]
	at org.opensearch.index.shard.CloseableRetryableRefreshListener.afterRefresh(CloseableRetryableRefreshListener.java:62) [main/:?]
	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1769) [main/:?]
	at org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1884) [main/:?]
	at org.opensearch.index.engine.Engine.flush(Engine.java:1198) [main/:?]
	at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1973) [main/:?]
	at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1938) [main/:?]
	at org.opensearch.index.IndexService.closeShard(IndexService.java:630) [main/:?]
	at org.opensearch.index.IndexService.removeShard(IndexService.java:606) [main/:?]
	at org.opensearch.index.IndexService.close(IndexService.java:380) [main/:?]
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1019) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:442) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:283) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.lang.Thread.run(Thread.java:1623) [?:?]
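
Read bottom-up, the trace above shows a shard close calling `flushAndClose`, whose flush triggers a refresh, whose refresh listener then asks the shard for its engine and gets "engine is closed". The toy model below sketches that call ordering. It assumes the shard detaches its engine reference before running the final flush, which is an assumption and not something the log confirms. All class names are hypothetical, and `IllegalStateException` stands in for Lucene's `AlreadyClosedException`. Whether this logged exception is related to the assertion failure is not established here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class ShardCloseSketch {
    /** Stand-in engine: a refresh notifies listeners, and flushAndClose refreshes before closing. */
    static class Engine {
        final List<Runnable> refreshListeners = new ArrayList<>();

        void refresh() {
            refreshListeners.forEach(Runnable::run);
        }

        void flushAndClose() {
            refresh(); // the final flush still fires refresh listeners
        }
    }

    /** Stand-in shard whose refresh listener needs the engine after every refresh. */
    static class Shard {
        final AtomicReference<Engine> engineRef = new AtomicReference<>(new Engine());
        final List<String> errors = new ArrayList<>();

        Shard() {
            engineRef.get().refreshListeners.add(() -> {
                try {
                    getEngine(); // e.g. to snapshot segment infos
                } catch (IllegalStateException e) {
                    errors.add(e.getMessage()); // logged, not rethrown
                }
            });
        }

        Engine getEngine() {
            Engine engine = engineRef.get();
            if (engine == null) {
                throw new IllegalStateException("engine is closed");
            }
            return engine;
        }

        void close() {
            Engine engine = engineRef.getAndSet(null); // assumption: reference is detached first
            if (engine != null) {
                engine.flushAndClose(); // listener runs while getEngine() already fails
            }
        }
    }

    public static void main(String[] args) {
        Shard shard = new Shard();
        shard.getEngine().refresh(); // normal refresh: listener succeeds
        shard.close();               // close-time refresh: listener records an error
        System.out.println(shard.errors);
    }
}
```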

Labels: Indexing:Replication, Storage, bug, flaky-test, untriaged
