[BUG] Test org.opensearch.indices.replication.SegmentReplicationSuiteIT is flaky #9499

@sachinpkale

Description

Re-running with the same seed does not always reproduce the failure; the test has to be run several times before it fails. On my local machine, it failed on the 13th retry.

Build where it failed: https://build.ci.opensearch.org/job/gradle-check/23203/

I was able to reproduce it on main.

  2> java.lang.IllegalStateException: Some shards are still open after the threadpool terminated. Something is leaking index readers or store references.
        at __randomizedtesting.SeedInfo.seed([CFC3DCBFE313A077]:0)
        at org.opensearch.node.Node.awaitClose(Node.java:1541)
        at org.opensearch.test.InternalTestCluster$NodeAndClient.close(InternalTestCluster.java:1129)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:114)
        at org.opensearch.test.InternalTestCluster.close(InternalTestCluster.java:966)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:114)
        at org.opensearch.test.OpenSearchIntegTestCase.clearClusters(OpenSearchIntegTestCase.java:576)
        at org.opensearch.test.OpenSearchIntegTestCase.afterClass(OpenSearchIntegTestCase.java:2283)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:901)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1623)
  2> REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationSuiteIT" -Dtests.seed=CFC3DCBFE313A077 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=UTC -Druntime.java=20
Tests with failures:
 - org.opensearch.indices.replication.SegmentReplicationSuiteIT.testFullRestartDuringReplication
 - org.opensearch.indices.replication.SegmentReplicationSuiteIT.testDropRandomNodeDuringReplication
 - org.opensearch.indices.replication.SegmentReplicationSuiteIT.testDeleteIndexWhileReplicating
 - org.opensearch.indices.replication.SegmentReplicationSuiteIT.testBasicReplication
 - org.opensearch.indices.replication.SegmentReplicationSuiteIT.classMethod
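
Because the failure is intermittent even with a fixed seed, a small retry loop can save some manual re-running. The following is a minimal sketch, not part of the OpenSearch tooling: `run_until_failure` is a hypothetical helper name, and the attempt budget of 20 in the usage comment is an arbitrary assumption. It re-runs whatever command it is given and stops at the first non-zero exit.

```shell
#!/bin/sh
# Hypothetical helper (not from the OpenSearch repo): re-run a command until
# it fails or the attempt budget is exhausted.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as the command fails, 0 if it never failed.
run_until_failure() {
  ruf_max="$1"
  shift
  ruf_attempt=1
  while [ "$ruf_attempt" -le "$ruf_max" ]; do
    echo "attempt ${ruf_attempt}/${ruf_max}"
    if ! "$@"; then
      echo "failed on attempt ${ruf_attempt}"
      return 1
    fi
    ruf_attempt=$((ruf_attempt + 1))
  done
  echo "no failure in ${ruf_max} attempts"
  return 0
}

# Example invocation (assumed budget of 20 attempts; append the remaining
# flags from the REPRODUCE WITH line as needed):
# run_until_failure 20 ./gradlew ':server:internalClusterTest' \
#   --tests "org.opensearch.indices.replication.SegmentReplicationSuiteIT" \
#   -Dtests.seed=CFC3DCBFE313A077
```

If Gradle reports the test task as up to date and skips it on later attempts, the standard `--rerun-tasks` flag should force it to execute again, at the cost of re-running the task's dependencies as well.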

Metadata

Labels

Indexing:Replication (Issues and PRs related to core replication framework, e.g. segrep)
bug (Something isn't working)
flaky-test (Random test failure that succeeds on second run)
v2.11.0 (Issues and PRs related to version 2.11.0)
