Skip to content

Fix flaky testShardIndexingPressureTrackingDuringBulkWrites#20631

Merged
andrross merged 1 commit intoopensearch-project:mainfrom
andrross:flaky-ShardIndexingPressureIT
Feb 16, 2026
Merged

Fix flaky testShardIndexingPressureTrackingDuringBulkWrites#20631
andrross merged 1 commit intoopensearch-project:mainfrom
andrross:flaky-ShardIndexingPressureIT

Conversation

@andrross
Copy link
Copy Markdown
Member

@andrross andrross commented Feb 14, 2026

The bug was that in the case of usePrimaryAsCoordinatingNode == false then the test was racing with the second request. The fix is to poll for the second request to make it to the primary shard for indexing. At this point we can assert on the memory tracking for the coordinator regardless of whether the primary or replica was the coordinator.

Resolves #15830

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors a test method in ShardIndexingPressureIT to consolidate conditional branching by moving the if statement inside a single assertBusy block instead of maintaining two separate assertBusy invocations, while preserving all assertion logic.

Changes

Cohort / File(s) Summary
Test Flakiness Fix
server/src/internalClusterTest/java/org/opensearch/index/ShardIndexingPressureIT.java
Consolidates conditional branching in test assertions by unifying two separate assertBusy blocks into a single assertBusy with an internal if statement for primary vs. replica node handling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

flaky-test, >test-failure, Indexing

🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (18 files):

⚔️ server/src/internalClusterTest/java/org/opensearch/index/ShardIndexingPressureIT.java (content)
⚔️ server/src/main/java/org/opensearch/action/bulk/BulkPrimaryExecutionContext.java (content)
⚔️ server/src/main/java/org/opensearch/action/bulk/BulkShardResponse.java (content)
⚔️ server/src/main/java/org/opensearch/action/bulk/TransportBulkAction.java (content)
⚔️ server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (content)
⚔️ server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java (content)
⚔️ server/src/main/java/org/opensearch/cluster/routing/IndexShardRoutingTable.java (content)
⚔️ server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (content)
⚔️ server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java (content)
⚔️ server/src/main/java/org/opensearch/search/DefaultSearchContext.java (content)
⚔️ server/src/main/java/org/opensearch/search/SearchService.java (content)
⚔️ server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java (content)
⚔️ server/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java (content)
⚔️ server/src/main/java/org/opensearch/search/internal/SearchContext.java (content)
⚔️ server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java (content)
⚔️ server/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java (content)
⚔️ server/src/test/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorTests.java (content)
⚔️ test/framework/src/main/java/org/opensearch/search/aggregations/AggregatorTestCase.java (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the specific test being fixed and the issue being addressed (flakiness), directly relating to the code change.
Linked Issues check ✅ Passed The PR addresses issue #15830 by fixing the flaky testShardIndexingPressureTrackingDuringBulkWrites test through wrapping both conditional branches in assertBusy() to handle race conditions in memory tracking.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the identified flakiness in testShardIndexingPressureTrackingDuringBulkWrites; no unrelated modifications are present.
Description check ✅ Passed The pull request description explains the bug fix and references the resolved issue, but lacks detail about test coverage and other required sections.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@andrross andrross marked this pull request as draft February 14, 2026 19:17
@andrross
Copy link
Copy Markdown
Member Author

andrross commented Feb 14, 2026

I think I found the actual problem...

updated now so it's ready for review

@andrross andrross force-pushed the flaky-ShardIndexingPressureIT branch from 88214f4 to 1992fdd Compare February 14, 2026 19:57
@andrross andrross marked this pull request as ready for review February 14, 2026 19:57
The bug was that in the case of usePrimaryAsCoordinatingNode == false
then the test was racing with the second request. The fix is to poll for
the second request to make it to the primary shard for indexing. At this
point we can assert on the memory tracking for the coordinator
regardless of whether the primary or replica was the coordinator.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the flaky-ShardIndexingPressureIT branch from 1992fdd to 62502d4 Compare February 14, 2026 19:58
@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 62502d4: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.26%. Comparing base (5358bba) to head (62502d4).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20631      +/-   ##
============================================
- Coverage     73.35%   73.26%   -0.10%     
+ Complexity    72033    71918     -115     
============================================
  Files          5781     5781              
  Lines        329200   329200              
  Branches      47491    47491              
============================================
- Hits         241501   241186     -315     
- Misses        68308    68643     +335     
+ Partials      19391    19371      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross andrross merged commit 7cdfa46 into opensearch-project:main Feb 16, 2026
35 checks passed
@andrross andrross deleted the flaky-ShardIndexingPressureIT branch February 16, 2026 16:44
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
…ch-project#20631)

The bug was that in the case of usePrimaryAsCoordinatingNode == false
then the test was racing with the second request. The fix is to poll for
the second request to make it to the primary shard for indexing. At this
point we can assert on the memory tracking for the coordinator
regardless of whether the primary or replica was the coordinator.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autocut flaky-test Random test failure that succeeds on second run skip-changelog >test-failure Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for ShardIndexingPressureIT

2 participants