fix flakey test testDecommissionNodeNoReplicas #18537
Merged
andrross merged 1 commit on Jun 17, 2025
Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #18537 +/- ##
============================================
- Coverage 72.77% 72.72% -0.06%
+ Complexity 68170 68116 -54
============================================
Files 5540 5540
Lines 313384 313385 +1
Branches 45473 45474 +1
============================================
- Hits 228051 227894 -157
- Misses 66811 66993 +182
+ Partials 18522 18498 -24
andrross approved these changes on Jun 17, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request on Jun 26, 2025
Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
Signed-off-by: TJ Neuenfeldt <tjneu@amazon.com>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request on Jun 26, 2025
Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request on Aug 5, 2025
Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
This was referenced Apr 27, 2026
Description
I first encountered this flakey test in link and added some logs in the branch for analysis.
Reproduce
The test fails roughly 1 to 2 times per 100 runs.
Analysis
After repeated verification, the test fails only when closed == true. Under this condition, the test first closes the index test and then excludes node_1. Once all shards have been migrated to node_0, the index test is reopened. In rare cases, an NPE is thrown after the index test is reopened.
I found that InternalPrimaryBatchShardAllocator#fetchData can return an AsyncShardFetch.FetchResult whose AsyncShardFetch.FetchResult#getData maps a DiscoveryNode to an empty NodeGatewayStartedShardsBatch#nodeGatewayStartedShardsBatchMap.
In AsyncShardFetch#fetchData, information is collected asynchronously through TransportNodesListGatewayStartedShardsBatch and placed in AsyncShardFetch#cache. The fetch data is then read back from AsyncShardFetch#cache.
In TransportNodesListGatewayStartedShardsBatch#nodeOperation, if TransportNodesGatewayStartedShardHelper#getShardInfoOnLocalNode throws an exception, the returned shardsOnNode may map a shardId to a null value. This causes the operation this.emptyShardResponse[shardIdKey.get(shardId)] = true to be skipped in the method AsyncShardBatchFetch.ShardBatchCache.NodeEntry#fillShardData.
Finally, when the fetch data is read from AsyncShardFetch#cache, AsyncShardBatchFetch.ShardBatchCache#getBatchData is called and returns an empty map. This causes an NPE in PrimaryShardBatchAllocator#adaptToNodeShardStates.
In summary, since we cannot guarantee that no exception occurs in TransportNodesListGatewayStartedShardsBatch#nodeOperation, we need to handle null values in PrimaryShardBatchAllocator#adaptToNodeShardStates.
Related Issues
Resolves #18310
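For reviewers, the null handling described in the Analysis can be illustrated with a minimal, self-contained sketch. This is not the code merged in this PR and it does not use OpenSearch classes: NullSafeShardStateSketch, ShardState, NodeShardState and the String-keyed maps are hypothetical stand-ins for DiscoveryNode, NodeGatewayStartedShardsBatch and the per-shard state. It only shows the idea that a node whose batch has no entry (or a null entry) for the shard is skipped instead of being dereferenced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model. None of these types are OpenSearch classes.
class NullSafeShardStateSketch {

    // Stand-in for the per-shard state that a node reports.
    static final class ShardState {
        final String allocationId;
        final boolean primary;

        ShardState(String allocationId, boolean primary) {
            this.allocationId = allocationId;
            this.primary = primary;
        }
    }

    // Stand-in for one (node, shard state) pair handed to the allocator.
    static final class NodeShardState {
        final String nodeId;
        final ShardState state;

        NodeShardState(String nodeId, ShardState state) {
            this.nodeId = nodeId;
            this.state = state;
        }
    }

    // Builds the per-node states for one shard. A node is skipped when its batch
    // is null, is empty, or maps the shard id to null, which is the situation
    // the Analysis describes after an exception on the node.
    static List<NodeShardState> adaptToNodeShardStates(
        String shardId,
        Map<String, Map<String, ShardState>> batchByNode
    ) {
        List<NodeShardState> result = new ArrayList<>();
        batchByNode.forEach((nodeId, shardsOnNode) -> {
            ShardState state = shardsOnNode == null ? null : shardsOnNode.get(shardId);
            if (state != null) {
                result.add(new NodeShardState(nodeId, state));
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, ShardState>> batchByNode = new HashMap<>();
        batchByNode.put("node_0", Map.of("[test][0]", new ShardState("alloc-1", true)));
        batchByNode.put("node_1", new HashMap<>()); // empty batch, as in the failing runs
        List<NodeShardState> states = adaptToNodeShardStates("[test][0]", batchByNode);
        System.out.println("nodes with usable shard state: " + states.size());
    }
}
```

Skipping the node is one possible policy. Whether the real allocator should skip such a node or record it as having no usable copy depends on PrimaryShardBatchAllocator's semantics and is an assumption here.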
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.