
[Backport 2.19] Fix array_index_out_of_bounds_exception with wildcard and aggregations #20862

Merged
andrross merged 1 commit into opensearch-project:2.19 from ShawnQiang1:backport/backport-20842-to-2.19
Mar 18, 2026

Conversation

@ShawnQiang1
Contributor

Description

Backport of #20842.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ShawnQiang1
Contributor Author

ShawnQiang1 commented Mar 13, 2026

After applying my fix and testing it, a new error occurred that does not occur on the main branch:

[2026-03-13T21:51:54,846][WARN ][r.suppressed             ] [runTask-0] path: /test-index/_search, params: {pretty=true, index=test-index}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:775)
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:815)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548)
	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:316)
	at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104)
	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75)
	at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:766)
	at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1741)
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1527)
	at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1641)
	at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1615)
	at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81)
	at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:75)
	at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70)
	at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1014)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.opensearch.OpenSearchException$3: Cannot invoke "org.apache.lucene.util.automaton.ByteRunAutomaton.run(byte[], int, int)" because "compiledAutomaton.runAutomaton" is null
	at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710)
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:393)
	... 23 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.util.automaton.ByteRunAutomaton.run(byte[], int, int)" because "compiledAutomaton.runAutomaton" is null
	at org.opensearch.index.mapper.WildcardFieldMapper$WildcardFieldType.lambda$regexpQuery$2(WildcardFieldMapper.java:587)
	at org.opensearch.index.mapper.WildcardFieldMapper$WildcardMatchingQuery$1$1$1.matches(WildcardFieldMapper.java:840)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:295)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236)
	at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
	at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:43)
	at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:339)
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:289)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
	at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432)
	at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
	at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
	at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:662)
	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:726)
	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:695)
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	... 8 more
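
The trace above shows only that `compiledAutomaton.runAutomaton` was null when the lambda in `regexpQuery` ran for a document; it does not say why. As a rough, self-contained sketch of one way a matcher could check for a missing runner once, up front, instead of failing per document, consider the following. Every name here (`NullableRunnerSketch`, `Compiled`, `matcherFor`, the fallback predicate) is hypothetical and stands in for the real types; this is not the change made in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

final class NullableRunnerSketch {
    /** Hypothetical stand-in for a compiled pattern whose byte-level runner may be null. */
    static final class Compiled {
        final Predicate<byte[]> runner; // may be null, as in the trace above

        Compiled(Predicate<byte[]> runner) {
            this.runner = runner;
        }
    }

    /**
     * Builds a per-value matcher. The null check happens once, when the
     * matcher is built, so no null dereference can occur per document.
     */
    static Predicate<String> matcherFor(Compiled compiled, Predicate<String> fallback) {
        if (compiled.runner == null) {
            return fallback; // caller-supplied behaviour for the "no runner" case
        }
        return value -> compiled.runner.test(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Compiled withRunner = new Compiled(bytes -> bytes.length > 0 && bytes[0] == 'a');
        Compiled withoutRunner = new Compiled(null);

        System.out.println(matcherFor(withRunner, v -> false).test("abc"));
        System.out.println(matcherFor(withoutRunner, "abc"::equals).test("abc"));
    }
}
```

What the fallback should do depends on why the runner is absent, which this thread does not establish.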


@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Reviewer Guide 🔍

(Review updated until commit f93ad93)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Null Supplier

When context is null, valueFetcherSupplier is set to null. In the get method at line 889, valueFetcherSupplier.get() is called without a null check. If this code path is reached when valueFetcherSupplier is null (i.e., when the query was constructed without a context), a NullPointerException will be thrown. It should be verified that this code path is never reached when valueFetcherSupplier is null, or a null guard should be added.

final ValueFetcher valueFetcher = valueFetcherSupplier.get();
valueFetcher.setNextReader(context);

Supplier Closure

The valueFetcherSupplier lambda captures fieldType and context from the constructor. If fieldType or context hold references to large objects or mutable state, this could lead to memory retention or unexpected behavior. It should be verified that the captured objects have appropriate lifecycles and that the supplier does not introduce unintended side effects when called multiple times (e.g., during parallel scoring).

this.valueFetcherSupplier = () -> fieldType.valueFetcher(context, context.lookup(), null);
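
To illustrate why a supplier is used here rather than a single shared instance, the following is a minimal sketch, assuming a fetcher that keeps mutable per-segment state (the suggestions further down quote a code comment saying `ValueFetcher.setNextReader` is not thread safe). `StatefulFetcher` and `SupplierPerSegmentSketch` are hypothetical stand-ins, not OpenSearch classes.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a stateful, non-thread-safe value fetcher. */
final class StatefulFetcher {
    private String currentSegment; // mutable state set per segment

    void setNextReader(String segment) {
        this.currentSegment = segment;
    }

    String fetch(int docId) {
        return currentSegment + ":" + docId;
    }
}

final class SupplierPerSegmentSketch {
    /** Each call asks the supplier for its own fetcher, so callers never share state. */
    static String scoreSegment(Supplier<StatefulFetcher> supplier, String segment, int docId) {
        StatefulFetcher fetcher = supplier.get();
        fetcher.setNextReader(segment);
        return fetcher.fetch(docId);
    }

    public static void main(String[] args) {
        Supplier<StatefulFetcher> supplier = StatefulFetcher::new;

        // Two calls yield two distinct instances.
        System.out.println(supplier.get() != supplier.get());

        System.out.println(scoreSegment(supplier, "seg0", 7));
        System.out.println(scoreSegment(supplier, "seg1", 3));
    }
}
```

Because the supplier hands out a fresh object on every call, setting the reader for one segment cannot overwrite the state another caller is still using.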

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Code Suggestions ✨

Latest suggestions up to f93ad93
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Guard against null supplier before invocation

If valueFetcherSupplier is null (when context was null at construction time),
calling valueFetcherSupplier.get() will throw a NullPointerException. A null check
should be added before invoking the supplier, consistent with how the original
valueFetcher null check was handled before calling setNextReader.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [889-890]

+if (valueFetcherSupplier == null) {
+    throw new IllegalStateException("valueFetcherSupplier is null; cannot perform second-phase matching");
+}
 final ValueFetcher valueFetcher = valueFetcherSupplier.get();
 valueFetcher.setNextReader(context);
Suggestion importance[1-10]: 5


Why: The suggestion is valid in that valueFetcherSupplier can be null when context is null at construction. However, the code path that reaches valueFetcherSupplier.get() (inside createWeight) is only reached when searchLookup is non-null (since searchLookup is set alongside valueFetcherSupplier), so in practice an NPE may not occur. The suggested fix throws an IllegalStateException rather than doing a null-safe skip, which may not be the best approach but does address a potential issue.

Low

Previous suggestions

Suggestions up to commit bbf7117
Category | Suggestion | Impact
Possible issue
Add null check before supplier invocation

If valueFetcherSupplier is null (when the query was constructed without a
QueryShardContext), calling valueFetcherSupplier.get() will throw a
NullPointerException at query execution time. Add a null-check before calling
valueFetcherSupplier.get() to handle this case safely.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [829-832]

 // Create a new ValueFetcher per thread.
 // ValueFetcher.setNextReader is not thread safe.
+if (valueFetcherSupplier == null) {
+    throw new IllegalStateException("valueFetcherSupplier is null; query was constructed without a QueryShardContext");
+}
 final ValueFetcher valueFetcher = valueFetcherSupplier.get();
 valueFetcher.setNextReader(context);
Suggestion importance[1-10]: 3


Why: The scorer path at line 828 already requires searchLookup to be non-null (it calls searchLookup.getLeafSearchLookup), and valueFetcherSupplier is only null when searchLookup is null, so an NPE from valueFetcherSupplier would be preceded by one from searchLookup. Adding an explicit check improves clarity but is not strictly necessary for correctness.

Low
Guard against null supplier before invocation

When valueFetcherSupplier is null (i.e., context == null), calling
valueFetcherSupplier.get() later in get(long leadCost) will throw a
NullPointerException. There should be a null-check guard before invoking
valueFetcherSupplier.get() in the scorer path, or the supplier should be set to a
safe no-op/default when context is null.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [737-743]

+if (context != null) {
+    this.searchLookup = context.lookup();
+    this.valueFetcherSupplier = () -> fieldType.valueFetcher(context, context.lookup(), null);
+} else {
+    this.searchLookup = null;
+    this.valueFetcherSupplier = null;
+}
 
-
Suggestion importance[1-10]: 2


Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The concern about null valueFetcherSupplier is valid in theory, but the scorer path (where valueFetcherSupplier.get() is called) is only reached when searchLookup != null, which is only set when context != null — the same condition that sets valueFetcherSupplier. So in practice this is not a real bug.

Low
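
The invariant the bot describes in these suggestions (both fields are set together when `context` is non-null, and both are null otherwise) can be sketched in isolation as follows. This is a simplified illustration only: `MatchingQuerySketch` is hypothetical, and plain strings stand in for the real `QueryShardContext`, `SearchLookup`, and `ValueFetcher` types. The guard mirrors the `IllegalStateException` proposed above.

```java
import java.util.function.Supplier;

final class MatchingQuerySketch {
    private final String searchLookup;                   // stand-in for the search lookup
    private final Supplier<String> valueFetcherSupplier; // stand-in for Supplier<ValueFetcher>

    MatchingQuerySketch(String context) {
        if (context != null) {
            // Both fields are populated together...
            this.searchLookup = "lookup(" + context + ")";
            this.valueFetcherSupplier = () -> "fetcher(" + context + ")";
        } else {
            // ...or both are left null.
            this.searchLookup = null;
            this.valueFetcherSupplier = null;
        }
    }

    /** Second-phase matching; fails with a descriptive message instead of a bare NPE. */
    String secondPhase() {
        if (valueFetcherSupplier == null) {
            throw new IllegalStateException(
                "valueFetcherSupplier is null; cannot perform second-phase matching");
        }
        return searchLookup + " -> " + valueFetcherSupplier.get();
    }

    public static void main(String[] args) {
        System.out.println(new MatchingQuerySketch("ctx").secondPhase());
        try {
            new MatchingQuerySketch(null).secondPhase();
        } catch (IllegalStateException e) {
            System.out.println("guard triggered: " + e.getMessage());
        }
    }
}
```

Whether throwing or silently skipping is the better behaviour when no context is available is a design choice that the suggestions above leave open.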

@ShawnQiang1 ShawnQiang1 marked this pull request as draft March 13, 2026 14:00
@github-actions
Contributor

❌ Gradle check result for bbf7117: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

opensearch-project#20842)

Signed-off-by: Shawn Qiang <814238703@qq.com>
(cherry picked from commit 0e2783c)
Signed-off-by: Shawn Qiang <814238703@qq.com>
@ShawnQiang1 ShawnQiang1 force-pushed the backport/backport-20842-to-2.19 branch from bbf7117 to f93ad93 Compare March 17, 2026 14:54
@github-actions
Contributor

Persistent review updated to latest commit f93ad93

@ShawnQiang1 ShawnQiang1 marked this pull request as ready for review March 17, 2026 15:08
@ShawnQiang1
Contributor Author

@andrross The fix now works perfectly; please help merge this.

@github-actions
Contributor

❌ Gradle check result for f93ad93: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for f93ad93: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ShawnQiang1
Contributor Author

Man! Please help merge this, @andrross.

@github-actions
Contributor

❌ Gradle check result for f93ad93: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ShawnQiang1
Contributor Author

We really need to optimize this workflow.

@github-actions
Contributor

✅ Gradle check result for f93ad93: SUCCESS

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.93%. Comparing base (e582a51) to head (f93ad93).
⚠️ Report is 5 commits behind head on 2.19.

Files with missing lines Patch % Lines
...g/opensearch/index/mapper/WildcardFieldMapper.java 50.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               2.19   #20862      +/-   ##
============================================
- Coverage     71.95%   71.93%   -0.03%     
+ Complexity    65989    65951      -38     
============================================
  Files          5342     5342              
  Lines        307360   307392      +32     
  Branches      44857    44862       +5     
============================================
- Hits         221167   221122      -45     
- Misses        67736    67812      +76     
- Partials      18457    18458       +1     


@ShawnQiang1
Contributor Author

Please help merge this, @andrross.

@andrross andrross merged commit aff3489 into opensearch-project:2.19 Mar 18, 2026
58 of 65 checks passed
@ShawnQiang1 ShawnQiang1 deleted the backport/backport-20842-to-2.19 branch March 18, 2026 15:02