Skip to content

Disable concurrent search for filter duplicates in significant_text#20857

Merged
andrross merged 1 commit into
opensearch-project:mainfrom
andrross:DuplicateByteSequenceSpotter-concurrent-search
Mar 17, 2026
Merged

Disable concurrent search for filter duplicates in significant_text#20857
andrross merged 1 commit into
opensearch-project:mainfrom
andrross:DuplicateByteSequenceSpotter-concurrent-search

Conversation

@andrross
Copy link
Copy Markdown
Member

The underlying structured used to find duplicate byte sequences (DuplicateByteSequenceSpotter) is stateful and not thread-safe, so disable concurrent search if this particular feature is being used.

Related Issues

Part of #14408, specifically {yaml=search.aggregation/90_sig_text/Dedup noise}

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 13, 2026

PR Reviewer Guide 🔍

(Review updated until commit 400dcbb)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Thread Safety

The fix disables concurrent search when filterDuplicateText is true, but it's worth validating that filterDuplicateText is correctly set and that there are no other shared stateful structures in the aggregator that could also be non-thread-safe when concurrent search is enabled (i.e., when filterDuplicateText is false).

return filterDuplicateText == false;

@andrross andrross force-pushed the DuplicateByteSequenceSpotter-concurrent-search branch from 39b62ab to cca1a3b Compare March 13, 2026 02:41
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Use idiomatic boolean negation operator

Using == false for boolean negation is unconventional in Java. The idiomatic and
more readable way to negate a boolean is to use the ! operator. This improves code
clarity and follows standard Java conventions.

server/src/main/java/org/opensearch/search/aggregations/bucket/terms/SignificantTextAggregatorFactory.java [321]

-return filterDuplicateText == false;
+return !filterDuplicateText;
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - using !filterDuplicateText is more idiomatic Java than filterDuplicateText == false. However, this is a minor style improvement with no functional impact.

Low

@andrross
Copy link
Copy Markdown
Member Author

@msfroh If you don't mind I'd appreciate a look to see if this seems correct

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cca1a3b

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cca1a3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

The underlying structured used to find duplicate byte sequences
(DuplicateByteSequenceSpotter) is stateful and not thread-safe, so
disable concurrent search if this particular feature is being used.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the DuplicateByteSequenceSpotter-concurrent-search branch from cca1a3b to 400dcbb Compare March 17, 2026 19:19
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 400dcbb

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for 400dcbb: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@prudhvigodithi
Copy link
Copy Markdown
Member

We can explore supporting DuplicateByteSequenceSpotter thread safe as an enhancement to ensure significant_text aggs with filter_duplicate_text get benefit with concurrent segment search.

@andrross andrross merged commit f46ce23 into opensearch-project:main Mar 17, 2026
33 checks passed
@andrross andrross deleted the DuplicateByteSequenceSpotter-concurrent-search branch March 17, 2026 21:08
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.28%. Comparing base (28fa177) to head (400dcbb).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...bucket/terms/SignificantTextAggregatorFactory.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20857      +/-   ##
============================================
- Coverage     73.32%   73.28%   -0.05%     
- Complexity    72272    72318      +46     
============================================
  Files          5797     5802       +5     
  Lines        330323   330405      +82     
  Branches      47676    47686      +10     
============================================
- Hits         242215   242123      -92     
- Misses        68663    68903     +240     
+ Partials      19445    19379      -66     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Override
protected boolean supportsConcurrentSegmentSearch() {
return true;
// The underlying structured used to find duplicate byte sequences (DuplicateByteSequenceSpotter)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unclear why DuplicateByteSequenceSpotter needs to be threadsafe for inter-segment concurrent search. I see it being created at per leaf (or segment) and owning ord combination here

aparajita31pandey pushed a commit to aparajita31pandey/OpenSearch that referenced this pull request Apr 18, 2026
…pensearch-project#20857)

The underlying structured used to find duplicate byte sequences
(DuplicateByteSequenceSpotter) is stateful and not thread-safe, so
disable concurrent search if this particular feature is being used.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Aparajita Pandey <aparajita31pandey@gmail.com>
pradeep-L pushed a commit to pradeep-L/OpenSearch that referenced this pull request Apr 21, 2026
…pensearch-project#20857)

The underlying structured used to find duplicate byte sequences
(DuplicateByteSequenceSpotter) is stateful and not thread-safe, so
disable concurrent search if this particular feature is being used.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants