Leveraging segment-global ordinal mapping for efficient terms aggrega…#20683
Leveraging segment-global ordinal mapping for efficient terms aggrega…#20683jainankitk merged 8 commits intoopensearch-project:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughRefactors the term-frequency collection loop in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
🔇 Additional comments (1)
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java (1)
218-218: The ordinal mapping pattern at line 218 is sound.GlobalOrdinalsIndexFieldDataexplicitly documents that segmentTermsEnumordinals are designed to work with theOrdinalMapfor global ordinals retrieval. The code is restricted toValuesSource.Bytes.WithOrdinals(keyword and IP fields), where both inverted index and doc values ordinals are lexicographically sorted and compatible. No verification needed—this is a supported framework pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java`:
- Around line 94-95: The duplicate static import of NO_MORE_DOCS causes
ambiguous reference errors in GlobalOrdinalsStringTermsAggregator; remove the
deprecated static import of SortedSetDocValues.NO_MORE_DOCS and keep the static
import from DocIdSetIterator.NO_MORE_DOCS so unqualified uses of NO_MORE_DOCS
(e.g., in methods referencing NO_MORE_DOCS) resolve to
DocIdSetIterator.NO_MORE_DOCS; update the import block to drop the
SortedSetDocValues static import and ensure any references to
SortedSetDocValues.NO_MORE_DOCS are either removed or explicitly qualified to
DocIdSetIterator.NO_MORE_DOCS if needed.
- Around line 211-223: The loop in GlobalOrdinalsStringTermsAggregator now
passes global ordinals into ordCountConsumer, but
LowCardinality.tryPrecomputeAggregationForLeaf still assumes it will receive
segment ordinals and calls mapping.applyAsLong(ord), causing double-mapping;
update LowCardinality.tryPrecomputeAggregationForLeaf to accept the global
ordinal directly (i.e., stop calling mapping.applyAsLong on the ord passed into
the lambda) so it forwards the global ord to incrementBucketDocCount, and ensure
any references to mapping.applyAsLong in that method are removed or gated so
they only run when the incoming ord is a segment ord.
...ava/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java
Outdated
Show resolved
Hide resolved
...ava/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java
Show resolved
Hide resolved
|
❌ Gradle check result for b7a981d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for a480f25: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20683 +/- ##
============================================
- Coverage 73.32% 73.30% -0.03%
+ Complexity 72064 72009 -55
============================================
Files 5781 5781
Lines 329395 329392 -3
Branches 47525 47524 -1
============================================
- Hits 241536 241448 -88
- Misses 68507 68543 +36
- Partials 19352 19401 +49 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/6140/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/255/
|
@bowenlan-amzn - Ran the clickbench and I can notice very significant performance improvement for q34 / q35:
|
|
Persistent review updated to latest commit 86a7fad |
…tion Signed-off-by: Ankit Jain <jainankitk@apache.org>
Signed-off-by: Ankit Jain <jainankitk@apache.org>
Signed-off-by: Ankit Jain <jainankitk@apache.org>
Signed-off-by: Ankit Jain <jainankitk@apache.org>
Signed-off-by: Ankit Jain <jainankitk@apache.org>
Signed-off-by: Ankit Jain <jainankitk@apache.org>
86a7fad to
bf39ece
Compare
|
Persistent review updated to latest commit bf39ece |
|
Persistent review updated to latest commit 48cad83 |
48cad83 to
0e0882c
Compare
…rmance" This reverts commit 0e0882c. Signed-off-by: Ankit Jain <jainankitk@apache.org>
|
Persistent review updated to latest commit 0e0882c |
Signed-off-by: Ankit Jain <jainankitk@apache.org>
|
Persistent review updated to latest commit 3e78591 |
…tion
Description
Address the terms aggregation performance regression using segment to global ordinals mapping without reading the terms from disk for per segment collection
Related Issues
Resolves #20626
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.