
Add filter and filters bucket aggregation in dsl-query-executor plugin #21254

Open
tanik98 wants to merge 3 commits into opensearch-project:main from tanik98:dsl-query-executor-filter-bucket-agg

Conversation


@tanik98 tanik98 commented Apr 17, 2026

Description

Add DSL-to-Calcite conversion for the filter and filters bucket aggregations.

  • Filter: filters documents on an embedded query condition and produces a single bucket; it is converted to a LogicalFilter.
  • Filters: produces N buckets (N+1 when other_bucket is enabled) from N filter conditions; it is converted to N separate plans, each with its own LogicalFilter (see the sketch below).
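For context, a rough sketch of the two aggregation shapes this conversion targets, built with the standard OpenSearch aggregation builder API (the field names and bucket keys below are hypothetical, for illustration only):

import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.filter.FilterAggregationBuilder;
import org.opensearch.search.aggregations.bucket.filter.FiltersAggregationBuilder;
import org.opensearch.search.aggregations.bucket.filter.FiltersAggregator.KeyedFilter;

// filter: one embedded condition, a single bucket -> translated to one LogicalFilter
FilterAggregationBuilder errors = AggregationBuilders
    .filter("errors", QueryBuilders.termQuery("status", "error"))
    .subAggregation(AggregationBuilders.avg("avg_latency").field("latency"));

// filters: N keyed conditions, N buckets (N+1 with other_bucket) -> translated to N separate plans with LogicalFilter
FiltersAggregationBuilder byStatus = AggregationBuilders.filters(
    "by_status",
    new KeyedFilter("errors", QueryBuilders.termQuery("status", "error")),
    new KeyedFilter("warnings", QueryBuilders.termQuery("status", "warning"))
).otherBucket(true);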

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

github-actions Bot commented Apr 17, 2026

PR Reviewer Guide 🔍

(Review updated until commit 24ed814)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add filter bucket aggregation support

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslatorTests.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/EmptyGrouping.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/filter/InternalFilter.java

Sub-PR theme: Add filters bucket aggregation support

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FiltersBucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/bucket/FiltersBucketTranslatorTests.java

Sub-PR theme: Integrate filter/filters into tree walker and converter pipeline

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadataBuilder.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationRegistryFactory.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/BucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/converter/AggregateConverter.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/converter/SearchSourceConverter.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/AggregationTreeWalkerTests.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/converter/AggregateConverterTests.java

⚡ Recommended focus areas for review

Iterator Reuse Bug

In toBucketAggregation, the buckets iterable's iterator is called twice: once to check hasNext() and again to call next(). If the Iterable returns a fresh iterator each time (e.g., a List), this works, but if it returns a single-use iterator (e.g., a stream-backed iterable), the second call to iterator().next() will fail or return incorrect results. Consider converting to a list or using a single iterator variable.

public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
    if (!buckets.iterator().hasNext()) {
        return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
    }
    BucketEntry bucket = buckets.iterator().next();
    return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
}
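For the list-conversion option mentioned above, a minimal sketch (assuming java.util.List and java.util.ArrayList are imported; the single-iterator variant appears in the suggestions below):

public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
    // materialize the iterable once so it is safe to inspect even if it is single-use
    List<BucketEntry> bucketList = new ArrayList<>();
    buckets.forEach(bucketList::add);
    if (bucketList.isEmpty()) {
        return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
    }
    BucketEntry bucket = bucketList.get(0);
    return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
}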
Null Safety

The bucketOrders field is now assigned directly from the constructor parameter without defensive copying (changed from List.copyOf(bucketOrders) to this.bucketOrders = bucketOrders). If a mutable list is passed in, it could be modified externally, breaking immutability guarantees. The filterCondition being nullable is fine, but bucketOrders should still be defensively copied.

this.bucketOrders = bucketOrders;
this.filterCondition = filterCondition;
Null Dereference Risk

In handleMetric, when currentFilterKey is non-null, the builder is fetched via granularities.get(currentFilterKey). If the filter bucket builder was never created (e.g., due to a logic error or unexpected ordering), builder will be null, causing a NullPointerException on the subsequent addAggregateCall call. There is no null-check or error handling for this case.

AggregationMetadataBuilder builder;
if (currentFilterKey != null) {
    builder = granularities.get(currentFilterKey);
} else {
    builder = getOrCreateBuilder(currentGroupings, granularities);
}
builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Visibility Change

The constructor of InternalFilter has been changed from package-private to public. This widens the API surface of a core server class and may break encapsulation. This change should be carefully reviewed to ensure it doesn't expose internal implementation details or allow unintended instantiation from outside the package.

public InternalFilter(String name, long docCount, InternalAggregations subAggregations, Map<String, Object> metadata) {
    super(name, docCount, subAggregations, metadata);
}
Debug Code

The logRowTypes method recursively logs the row type of every RelNode in the plan tree at INFO level. This appears to be debug/development code that could produce excessive log output in production and should either be removed or guarded behind a DEBUG log level check.

private void logRowTypes(RelNode node, int depth) {
    String indent = "  ".repeat(depth);
    logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
    for (RelNode input : node.getInputs()) {
        logRowTypes(input, depth + 1);
    }
}
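If the logging is worth keeping, one minimal option (a sketch, assuming logger is the plugin's standard Log4j logger) is to drop the output to DEBUG and return early when that level is disabled:

private void logRowTypes(RelNode node, int depth) {
    // skip the whole tree walk unless DEBUG output is actually enabled
    if (!logger.isDebugEnabled()) {
        return;
    }
    String indent = "  ".repeat(depth);
    logger.debug("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
    for (RelNode input : node.getInputs()) {
        logRowTypes(input, depth + 1);
    }
}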


github-actions Bot commented Apr 17, 2026

PR Code Suggestions ✨

Latest suggestions up to 24ed814

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Restore defensive copy for mutable list field

The bucketOrders field was previously defensively copied with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. It should remain a defensive copy like the other list fields.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = List.copyOf(bucketOrders);
Suggestion importance[1-10]: 7

Why: The bucketOrders field was previously assigned with List.copyOf() for immutability, but the new code assigns it directly, creating an inconsistency with other list fields and potentially allowing external mutation. This is a valid correctness concern.

Medium
Avoid double iterator creation on Iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream or iterator), the second call to .iterator().hasNext()
and then .iterator().next() may not work correctly or may skip elements. The
iterator should be obtained once and reused.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 7

Why: Calling buckets.iterator() twice is only safe for reusable iterables such as a List. For a single-use iterable, the second call may hand back the same, already-consumed iterator, so next() fails or skips elements. Storing the iterator in a variable and reusing it is the correct approach.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the filter bucket builder was not yet created before the metric is
processed. This would cause a NullPointerException on the
builder.addAggregateCall(...) call. A null check or a getOrCreateBuilder call should
be used instead.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [283-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 6

Why: When currentFilterKey is non-null, granularities.get(currentFilterKey) could return null if the builder wasn't created before the metric is processed, leading to a NullPointerException. Adding a null check with a meaningful error would improve robustness.

Low
General
Prevent unbounded recursion in logging utility

This recursive method has no depth limit and could cause a StackOverflowError for
deeply nested query plans. A maximum depth guard should be added to prevent
unbounded recursion in production environments.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java [96-102]

+private static final int MAX_LOG_DEPTH = 20;
+
 private void logRowTypes(RelNode node, int depth) {
+    if (depth > MAX_LOG_DEPTH) {
+        logger.info("{}[...truncated at depth {}]", "  ".repeat(depth), depth);
+        return;
+    }
     String indent = "  ".repeat(depth);
     logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
     for (RelNode input : node.getInputs()) {
         logRowTypes(input, depth + 1);
     }
 }
Suggestion importance[1-10]: 3

Why: The recursive logRowTypes method has no depth limit, but in practice Calcite plan trees are unlikely to be deeply nested enough to cause a StackOverflowError. This is a minor defensive improvement for a debug logging utility.

Low

Previous suggestions

Suggestions up to commit 69d2f7f
Category | Suggestion | Impact
General
Restore defensive copy for mutable list field

The bucketOrders field was previously defensively copied with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. It should remain a defensive copy for consistency and safety.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = List.copyOf(bucketOrders);
Suggestion importance[1-10]: 7

Why: The bucketOrders field was previously assigned with List.copyOf() for immutability, but the PR changed it to a direct assignment. This inconsistency could allow external mutation of the internal state, unlike all other list fields which still use List.copyOf().

Medium
Possible issue
Avoid double iterator creation on same iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream), the second call to buckets.iterator().next() may fail
or return incorrect results. The iterator should be obtained once and reused.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 7

Why: Calling buckets.iterator() twice on the same Iterable is unsafe for non-reusable iterables (e.g., streams). Storing the iterator once and reusing it prevents potential bugs with single-use iterables.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the key doesn't exist yet (e.g., if a metric appears before its parent
filter bucket is processed). This would cause a NullPointerException on the
subsequent builder.addAggregateCall(...) call.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [284-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No metadata builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 6

Why: If currentFilterKey is non-null but not yet present in granularities, granularities.get(currentFilterKey) returns null, causing a NullPointerException on builder.addAggregateCall(...). Adding a null check with a meaningful exception improves robustness.

Low
Suggestions up to commit 26ffedd
Category | Suggestion | Impact
Possible issue
Avoid double iterator creation on Iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream), the second call to .iterator() may return an
already-exhausted iterator, so the subsequent next() throws instead of assigning
bucket. Store the iterator or convert to a list first.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 8

Why: Calling buckets.iterator() twice on a non-reusable Iterable (e.g., a stream-backed iterable) can consume the iterator on the first call, so the second call yields an already-empty iterator and next() throws before bucket is ever assigned. The fix correctly stores the iterator in a variable.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the filter bucket builder was not yet created before the metric is
processed. This would cause a NullPointerException on the
builder.addAggregateCall(...) call. A null check or a getOrCreate call should be
used instead.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [283-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 7

Why: If currentFilterKey is non-null but no builder exists in granularities for that key, granularities.get(currentFilterKey) returns null, causing a NullPointerException on builder.addAggregateCall(...). The suggestion correctly identifies this potential NPE and proposes a meaningful error message.

Medium
General
Restore defensive copy for mutable list field

The bucketOrders field was previously copied defensively with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. Apply List.copyOf() here as well, or handle the null case if
bucketOrders can be null.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = bucketOrders != null ? List.copyOf(bucketOrders) : List.of();
Suggestion importance[1-10]: 6

Why: The old code used List.copyOf(bucketOrders) for defensive copying, but the new code assigns bucketOrders directly, breaking encapsulation. The suggestion correctly identifies this regression and proposes a null-safe fix.

Low
Add recursion depth limit to prevent stack overflow

This recursive method has no depth limit and could cause a StackOverflowError for
deeply nested query plans. Consider adding a maximum depth guard to prevent
unbounded recursion.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java [78-84]

+private static final int MAX_LOG_DEPTH = 20;
+
 private void logRowTypes(RelNode node, int depth) {
+    if (depth > MAX_LOG_DEPTH) {
+        logger.info("{}[...truncated at depth {}]", "  ".repeat(depth), depth);
+        return;
+    }
     String indent = "  ".repeat(depth);
     logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
     for (RelNode input : node.getInputs()) {
         logRowTypes(input, depth + 1);
     }
 }
Suggestion importance[1-10]: 4

Why: The logRowTypes method is only called within an isInfoEnabled() guard and is a debug/logging utility. While a depth limit is a reasonable defensive measure, the risk of a StackOverflowError in practice is low for typical query plans, making this a minor improvement.

Low

@github-actions

❌ Gradle check result for 26ffedd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 69d2f7f.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/search/aggregations/bucket/filter/InternalFilter.java | 48 | low | The `InternalFilter` constructor visibility is widened from package-private to `public` in a core server class. While this appears justified to allow the sandbox plugin to construct `InternalFilter` instances from a different package, it permanently expands the public API surface of a core aggregation class, allowing any code (including future untrusted plugins) to construct `InternalFilter` objects with arbitrary doc counts and sub-aggregations.
sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java | 79 | low | The new `logRowTypes` method logs full schema/row type information at INFO level (not DEBUG), meaning it will appear in production logs by default. This exposes index schema metadata (field names and types for every node in the query plan) in operational logs. This is inconsistent with the guarded `logger.isInfoEnabled()` check already present in the surrounding `logPlan` method, suggesting the log level choice may not be intentional.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

Persistent review updated to latest commit 69d2f7f

@github-actions

❌ Gradle check result for 69d2f7f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

Persistent review updated to latest commit 24ed814

@github-actions

❌ Gradle check result for 24ed814: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
