
Add filter and filters bucket aggregation in dsl-query-executor plugin #21254

Open
tanik98 wants to merge 3 commits into opensearch-project:main from tanik98:dsl-query-executor-filter-bucket-agg

Conversation


@tanik98 tanik98 commented Apr 17, 2026

Description

Add DSL-to-Calcite conversion for the filter and filters bucket aggregations.

  • Filter: filters documents on an embedded query condition and produces a single bucket; it is converted to a LogicalFilter.
  • Filters: produces N buckets (N+1 when other_bucket is enabled) from N filter conditions; it is converted to N separate plans, each with its own LogicalFilter (see the sketch below).
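For context, a rough sketch of the two aggregation shapes this conversion targets, built with the standard OpenSearch aggregation builder API (the field names and bucket keys below are hypothetical, for illustration only):

import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.filter.FilterAggregationBuilder;
import org.opensearch.search.aggregations.bucket.filter.FiltersAggregationBuilder;
import org.opensearch.search.aggregations.bucket.filter.FiltersAggregator.KeyedFilter;

// filter: one embedded condition, a single bucket -> translated to one LogicalFilter
FilterAggregationBuilder errors = AggregationBuilders
    .filter("errors", QueryBuilders.termQuery("status", "error"))
    .subAggregation(AggregationBuilders.avg("avg_latency").field("latency"));

// filters: N keyed conditions, N buckets (N+1 with other_bucket) -> translated to N separate plans with LogicalFilter
FiltersAggregationBuilder byStatus = AggregationBuilders.filters(
    "by_status",
    new KeyedFilter("errors", QueryBuilders.termQuery("status", "error")),
    new KeyedFilter("warnings", QueryBuilders.termQuery("status", "warning"))
).otherBucket(true);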

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

github-actions Bot commented Apr 17, 2026

PR Reviewer Guide 🔍

(Review updated until commit 24ed814)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add filter bucket aggregation support

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslatorTests.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/EmptyGrouping.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/filter/InternalFilter.java

Sub-PR theme: Add filters bucket aggregation support

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FiltersBucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/bucket/FiltersBucketTranslatorTests.java

Sub-PR theme: Integrate filter/filters into tree walker and converter pipeline

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadataBuilder.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationRegistryFactory.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/BucketTranslator.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/converter/AggregateConverter.java
  • sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/converter/SearchSourceConverter.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/aggregation/AggregationTreeWalkerTests.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/converter/AggregateConverterTests.java

⚡ Recommended focus areas for review

Iterator Reuse Bug

In toBucketAggregation, the buckets iterable's iterator is called twice: once to check hasNext() and again to call next(). If the Iterable returns a fresh iterator each time (e.g., a List), this works, but if it returns a single-use iterator (e.g., a stream-backed iterable), the second call to iterator().next() will fail or return incorrect results. Consider converting to a list or using a single iterator variable.

public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
    if (!buckets.iterator().hasNext()) {
        return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
    }
    BucketEntry bucket = buckets.iterator().next();
    return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
}
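For the list-conversion option mentioned above, a minimal sketch (assuming java.util.List and java.util.ArrayList are imported; the single-iterator variant appears in the suggestions below):

public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
    // materialize the iterable once so it is safe to inspect even if it is single-use
    List<BucketEntry> bucketList = new ArrayList<>();
    buckets.forEach(bucketList::add);
    if (bucketList.isEmpty()) {
        return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
    }
    BucketEntry bucket = bucketList.get(0);
    return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
}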
Null Safety

The bucketOrders field is now assigned directly from the constructor parameter without defensive copying (changed from List.copyOf(bucketOrders) to this.bucketOrders = bucketOrders). If a mutable list is passed in, it could be modified externally, breaking immutability guarantees. The filterCondition being nullable is fine, but bucketOrders should still be defensively copied.

this.bucketOrders = bucketOrders;
this.filterCondition = filterCondition;
Null Dereference Risk

In handleMetric, when currentFilterKey is non-null, the builder is fetched via granularities.get(currentFilterKey). If the filter bucket builder was never created (e.g., due to a logic error or unexpected ordering), builder will be null, causing a NullPointerException on the subsequent addAggregateCall call. There is no null-check or error handling for this case.

AggregationMetadataBuilder builder;
if (currentFilterKey != null) {
    builder = granularities.get(currentFilterKey);
} else {
    builder = getOrCreateBuilder(currentGroupings, granularities);
}
builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Visibility Change

The constructor of InternalFilter has been changed from package-private to public. This widens the API surface of a core server class and may break encapsulation. This change should be carefully reviewed to ensure it doesn't expose internal implementation details or allow unintended instantiation from outside the package.

public InternalFilter(String name, long docCount, InternalAggregations subAggregations, Map<String, Object> metadata) {
    super(name, docCount, subAggregations, metadata);
}
Debug Code

The logRowTypes method recursively logs the row type of every RelNode in the plan tree at INFO level. This appears to be debug/development code that could produce excessive log output in production and should either be removed or guarded behind a DEBUG log level check.

private void logRowTypes(RelNode node, int depth) {
    String indent = "  ".repeat(depth);
    logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
    for (RelNode input : node.getInputs()) {
        logRowTypes(input, depth + 1);
    }
}
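If the logging is worth keeping, one minimal option (a sketch, assuming logger is the plugin's standard Log4j logger) is to drop the output to DEBUG and return early when that level is disabled:

private void logRowTypes(RelNode node, int depth) {
    // skip the whole tree walk unless DEBUG output is actually enabled
    if (!logger.isDebugEnabled()) {
        return;
    }
    String indent = "  ".repeat(depth);
    logger.debug("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
    for (RelNode input : node.getInputs()) {
        logRowTypes(input, depth + 1);
    }
}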


github-actions Bot commented Apr 17, 2026

PR Code Suggestions ✨

Latest suggestions up to 24ed814

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Restore defensive copy for mutable list field

The bucketOrders field was previously defensively copied with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. It should remain a defensive copy like the other list fields.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = List.copyOf(bucketOrders);
Suggestion importance[1-10]: 7

Why: The bucketOrders field was previously assigned with List.copyOf() for immutability, but the new code assigns it directly, creating an inconsistency with other list fields and potentially allowing external mutation. This is a valid correctness concern.

Medium
Avoid double iterator creation on Iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream or iterator), the second call to .iterator().hasNext()
and then .iterator().next() may not work correctly or may skip elements. The
iterator should be obtained once and reused.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 7

Why: Calling buckets.iterator() twice is only safe for reusable iterables such as a List. For a single-use iterable, the second call may hand back the same, already-consumed iterator, so next() fails or skips elements. Storing the iterator in a variable and reusing it is the correct approach.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the filter bucket builder was not yet created before the metric is
processed. This would cause a NullPointerException on the
builder.addAggregateCall(...) call. A null check or a getOrCreateBuilder call should
be used instead.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [283-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 6

Why: When currentFilterKey is non-null, granularities.get(currentFilterKey) could return null if the builder wasn't created before the metric is processed, leading to a NullPointerException. Adding a null check with a meaningful error would improve robustness.

Low
General
Prevent unbounded recursion in logging utility

This recursive method has no depth limit and could cause a StackOverflowError for
deeply nested query plans. A maximum depth guard should be added to prevent
unbounded recursion in production environments.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java [96-102]

+private static final int MAX_LOG_DEPTH = 20;
+
 private void logRowTypes(RelNode node, int depth) {
+    if (depth > MAX_LOG_DEPTH) {
+        logger.info("{}[...truncated at depth {}]", "  ".repeat(depth), depth);
+        return;
+    }
     String indent = "  ".repeat(depth);
     logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
     for (RelNode input : node.getInputs()) {
         logRowTypes(input, depth + 1);
     }
 }
Suggestion importance[1-10]: 3

Why: The recursive logRowTypes method has no depth limit, but in practice Calcite plan trees are unlikely to be deeply nested enough to cause a StackOverflowError. This is a minor defensive improvement for a debug logging utility.

Low

Previous suggestions

Suggestions up to commit 69d2f7f
Category | Suggestion | Impact
General
Restore defensive copy for mutable list field

The bucketOrders field was previously defensively copied with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. It should remain a defensive copy for consistency and safety.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = List.copyOf(bucketOrders);
Suggestion importance[1-10]: 7

Why: The bucketOrders field was previously assigned with List.copyOf() for immutability, but the PR changed it to a direct assignment. This inconsistency could allow external mutation of the internal state, unlike all other list fields which still use List.copyOf().

Medium
Possible issue
Avoid double iterator creation on same iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream), the second call to buckets.iterator().next() may fail
or return incorrect results. The iterator should be obtained once and reused.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 7

Why: Calling buckets.iterator() twice on the same Iterable is unsafe for non-reusable iterables (e.g., streams). Storing the iterator once and reusing it prevents potential bugs with single-use iterables.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the key doesn't exist yet (e.g., if a metric appears before its parent
filter bucket is processed). This would cause a NullPointerException on the
subsequent builder.addAggregateCall(...) call.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [284-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No metadata builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 6

Why: If currentFilterKey is non-null but not yet present in granularities, granularities.get(currentFilterKey) returns null, causing a NullPointerException on builder.addAggregateCall(...). Adding a null check with a meaningful exception improves robustness.

Low
Suggestions up to commit 26ffedd
Category | Suggestion | Impact
Possible issue
Avoid double iterator creation on Iterable

buckets.iterator() is called twice on the same Iterable. For non-reusable iterables
(e.g., a single-use stream), the second call to .iterator() may return an
already-exhausted iterator, so the subsequent next() throws instead of assigning
bucket. Store the iterator or convert to a list first.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/bucket/FilterBucketTranslator.java [79-85]

 public InternalAggregation toBucketAggregation(FilterAggregationBuilder agg, Iterable<BucketEntry> buckets) {
-    if (!buckets.iterator().hasNext()) {
+    java.util.Iterator<BucketEntry> it = buckets.iterator();
+    if (!it.hasNext()) {
         return new InternalFilter(agg.getName(), 0, InternalAggregations.EMPTY, Map.of());
     }
-    BucketEntry bucket = buckets.iterator().next();
+    BucketEntry bucket = it.next();
     return new InternalFilter(agg.getName(), bucket.docCount(), bucket.subAggs(), Map.of());
 }
Suggestion importance[1-10]: 8

Why: Calling buckets.iterator() twice on a non-reusable Iterable (e.g., a stream-backed iterable) can consume the iterator on the first call, so the second call yields an already-empty iterator and next() throws before bucket is ever assigned. The fix correctly stores the iterator in a variable.

Medium
Guard against null builder lookup by filter key

When currentFilterKey is non-null, granularities.get(currentFilterKey) may return
null if the filter bucket builder was not yet created before the metric is
processed. This would cause a NullPointerException on the
builder.addAggregateCall(...) call. A null check or a getOrCreate call should be
used instead.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationTreeWalker.java [283-289]

 if (currentFilterKey != null) {
     builder = granularities.get(currentFilterKey);
+    if (builder == null) {
+        throw new ConversionException("No builder found for filter key: " + currentFilterKey);
+    }
 } else {
     builder = getOrCreateBuilder(currentGroupings, granularities);
 }
 builder.addAggregateCall(translator.toAggregateCall(aggBuilder, rowType), translator.getAggregateFieldName(aggBuilder));
Suggestion importance[1-10]: 7

Why: If currentFilterKey is non-null but no builder exists in granularities for that key, granularities.get(currentFilterKey) returns null, causing a NullPointerException on builder.addAggregateCall(...). The suggestion correctly identifies this potential NPE and proposes a meaningful error message.

Medium
General
Restore defensive copy for mutable list field

The bucketOrders field was previously copied defensively with List.copyOf(), but the
new code assigns it directly. This inconsistency could allow external mutation of
the internal state. Apply List.copyOf() here as well, or handle the null case if
bucketOrders can be null.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/aggregation/AggregationMetadata.java [59]

-this.bucketOrders = bucketOrders;
+this.bucketOrders = bucketOrders != null ? List.copyOf(bucketOrders) : List.of();
Suggestion importance[1-10]: 6

Why: The old code used List.copyOf(bucketOrders) for defensive copying, but the new code assigns bucketOrders directly, breaking encapsulation. The suggestion correctly identifies this regression and proposes a null-safe fix.

Low
Add recursion depth limit to prevent stack overflow

This recursive method has no depth limit and could cause a StackOverflowError for
deeply nested query plans. Consider adding a maximum depth guard to prevent
unbounded recursion.

sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java [78-84]

+private static final int MAX_LOG_DEPTH = 20;
+
 private void logRowTypes(RelNode node, int depth) {
+    if (depth > MAX_LOG_DEPTH) {
+        logger.info("{}[...truncated at depth {}]", "  ".repeat(depth), depth);
+        return;
+    }
     String indent = "  ".repeat(depth);
     logger.info("{}[{}] rowType: {}", indent, node.getRelTypeName(), node.getRowType());
     for (RelNode input : node.getInputs()) {
         logRowTypes(input, depth + 1);
     }
 }
Suggestion importance[1-10]: 4

Why: The logRowTypes method is only called within an isInfoEnabled() guard and is a debug/logging utility. While a depth limit is a reasonable defensive measure, the risk of a StackOverflowError in practice is low for typical query plans, making this a minor improvement.

Low

@github-actions

❌ Gradle check result for 26ffedd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 69d2f7f.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/search/aggregations/bucket/filter/InternalFilter.java | 48 | low | The `InternalFilter` constructor visibility is widened from package-private to `public` in a core server class. While this appears justified to allow the sandbox plugin to construct `InternalFilter` instances from a different package, it permanently expands the public API surface of a core aggregation class, allowing any code (including future untrusted plugins) to construct `InternalFilter` objects with arbitrary doc counts and sub-aggregations.
sandbox/plugins/dsl-query-executor/src/main/java/org/opensearch/dsl/executor/DslQueryPlanExecutor.java | 79 | low | The new `logRowTypes` method logs full schema/row type information at INFO level (not DEBUG), meaning it will appear in production logs by default. This exposes index schema metadata (field names and types for every node in the query plan) in operational logs. This is inconsistent with the guarded `logger.isInfoEnabled()` check already present in the surrounding `logPlan` method, suggesting the log level choice may not be intentional.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

Persistent review updated to latest commit 69d2f7f

@github-actions

❌ Gradle check result for 69d2f7f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

Persistent review updated to latest commit 24ed814

@github-actions

❌ Gradle check result for 24ed814: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
