Clean up how pipeline aggs check for multi-bucket#54161
Clean up how pipeline aggs check for multi-bucket#54161nik9000 merged 6 commits intoelastic:masterfrom
Conversation
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes elastic#53215
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
|
@elasticmachine, run elasticsearch-ci/2 |
|
The bwc failures are legit. The fixes for these aggs need skips. The elasticsearch-ci/2 didn't look legit at first glance. |
polyfractal
left a comment
There was a problem hiding this comment.
Rad, I like this. As a followup I wonder if we could give a similar treatment for "sequential" vs "non-sequential" aggs (e.g. date_histo has sequentially ordered buckets). Would allow us to cleanup that spot in pipeline validation where certain pipelines need sequential buckets.
(I think the merge conflicts are just little things, like removed generics, due to the VS merge. Should be easy enough to fix I hope)
| * This might be a bit of a lie though because you can't really use | ||
| * pipeline aggregations on composite aggregations. | ||
| */ | ||
| return BucketCardinality.MANY; |
There was a problem hiding this comment.
Yeah... I wonder if we should set this to NONE for the time being, to prevent people from accidentally using pipelines on composite? There's an issue flying around somewhere discussing it... but last time we chatted, we decided to punt on the decision. Some pipelines are theoretically safe (bucket_script, operating on the bucket itself), but others like derivative are totally not kosher.
There was a problem hiding this comment.
Yeah. Composite is a weird beast.
I think so! |
…c#54161) Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes elastic#53215
* upstream/master: (447 commits) Adjust BWC version on node roles being sorted Deprecate node local storage setting (elastic#54374) Remove Unused BaseNodeRequest (elastic#54323) Decouple environment from DiscoveryNode (elastic#54373) Ensure that the output of node roles are sorted (elastic#54376) Do not stash environment in security (elastic#54372) Do not stash environment in machine learning (elastic#54371) Clean up how pipeline aggs check for multi-bucket (elastic#54161) Remove toString from Painless user tree nodes (elastic#54117) Docs: Use splitOnToken instead of custom function (elastic#48408) bwc: enable script cache in node stats (elastic#54362) bwc: Mute for script cache in node stats (elastic#54359) Test to enforce response to invalid data stream names Scripting: Use randomValueOtherThan in stats test (elastic#54356) Move transport decoding and aggregation to server (elastic#48263) Mute ScriptServiceTests.testContextCacheStats (elastic#54352) Improve checkstyle performance and readability (elastic#54308) Disable Gradle daemon when executing Windows packaging tests (elastic#54316) [Docs] Minor fix for SubmitAsyncSearchRequest.keepOnCompletion javadoc (elastic#54325) [DOCS] Fix typos in top metrics agg docs (elastic#54299) ... Conflicts: server/src/main/java/org/elasticsearch/index/IndexModule.java server/src/main/java/org/elasticsearch/index/IndexService.java server/src/main/java/org/elasticsearch/indices/IndicesService.java server/src/test/java/org/elasticsearch/index/IndexModuleTests.java
`master` wasn't sending `auto_date_histogram`'s `minimumIntervalExpression` over the wire to `7.x` nodes even though everything above `7.3` expected it. We never noticed this because we didn't have any yml tests for `auto_date_histogram` until I added some in elastic#54161. Closes elastic#54396
#54379) Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes #53215
|
I've finished the backport but I haven't updated the branches on master to run the bwc tests properly. Before I do that I'm going to handle #54382 which I discovered while backporting. |
Now that elastic#54161 is backported we can run its tests against nodes that have the fix.
Now that #54161 is backported we can run its tests against nodes that have the fix. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Pipeline aggregations like
stats_bucket,sum_bucket, andpercentiles_bucketonly operate on buckets that have multiple buckets.This adds support for those aggregations to
geo_distance,ip_range,auto_date_histogram, andrare_terms.This all happened because we used a marker interface to mark compatible
aggs,
MultiBucketAggregationBuilderand it was fairly easy to forgetto implement the interface.
This replaces the marker interface with an abstract method in
AggregationBuilder,bucketCardinalitywhich makes you returnNONE,ONE, orMANY. Thebucketaggregations can check forMANY. Atthis point
ONEandNONEamount to about the same thing, but Isuspect that'll be a useful distinction when validating bucket sorts.
Closes #53215