Add Aggregation Execution Context by imotov · Pull Request #85011 · elastic/elasticsearch

imotov · 2022-03-15T21:33:38Z

Adds a place to store information during aggregation execution and uses this
context to store the current tsid. This gives us a 3x improvement in
timeseries aggregation execution speed. In a follow-up PR, I would like
to remove the inheritance of BucketCollector from Collector and instead try
wrapping it in a collector when needed. This should prevent us from calling the
getLeafCollector(LeafReaderContext ctx) method in the wrong context in the future.

Relates to #74660

imotov · 2022-03-15T22:13:48Z

@elasticmachine run elasticsearch-ci/part-2

elasticmachine · 2022-03-16T18:50:20Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000

What should we do about delayed stuff here? Right now I think it'd just get the last tsid, right?

nik9000 · 2022-03-16T18:54:11Z

server/src/internalClusterTest/java/org/elasticsearch/action/search/TransportSearchIT.java


        @Override
-        public LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
+        public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, AggregationExecutionContext aggCtx) throws IOException {


Should we stick the LeafReaderContext onto it?

nik9000 · 2022-03-16T18:55:27Z

server/src/main/java/org/elasticsearch/search/aggregations/AggregatorBase.java

     */
    protected abstract LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException;

+    // TODO: Remove the


s/Remove the/Remove me/?

imotov · 2022-03-16T19:05:48Z

> What should we do about delayed stuff here? Right now I think it'd just get the last tsid, right?

Delayed stuff is not supported by time series; I think I added checks for that in the previous iterations. But it needs to be cleaned up a bit so that it is more obvious. I will try to address this in the follow-up refactoring I mentioned in the PR description.

nik9000 · 2022-03-16T19:30:50Z

> Delayed stuff is not supported by time series; I think I added checks for that in the previous iterations. But it needs to be cleaned up a bit so that it is more obvious. I will try to address this in the follow-up refactoring I mentioned in the PR description.

I wonder if you had a forDeferred method or something that just throws if tsid is set. Or something like that.

nik9000 · 2022-03-16T19:30:57Z

> I wonder if you had a forDeferred method or something that just throws if tsid is set. Or something like that.

But, yeah, it can wait.
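A minimal sketch of the guard suggested above, for illustration only. The concern in the thread is that deferred collection replays documents later, by which point the context would report only the last tsid. The class name, the `forDeferred` method, and the use of `Supplier<String>` in place of `CheckedSupplier<BytesRef, IOException>` are all assumptions; nothing here is taken from the merged code.

```java
import java.util.function.Supplier;

// Hypothetical, simplified context: a null provider means "not running in time series mode".
class DeferredGuardContextSketch {
    private final Supplier<String> tsidProvider;

    DeferredGuardContextSketch(Supplier<String> tsidProvider) {
        this.tsidProvider = tsidProvider;
    }

    String getTsid() {
        if (tsidProvider == null) {
            throw new IllegalStateException("no tsid in this context");
        }
        return tsidProvider.get();
    }

    // Hypothetical method: hands the context to a deferring collector, but refuses
    // when a tsid is tracked, because a replayed document would see a stale tsid.
    DeferredGuardContextSketch forDeferred() {
        if (tsidProvider != null) {
            throw new IllegalStateException("deferred collection is not supported when a tsid is set");
        }
        return this;
    }
}

class DeferredGuardDemo {
    // Returns true when forDeferred() rejects the given context.
    static boolean deferredRejected(DeferredGuardContextSketch ctx) {
        try {
            ctx.forDeferred();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(deferredRejected(new DeferredGuardContextSketch(() -> "host=a"))); // true
        System.out.println(deferredRejected(new DeferredGuardContextSketch(null)));           // false
    }
}
```

Failing at the point where deferral is requested would surface the unsupported combination earlier than a wrong tsid showing up in results.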

weizijun · 2022-03-21T11:14:29Z

server/src/main/java/org/elasticsearch/search/aggregations/timeseries/TimeSeriesAggregator.java

-                    } else {
-                        collectBucket(sub, doc, bucketOrdinal);
-                    }
+                long bucketOrdinal = bucketOrds.add(bucket, aggCtx.getTsid());


Should we add an assert that aggCtx is not null, or a null check?

imotov · 2022-03-21

We have a check much earlier that this aggregator runs in the correct environment, which prevents an NPE here. I don't really see the difference between failing with an NPE and failing with an assert.

weizijun · 2022-03-21T11:15:41Z

server/src/main/java/org/elasticsearch/search/aggregations/AggregationExecutionContext.java

+    private CheckedSupplier<BytesRef, IOException> tsidProvider;
+
+    public BytesRef getTsid() throws IOException {
+        return tsidProvider.get();


Should we add a null check on tsidProvider to avoid an NPE?
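One way the null check asked about here could look, as a sketch rather than what the PR does. It still fails with a `NullPointerException`, but with a message that names the cause. The class name, the setter name, the message text, and `Supplier<String>` standing in for `CheckedSupplier<BytesRef, IOException>` are assumptions.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical, simplified variant of the context with an explicit null check.
class NullCheckedContextSketch {
    private Supplier<String> tsidProvider;

    // Hypothetical setter name.
    void setTsidProvider(Supplier<String> tsidProvider) {
        this.tsidProvider = tsidProvider;
    }

    String getTsid() {
        // Throws NullPointerException with a descriptive message when no provider was set.
        Supplier<String> provider = Objects.requireNonNull(
            tsidProvider,
            "tsid is only available during time series aggregation execution"
        );
        return provider.get();
    }
}

class NullCheckedDemo {
    public static void main(String[] args) {
        NullCheckedContextSketch ctx = new NullCheckedContextSketch();
        try {
            ctx.getTsid();
        } catch (NullPointerException e) {
            System.out.println("failed: " + e.getMessage());
        }
        ctx.setTsidProvider(() -> "host=a");
        System.out.println(ctx.getTsid());
    }
}
```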

romseygeek

I think we can avoid the setter if you reorder things in search a little?

romseygeek · 2022-03-22T14:50:42Z

server/src/main/java/org/elasticsearch/search/aggregations/AggregationExecutionContext.java

+ */
+public class AggregationExecutionContext {
+
+    private CheckedSupplier<BytesRef, IOException> tsidProvider;


Can we make this final?

imotov · 2022-03-23T06:20:36Z

> I think we can avoid the setter if you reorder things in search a little?

@romseygeek LeafBucketCollector needs the current id, which is part of the LeafWalker, and LeafWalker needs a reference to LeafBucketCollector. So we have to either make the collector in LeafWalkerCollector non-final, or move the entire initialization into the LeafWalker constructor, but then we need to deal with a null scorer by throwing an exception. What did you have in mind? What am I missing?

romseygeek

> and LeafWalker needs a reference to LeafBucketCollector

That was the bit I had missed. I think this is good to go then, thanks!
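The circular dependency discussed in this thread can be sketched as follows. This is a simplified illustration with hypothetical class names, not the actual Elasticsearch classes: the collector needs the context, the walker needs the collector, and the context needs the walker's current tsid, so one of the three links has to be set after construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Stand-in for the execution context; the provider cannot be final in this arrangement.
class WiringCtx {
    private Supplier<String> tsidProvider;

    void setTsidProvider(Supplier<String> tsidProvider) {
        this.tsidProvider = tsidProvider;
    }

    String getTsid() {
        return tsidProvider.get();
    }
}

// Stand-in for a leaf bucket collector that reads the tsid from the context.
class WiringCollector {
    private final WiringCtx ctx;
    final List<String> collected = new ArrayList<>();

    WiringCollector(WiringCtx ctx) {
        this.ctx = ctx;
    }

    void collect(int doc) {
        collected.add(ctx.getTsid() + ":" + doc);
    }
}

// Stand-in for the leaf walker: it owns the current tsid and drives the collector.
class WiringWalker {
    private final WiringCollector collector;
    private String currentTsid;

    WiringWalker(WiringCollector collector) {
        this.collector = collector;
    }

    String currentTsid() {
        return currentTsid;
    }

    void visit(String tsid, int doc) {
        currentTsid = tsid;
        collector.collect(doc);
    }
}

class WiringDemo {
    static List<String> run() {
        WiringCtx ctx = new WiringCtx();                      // 1. context, provider not yet set
        WiringCollector collector = new WiringCollector(ctx); // 2. collector needs the context
        WiringWalker walker = new WiringWalker(collector);    // 3. walker needs the collector
        ctx.setTsidProvider(walker::currentTsid);             // 4. setter closes the loop
        walker.visit("host=a", 0);
        walker.visit("host=b", 1);
        return collector.collected;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [host=a:0, host=b:1]
    }
}
```

Step 4 is the setter the review asked about; removing it would mean making one of the other two references non-final instead.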

imotov · 2022-03-28T21:23:26Z

@elasticmachine update branch

imotov · 2022-03-30T17:59:30Z

@elasticmachine update branch

imotov · 2022-03-31T00:24:53Z

@elasticmachine run elasticsearch-ci/rest-compatibility

imotov · 2022-03-31T00:39:12Z

@elasticmachine update branch

imotov · 2022-03-31T16:21:24Z

@elasticmachine update branch

jpountz · 2022-04-01T12:23:23Z

Nice, if my understanding is correct we are going from O(num_docs) lookups in the terms dict of the TSID field to O(num_unique_tsids * num_segments)? So this could be even more than 3x faster on large force-merged shards?
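To make the two bounds concrete, here is a small counting sketch. It assumes that documents within each segment are visited in tsid order, so a lookup is needed only when the tsid ordinal changes; the class and method names are hypothetical, and the arrays stand in for per-document tsid ordinals.

```java
// Counts simulated terms-dictionary lookups under two strategies.
class LookupCountSketch {
    // One lookup per document: O(num_docs).
    static long perDocLookups(int[][] segments) {
        long lookups = 0;
        for (int[] segmentOrds : segments) {
            lookups += segmentOrds.length;
        }
        return lookups;
    }

    // One lookup per ordinal change within a segment: at most
    // O(num_unique_tsids * num_segments) when each segment is sorted by tsid.
    static long perChangeLookups(int[][] segments) {
        long lookups = 0;
        for (int[] segmentOrds : segments) {
            int previous = -1;
            for (int ord : segmentOrds) {
                if (ord != previous) {
                    lookups++;
                    previous = ord;
                }
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        // Two segments, two distinct tsids, nine documents in total.
        int[][] segments = { { 0, 0, 0, 1, 1 }, { 0, 1, 1, 1 } };
        System.out.println(perDocLookups(segments));    // 9
        System.out.println(perChangeLookups(segments)); // 4
    }
}
```

With a single force-merged segment the second count drops to the number of unique tsids, which is why the gain could in principle grow with shard size.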

imotov · 2022-04-04T19:07:58Z

That is a good point. I didn't test it on large force-merged shards; I will try that. However, I think most queries will be executed on hot shards, so realistically this 3x improvement is what we will most likely get in real-life scenarios.

csoulios · 2022-04-07T12:29:21Z

That will be a great performance improvement for rollups. I will measure its impact when force-merging shards before the shard rollup.

imotov · 2022-04-12T02:30:41Z

So, I ran the test on Rally's tsdb track. The time went down, but not significantly: with 32 segments I was getting 8469 ms, and with 1 segment I got 7393 ms. Unrelated but interesting: while doing this test I also ran a cardinality agg by mistake, and I noticed that its execution time grew from 8469 ms on unmerged segments to 23123 ms on a single segment. I repeated the test twice because it was so bizarre, and the result is definitely reproducible.

nik9000 · 2022-10-11T08:54:56Z

@not_napolion can probably talk to the cardinality change. He's been looking at it lately.

imotov added the >non-issue, :StorageEngine/TSDB (You know, for Metrics), and v8.2.0 labels Mar 15, 2022

imotov requested review from nik9000 and romseygeek March 15, 2022 21:33

imotov mentioned this pull request Mar 15, 2022

Add better support for metric data types (TSDB) #74660

Closed

imotov marked this pull request as ready for review March 16, 2022 18:35

elasticmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Mar 16, 2022

nik9000 reviewed Mar 16, 2022

weizijun reviewed Mar 21, 2022

romseygeek reviewed Mar 22, 2022

romseygeek approved these changes Mar 28, 2022

elasticmachine and others added 3 commits March 29, 2022 07:53

Merge branch 'master' into optimize-timeseries-aggs-limited

7ac6066

Address review comments

1c70847

Fix Comment

a414227

imotov requested a review from nik9000 March 29, 2022 00:12

It was a trap and I walked right into it with my optimization :)

5b44a4d

nik9000 approved these changes Mar 30, 2022

salvatore-campagna added v8.3.0 and removed v8.2.0 labels Mar 30, 2022

Merge branch 'master' into optimize-timeseries-aggs-limited

f6ba938

Merge branch 'master' into optimize-timeseries-aggs-limited

43149c9

Merge branch 'master' into optimize-timeseries-aggs-limited

b9e13f6

imotov merged commit 736ce7e into elastic:master Mar 31, 2022
