Skip to content

Add Aggregation Execution Context#85011

Merged
imotov merged 8 commits intoelastic:masterfrom
imotov:optimize-timeseries-aggs-limited
Mar 31, 2022
Merged

Add Aggregation Execution Context#85011
imotov merged 8 commits intoelastic:masterfrom
imotov:optimize-timeseries-aggs-limited

Conversation

@imotov
Copy link
Copy Markdown
Contributor

@imotov imotov commented Mar 15, 2022

Adds a place to store information during aggregation execution and use this
context to store the current tsid. It allows us to achieve 3x improvement in
the timeseries aggregation execution speed. In a follow up PR, I would like
to remove the inheritance of BucketCollector from Collector and instead try
wrapping it into a collector when needed. This should prevent us form using
getLeafCollector(LeafReaderContext ctx) method in a wrong context in future.

Relates to #74660

Adds a place to store information during aggregation execution and use this
context to store the current tsid. It allows us to achieve 3x improvement in
the timeseries aggregation execution speed. In a follow up PR, I would like
to remove the inheritance of BucketCollector from Collector and instead try
wrapping it into a collector when needed. This should prevent us form using
getLeafCollector(LeafReaderContext ctx) method in a wrong context in future.

Relates to elastic#74660
@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 15, 2022

@elasticmachine run elasticsearch-ci/part-2

@imotov imotov marked this pull request as ready for review March 16, 2022 18:35
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 16, 2022
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do about delayed stuff here? Right now I think it'd just get the last tsid, right?


@Override
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, AggregationExecutionContext aggCtx) throws IOException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we stick the LeafReaderContext onto it?

*/
protected abstract LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException;

// TODO: Remove the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Remove the/Remove me/?

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 16, 2022

What should we do about delayed stuff here? Right now I think it'd just get the last tsid, right?

Delayed stuff is not supported by time series, I think I added checked in the previous iterations. But it needs to be cleaned up a bit so it is more obvious. I will try to address this in the follow up refactoring I mentioned in the issue description.

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Mar 16, 2022

Delayed stuff is not supported by time series, I think I added checked in the previous iterations. But it needs to be cleaned up a bit so it is more obvious. I will try to address this in the follow up refactoring I mentioned in the issue description.

I wonder if you had a forDeferred method or something that just throws if tsid is set. Or something like that.

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Mar 16, 2022

I wonder if you had a forDeferred method or something that just throws if tsid is set. Or something like that.

But, yeah, it can wait.

} else {
collectBucket(sub, doc, bucketOrdinal);
}
long bucketOrdinal = bucketOrds.add(bucket, aggCtx.getTsid());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it add an assert to check aggCtx is not null, or check aggCtx null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a check much earlier that this aggregator runs in the correct environment to prevent NPE here. I don't really see the difference between failing it with NPE vs assert.

private CheckedSupplier<BytesRef, IOException> tsidProvider;

public BytesRef getTsid() throws IOException {
return tsidProvider.get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it add a null check to tsidProvider to avoid NPE?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid the setter if you reorder things in search a little?

*/
public class AggregationExecutionContext {

private CheckedSupplier<BytesRef, IOException> tsidProvider;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this final?

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 23, 2022

I think we can avoid the setter if you reorder things in search a little?

@romseygeek LeafBucketCollector needs the current id part of the LeafWalker and LeafWalker needs a reference to LeafBucketCollector. So, we have to ether make collector in LeafWalkerCollector non final, or move the entire initialization into LeafWalker constructor, but then we need to deal with null scorer by throwing an exception. What did you have in mind? What am I missing?

Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and LeafWalker needs a reference to LeafBucketCollector

That was the bit I had missed. I think this is good to go then, thanks!

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 28, 2022

@elasticmachine update branch

@imotov imotov requested a review from nik9000 March 29, 2022 00:12
@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 30, 2022

@elasticmachine update branch

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 31, 2022

@elasticmachine run elasticsearch-ci/rest-compatibility

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 31, 2022

@elasticmachine update branch

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Mar 31, 2022

@elasticmachine update branch

@imotov imotov merged commit 736ce7e into elastic:master Mar 31, 2022
@jpountz
Copy link
Copy Markdown
Contributor

jpountz commented Apr 1, 2022

Nice, if my understanding is correct we are going from O(num_docs) lookups in the terms dict of the TSID field to O(num_unique_tsids * num_segments)? So this could be even more than 3x faster on large force-merged shards?

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Apr 4, 2022

That is a good point. I didn't test it on a large force-merge shards, I will try that. However, I think most of the queries will be executed on hot shards, so realistically speaking this 3x improvement is what we will mostly likely get in real life scenarios.

@csoulios
Copy link
Copy Markdown
Contributor

csoulios commented Apr 7, 2022

That will be a great performance improvement for rollups. I will measure its impact when doing a force-merge shards before the shard rollup.

@imotov
Copy link
Copy Markdown
Contributor Author

imotov commented Apr 12, 2022

So, I did the test on the rally's tsdb track. The timing has reduced, but not significantly. If with 32 segments I was getting 8469 ms, with 1 segments I started to get 7393 ms. What is unrelated but interesting, while doing this test I also ran cardinality agg by mistake, and I noticed that cardinality execution time grew from 8469 ms on unmerged segments to 23123 ms on a single segment. I repeated the test twice because it was so bizarre and it is definitely a reproducible result.

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants