Use Collector.setWeight to improve aggregation performance (for special cases) #10954
Description
Lucene added a new setWeight method to the Collector interface a while back (see https://issues.apache.org/jira/browse/LUCENE-10620), specifically to give collectors access to the Weight.count() method.
Weight.count() returns -1 (the value meaning "I can't give you a cheap count") in most cases, but the cases where it does return a real count are pretty useful: mostly "match all" or "match none", but a single term query will also return "I match exactly this many" if there are no deletions in the current segment (since it just reads the term's doc freq).
I believe this can be useful to short-circuit some aggregation logic, since aggregations all extend Collector.
These are the special cases that I've been able to think of where the weight.count(leafReaderContext) could hint at smarter computation of aggregations:
- If the top-level query matches nothing in the current segment (i.e. weight.count(leafReaderContext) == 0), then the count for every bucket is 0. (If the min count is greater than 0, then you don't need to compute any buckets for this segment.)
- If the top-level query matches everything in the current segment (i.e. weight.count(leafReaderContext) == leafReaderContext.reader().maxDoc()), then the count of hits in a bucket (from the current segment) is determined entirely by the count of the bucket, which may be cheap to compute (e.g. doc freq for a terms aggregation, maybe read count from the BKD tree for a range aggregation).
- If the top-level query has some other positive count, but a bucket matches everything in the current segment (e.g. the documents in the current segment are all from the same day and we're computing a daily date histogram), then the bucket count is weight.count(leafReaderContext).
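
To make the three cases concrete, here is a rough sketch of the per-segment decision. It deliberately does not touch Lucene or OpenSearch classes: `SegmentCountShortcut`, `bucketCount`, and all parameter names are hypothetical, and the caller is assumed to have already read weight.count(leafReaderContext), maxDoc, and (where cheaply available) the number of documents in the bucket for the segment.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration only; not part of Lucene or OpenSearch. */
final class SegmentCountShortcut {

    /**
     * Returns the number of hits in one bucket for one segment.
     *
     * @param weightCount   value of weight.count(leafReaderContext); -1 means "no cheap count"
     * @param maxDoc        leafReaderContext.reader().maxDoc()
     * @param bucketDocs    docs in this segment that fall in the bucket (assumed cheap to read,
     *                      e.g. a doc freq); -1 if it is not cheaply known
     * @param collectSlowly fallback that counts by visiting matching docs the usual way
     */
    static int bucketCount(int weightCount, int maxDoc, int bucketDocs, IntSupplier collectSlowly) {
        // Case 1: the query matches nothing in this segment, so every bucket is empty.
        if (weightCount == 0) {
            return 0;
        }
        // Case 2: the query matches everything, so the bucket's own size is the answer.
        if (weightCount == maxDoc && bucketDocs >= 0) {
            return bucketDocs;
        }
        // Case 3: the bucket covers the whole segment, so every query hit is in the bucket.
        if (weightCount > 0 && bucketDocs == maxDoc) {
            return weightCount;
        }
        // Otherwise (including weightCount == -1) there is no shortcut.
        return collectSlowly.getAsInt();
    }
}
```

In a real aggregator the first two checks would probably happen once per segment (when the leaf collector is created) rather than per bucket, and the slow path is just the existing doc-by-doc collection.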
I didn't give it a lot of thought, so there might be some more that I'm missing.