ESQL: Pushdown count(field) to Lucene#100122
Conversation
|
Pinging @elastic/es-ql (Team:QL) |
|
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
int is a better default that prevents subtle conversion errors when trying to convert the null Integer to int.
There was a problem hiding this comment.
can you please explain a little more about the situation causing errors? is there any doc regarding the same?
|
I think this optimization doesn't account for cases where a field has multiple values in a document. |
Use the LuceneCountOperator also for ungrouped count(field) queries Fix elastic#99840
Prevent optimization across multiple fields
astefan
left a comment
There was a problem hiding this comment.
LGTM
Unrelated to this specific PR.... I want to point out a behavior that for me seems slightly confusing. If I use from employees | stats c = count() by gender I get
c | gender
---------------+---------------
10 |null
33 |F
57 |M
but if I use from employees | stats c = count(gender) by gender I get
c | gender
---------------+---------------
0 |null
57 |M
33 |F
I get it that count(Field) ignores null values, but it is confusing to even show the null group for count(Field). If count(Field) ignores nulls then let's ignore them all the way because the result above tells me (as an user that doesn't extrapolate "ignoring nulls" aspect) that there are 0 null values for gender. But I have other queries to reject this statement. Imo, showing the null group is wrong.
You are correct @dnhatn.
|
|
|
||
| // for the moment support pushing count just for one field | ||
| List<Stat> stats = tuple.v2(); | ||
| if (stats.size() > 1) { |
There was a problem hiding this comment.
is this check necessary (given the following one)? Or is it meant as an optimisation, to avoid reducing the stats if not necessary?
There was a problem hiding this comment.
Yes since the rule might pick up two different fields however at runtime we don't know how to combine the operators.
For example count(salary), count(emp_no) by gender - this results into one EsStatsQuery with two Stat however because these are two different queries, two different sources are required and we currently allow only one.
| from test | eval s = salary | rename s as sr | eval hidden_s = sr | rename emp_no as e | where e < 10050 | ||
| | stats c = count(hidden_s) |
There was a problem hiding this comment.
Nit:
| from test | eval s = salary | rename s as sr | eval hidden_s = sr | rename emp_no as e | where e < 10050 | |
| | stats c = count(hidden_s) | |
| from test | |
| | eval s = salary | rename s as sr | |
| | eval hidden_s = sr | |
| | rename emp_no as e | |
| | where e < 10050 | |
| | stats c = count(hidden_s) |
or some other line breaking approach.
| } | ||
|
|
||
| private PhysicalPlan optimizedPlan(PhysicalPlan plan, SearchStats searchStats) { | ||
| // System.out.println("* Physical Before\n" + plan); |
Related and potentially unexpected:
, but dropping argument from which makes sense, the latter counting the groups and former the values within the group (and |
|
alex-spies
left a comment
There was a problem hiding this comment.
Generally LGTM.
The only thing worrying me is that we never really test the pushed down filter against Lucene; there may be some unforeseen weirdnesses that lead to unexpected exceptions.
| @@ -356,9 +366,21 @@ private Tuple<List<Attribute>, List<Stat>> pushableStats(AggregateExec aggregate | |||
| Expression child = as.child(); | |||
There was a problem hiding this comment.
nit: the lambda passed to computeIfAbsent is beginning to become a bit hard to read (I have trouble figuring out what exactly the lambda is trying to achieve). Consider factoring this into a well-named helper method or adding a comment.
...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java
Show resolved
Hide resolved
...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
| throw new EsqlIllegalArgumentException("EsStatsQuery should only occur against a Lucene backend"); | ||
| } | ||
| EsPhysicalOperationProviders esProvider = (EsPhysicalOperationProviders) physicalOperationProviders; | ||
| if (statsQuery.stats().size() > 1) { |
There was a problem hiding this comment.
Below we depend on having exactly one stat, so we should check this to avoid harder to find out of bounds exceptions.
| if (statsQuery.stats().size() > 1) { | |
| if ((statsQuery.stats().size() == 1) == false) { |
...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
Show resolved
Hide resolved
...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
|
|
||
| public class TestLocalPhysicalPlanOptimizer extends LocalPhysicalPlanOptimizer { | ||
|
|
||
| private final boolean esRules; |
There was a problem hiding this comment.
nit: the variable name used in the rules method is clearer (needed to look this up to understand what's going on), consider renaming:
| private final boolean esRules; | |
| private final boolean optimizeForEsSource; |
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Show resolved
Hide resolved
Scratch that, I forgot that our csv tests are also being run against full server instances. So the tests are all good, of course :) |
Thanks Nhat, I've extended SearchStats to include a isSingleValue method which uses a similar approach to #80730 |
Great comment as always @astefan - I've incorporated it into the PR. |
Use the LuceneCountOperator also for ungrouped count(field) queries
Fix #99840