fix(query): return semantic error for invalid grouping()#19594
fix(query): return semantic error for invalid grouping()#19594sundy-li wants to merge 4 commits intodatabendlabs:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31daed9aad
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
The current fix is still incomplete for invalid
That means queries like Please extend the validation/rewrite so the non-rewritten clauses are covered as well (at least |
|
Verified the fix is still incomplete outside the aggregate-rewrite paths.
Please extend the semantic validation/rewrite to the non-rewritten clauses and add regression coverage for cases like:
|
sundy-li
left a comment
There was a problem hiding this comment.
The WHERE/QUALIFY fix is in place, but invalid raw GROUPING() is still not rejected in JOIN ... ON. Join predicates use a separate binder path and never hit AggregateRewriter, so this still needs one more semantic check before the PR is ready.
|
SELECT count()
FROM students s1
JOIN students s2 ON GROUPING() = 0
GROUP BY s1.course;still leave a raw |
sundy-li
left a comment
There was a problem hiding this comment.
Two blockers remain:
-
src/query/sql/src/planner/binder/qualify.rsonly checksis_raw_grouping_function().QUALIFYresolves select-list aliases afteranalyze_aggregate_select(), so an alias likeSELECT GROUPING(course) AS g, count() OVER () FROM students GROUP BY GROUPING SETS ((course), ()) QUALIFY g = 0bindsgto the rewrittengrouping<...>(_grouping_id)form and bypasses the new guard. The helper already saysQualify clause can't contain grouping functions, but this path still slips through. -
src/query/sql/src/planner/binder/bind_table_reference/bind_join.rsis still missing the same validation for join predicates.check_join_allowed_scalar_expr()only rejects aggregate/window/async nodes, and join conditions never run throughAggregateRewriter, so invalidGROUPING()calls inJOIN ... ONstill avoid the intended semantic error path.
|
A blocking path is still uncovered:
SELECT count()
FROM students s1
JOIN students s2 ON GROUPING() = 0
GROUP BY s1.course;still plans successfully instead of returning the semantic error this PR is meant to enforce. I verified that locally by temporarily extending the new semantic test with that case; |
|
Two blockers still remain in the current tip:
|
sundy-li
left a comment
There was a problem hiding this comment.
Blocking issue:
JOIN ... ON still accepts raw GROUPING() calls. The new checks cover WHERE and QUALIFY, but src/query/sql/src/planner/binder/bind_table_reference/bind_join.rs still validates join predicates with check_join_allowed_scalar_expr(), which only rejects aggregate/window/async nodes. Join predicates never go through AggregateRewriter, so a query like:
SELECT count()
FROM students s1
JOIN students s2 ON GROUPING() = 0
GROUP BY s1.course;still leaves a raw FunctionCall(grouping) in the join condition instead of returning the semantic error this PR is meant to enforce. This path needs the same kind of guard as WHERE, plus regression coverage for the join case.
|
I found two remaining gaps in the current diff:
SELECT GROUPING(course) AS g, count() OVER () still binds to the already-rewritten
SELECT count() still leave a raw |
|
I don't think this is complete yet. The new checks only cover Two concrete gaps from the current code:
Because |
|
Review notes from the current head:
These two paths still need coverage before this should pass review. |
🤖 CI Job Analysis (Retry 1)
📊 Summary
❌ NO RETRY NEEDEDAll failures appear to be code/test issues requiring manual fixes. 🔍 Job Details
🤖 AboutAutomated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed). |
|
Blocking issue:
The repo already supports
|
|
Blocking issue:
This is a behavior regression for alias-expanded I think the fix here needs to distinguish raw/unrewritten |
|
Updated in 4b6a979. Resolved items from review:
Fresh verification on this commit:
|
| aliases, | ||
| ); | ||
| let (mut scalar, _) = scalar_binder.bind(qualify)?; | ||
| let mut aggregate_rewriter = AggregateRewriter::new(bind_context, self.metadata.clone()); |
There was a problem hiding this comment.
Using AggregateRewriter here rewrites ordinary aggregate expressions too, not just grouping(). That means bind_qualify() no longer sees ScalarExpr::AggregateFunction for queries like SELECT count() OVER () FROM t QUALIFY count() > 0, so the existing Qualify clause must not contain aggregate functions check is bypassed and the query can plan through the aggregate path. We need a narrower rewrite here that only handles grouping() or another way to preserve the aggregate rejection.
| aliases, | ||
| ); | ||
| let (mut scalar, _) = scalar_binder.bind(qualify)?; | ||
| let mut aggregate_rewriter = AggregateRewriter::new(bind_context, self.metadata.clone()); |
There was a problem hiding this comment.
Running AggregateRewriter here looks unsafe. bind_qualify() still relies on seeing ScalarExpr::AggregateFunction at lines 74-81 to reject aggregates in QUALIFY, but this rewrite turns them into bound columns before that check runs. That means queries like SELECT 1 FROM t QUALIFY sum(x) > 0 can start planning successfully instead of raising Qualify clause must not contain aggregate functions. Can we keep the grouping() fix without rewriting general aggregates before the existing semantic check?
|
Marking this PR as changes requested from the agent side. The new |
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
GROUPING()query panics during constant folding instead of returning semantic error #19554groupingscalar before aggregate rewriteGROUPING()calls with a semantic error instead of letting them reach the dummy implementationGROUPING()usageTests
Type of change
Validation
cargo test -p databend-query --test it sql::planner::semantic::type_check::test_invalid_grouping_returns_semantic_error -- --exactcargo test -p databend-query --test it sql::planner::semantic::type_check -- --nocapturecargo fmt --all --checkcargo clippy -p databend-common-expression -p databend-common-sql --lib -- -D warningsAdded sqllogictest coverage in
tests/sqllogictests/suites/duckdb/sql/aggregate/group/group_by_grouping_sets.test, but did not run the standalone sqllogictest harness locally.This change is