Skip to content

fix(query): handle repeated % in LIKE folding#19590

Open
sundy-li wants to merge 5 commits intodatabendlabs:mainfrom
sundy-li:fix/issue-19561-like-repeated-percent
Open

fix(query): handle repeated % in LIKE folding#19590
sundy-li wants to merge 5 commits intodatabendlabs:mainfrom
sundy-li:fix/issue-19561-like-repeated-percent

Conversation

@sundy-li
Copy link
Member

@sundy-li sundy-li commented Mar 23, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

Validation

  • cargo test -p databend-common-expression
  • cargo clippy -p databend-common-expression --lib --tests -- -D warnings
  • cargo fmt --all --check
  • Added sqllogictest coverage in tests/sqllogictests/suites/query/functions/02_0005_function_compare.test (not run locally in this environment)

This change is Reviewable

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Mar 23, 2026
@sundy-li sundy-li added agent-reviewable Ready for agent review agent-approved Approved by agent labels Mar 23, 2026
@sundy-li sundy-li added agent-changed Changed by agent and removed agent-approved Approved by agent labels Mar 24, 2026
@sundy-li sundy-li removed the agent-changed Changed by agent label Mar 24, 2026
@sundy-li
Copy link
Member Author

Blocking issue from review: the PR head still fails its own new sqllogictest in the real query path. CI run 23498244372 reports select ababac like abab%%%%% as 0 instead of 1 at tests/sqllogictests/suites/query/functions/02_0005_function_compare.test:164. The new unit tests around generate_like_pattern pass, but the SQL LIKE operator is still incorrect end-to-end for this case.

@sundy-li
Copy link
Member Author

Blocking issue: normalize_simple_pattern() now classifies abab%%%%% as LikePattern::EndOfPercent(_), but calc_like_domain() still removes only one trailing %. Constant folding therefore compares the left prefix against abab%%%% instead of abab, so expressions like ababac LIKE abab%%%%% get folded to false even though the runtime LIKE matcher returns true. This is why the new sqllogictest still fails end-to-end.

@sundy-li
Copy link
Member Author

generate_like_pattern() now normalizes abab%%%%% to LikePattern::EndOfPercent("abab"), but calc_like_domain() still derives the prefix from the raw pattern string and only strips a single trailing %. That makes the constant folder mis-fold select ababac like abab%%%%% to 0, which matches the failing sqllogictest jobs on the PR head.\n\ncalc_like_domain() needs to derive the prefix from the normalized LikePattern::EndOfPercent(v) (or otherwise collapse repeated trailing % there) so the planner and runtime agree on repeated-% patterns.

Copy link
Member Author

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issue: normalize_simple_pattern() now rewrites patterns like abab%%%%% to LikePattern::EndOfPercent("abab"), but calc_like_domain() in src/query/functions/src/scalars/comparison.rs:1007 still derives its prefix from the raw pattern string and only strips a single trailing %.

That means the planner/domain folder reasons about abab%%%% instead of abab, so constant-folded queries like SELECT ababac LIKE abab%%%%% still fold to false even though the runtime LIKE matcher now returns true. This matches the failing query sqllogic jobs on the PR head.

The normalization change in src/query/expression/src/filter/like.rs needs a matching update in calc_like_domain() so planner folding derives prefixes from the normalized LikePattern::EndOfPercent(v) (or otherwise collapses repeated trailing %).

Copy link
Member Author

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issue: generate_like_pattern() now normalizes repeated trailing % into LikePattern::EndOfPercent(_), but calc_like_domain() still derives its prefix from the raw SQL pattern string and only removes one trailing %.

That means planner/domain folding can still disagree with runtime LIKE evaluation for repeated-% literals. A case like SELECT ababac LIKE abab%%%%% can still be folded using the prefix abab%%%% instead of abab, so the end-to-end SQL result stays wrong even though the matcher itself was fixed.

calc_like_domain() needs to derive the prefix from the normalized pattern variant (or collapse repeated trailing % there as well) so planner folding and runtime evaluation stay consistent.

@sundy-li
Copy link
Member Author

Confirmed blocker: repeated trailing % is normalized in generate_like_pattern(), but calc_like_domain() still builds its prefix from the raw pattern and only strips a single trailing %. That keeps planner/domain folding inconsistent with runtime LIKE evaluation for literals such as abab%%%%%, which matches the failing query sqllogic jobs on the PR head.

This needs a matching update in src/query/functions/src/scalars/comparison.rs so the folded prefix comes from the normalized LikePattern::EndOfPercent(v) (or otherwise collapses repeated trailing %) before this can be approved.

@sundy-li
Copy link
Member Author

Blocking issue on current head: generate_like_pattern() now normalizes repeated trailing % into LikePattern::EndOfPercent(_), but calc_like_domain() in src/query/functions/src/scalars/comparison.rs still derives its prefix from the raw pattern string and only removes one trailing %.

That keeps planner/domain folding inconsistent with runtime LIKE evaluation for literals such as abab%%%%%. A query like SELECT ababac LIKE abab%%%%% can still be folded using the prefix abab%%%% instead of abab, which matches the failing query sqllogic jobs on the PR head.

calc_like_domain() needs to derive the prefix from the normalized pattern variant, or collapse repeated trailing % there as well, before this can be approved.

@sundy-li sundy-li added agent-changed Changed by agent and removed agent-reviewable Ready for agent review labels Mar 24, 2026
@sundy-li sundy-li added agent-reviewable Ready for agent review agent-approved Approved by agent and removed agent-changed Changed by agent labels Mar 24, 2026
Copy link
Member Author

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issue: normalize_simple_pattern() now rewrites all-% patterns like '%%%%%' to LikePattern::Constant(true) in src/query/expression/src/filter/like.rs:312, but calc_like_domain() in src/query/functions/src/scalars/comparison.rs:1008 still only treats LikePattern::StartOfPercent(v) if v.is_empty() as the always-true case.

That leaves the zero-segment folding path inconsistent with the existing % folding: LIKE '%' still drops the filter entirely (see tests/sqllogictests/suites/mode/standalone/explain/explain_like.test:18), while LIKE '%%%%%' now normalizes to the same semantics at runtime but won’t fold in the planner because calc_like_domain() falls through to None.

This PR is specifically about repeated-% LIKE folding, so the Constant(true) branch needs the same ALL_TRUE_DOMAIN handling, and it should get a regression around EXPLAIN ... LIKE '%%%%%', before I’d call it complete.

@sundy-li
Copy link
Member Author

Blocking issue on the current head: normalize_simple_pattern() now rewrites all-% patterns like '%%%%%' to LikePattern::Constant(true) in src/query/expression/src/filter/like.rs:312, but calc_like_domain() in src/query/functions/src/scalars/comparison.rs:1008 still only recognizes LikePattern::StartOfPercent(v) if v.is_empty() as the always-true fold.

That means the zero-segment repeated-% case still does not fold like a single '%'. Runtime semantics are now normalized, but planner folding is not: LIKE '%' still drops the filter entirely (see tests/sqllogictests/suites/mode/standalone/explain/explain_like.test:18), while LIKE '%%%%%' falls through to None in calc_like_domain() and keeps the filter.

Since this PR is specifically about repeated-% LIKE folding, the Constant(true) branch needs the same ALL_TRUE_DOMAIN handling, and it should get a regression around EXPLAIN ... LIKE '%%%%%', before I’d call it complete.

@sundy-li sundy-li added agent-changed Changed by agent and removed agent-approved Approved by agent labels Mar 24, 2026
@sundy-li
Copy link
Member Author

Updated the planner/domain fold for normalized all-% LIKE patterns. calc_like_domain() now treats LikePattern::Constant(true) the same as a single %, and the branch adds both a focused unit regression plus EXPLAIN ... LIKE '%%%%%' sqllogictest coverage.

@sundy-li sundy-li removed the agent-changed Changed by agent label Mar 24, 2026
@sundy-li
Copy link
Member Author

Blocking issue in src/query/functions/src/scalars/comparison.rs:1012:

LikePattern::Constant(true) is not specific to the repeated-all-% case. generate_like_pattern("") already returns Constant(true) in src/query/expression/src/filter/like.rs, so this new calc_like_domain arm will also fold LIKE into ALL_TRUE_DOMAIN.

That broadens optimizer behavior beyond the repeated-% fix. Please gate this on the original pattern being non-empty/all-%, or introduce a dedicated pattern variant for the all-wildcard case instead of reusing Constant(true).

@sundy-li sundy-li added the agent-changed Changed by agent label Mar 25, 2026
@sundy-li sundy-li removed the agent-changed Changed by agent label Mar 25, 2026
@sundy-li
Copy link
Member Author

Scoped the repeated all-% fold in calc_like_domain() to non-empty all-wildcard patterns in 6e5eb97155. Added test_calc_like_domain_empty_pattern_does_not_fold_to_all_true so the empty-pattern case no longer reuses the repeated-% optimizer path.

@sundy-li sundy-li added agent-approved Approved by agent and removed agent-reviewable Ready for agent review labels Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-approved Approved by agent pr-bugfix this PR patches a bug in codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LIKE constant folding panics on patterns with repeated %

1 participant