b41sh commented Mar 14, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  • Store min/max values for every leaf column of nested data types
  • Generate a domain for nested data types from those values, and skip a block when the filter range does not overlap the domain
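The two points above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the names `MinMax`, `FilterRange`, `leaf_min_max`, and `can_skip_block` are hypothetical, leaf values are simplified to `i64`, and leaf columns are keyed by a made-up path string such as `"c.1"`.

```rust
use std::collections::HashMap;

/// Min/max statistics for one leaf column within a block (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct MinMax {
    min: i64,
    max: i64,
}

/// Inclusive range of values a filter can accept; `None` means unbounded.
#[derive(Debug, Clone, Copy)]
struct FilterRange {
    lower: Option<i64>,
    upper: Option<i64>,
}

/// Compute min/max over the values of one leaf column.
/// Returns `None` for an empty column, in which case no domain is known.
fn leaf_min_max(values: &[i64]) -> Option<MinMax> {
    let min = *values.iter().min()?;
    let max = *values.iter().max()?;
    Some(MinMax { min, max })
}

/// Returns true when the block can be skipped, i.e. the filter range
/// and the stored domain of the leaf column do not overlap.
fn can_skip_block(stats: &HashMap<String, MinMax>, leaf: &str, filter: FilterRange) -> bool {
    match stats.get(leaf) {
        // Without statistics the block must be read.
        None => false,
        Some(domain) => {
            // Everything the filter accepts lies below the block's minimum.
            let below = filter.upper.map_or(false, |u| u < domain.min);
            // Everything the filter accepts lies above the block's maximum.
            let above = filter.lower.map_or(false, |l| l > domain.max);
            below || above
        }
    }
}
```

Under these assumptions, a block whose `"c.1"` leaf has the domain [100009, 100010] is read for a filter like `c.1 >= 100009` (lower bound 100009) and skipped for `c.1 >= 200009` (lower bound 200009), since the second range lies entirely above the stored maximum.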

In a local test, this noticeably reduces query time when the filter range does not match the domain: such queries return in roughly 0.11s, compared with 0.7s to 0.8s for filters that do match.

MySQL root@127.0.0.1:default> create table tt (a array(int), b map(string, int), c tuple(int, string));
Query OK, 0 rows affected
Time: 0.223s
MySQL root@127.0.0.1:default> select count(*) from tt;
+----------+
| count(*) |
+----------+
| 100000   |
+----------+
1 row in set
Time: 0.087s
MySQL root@127.0.0.1:default> select * from tt where a[1] >= 100;
+-------------------+---------------+----------------+
| a                 | b             | c              |
+-------------------+---------------+----------------+
| [100,101,102,103] | {'ip':99999}  | (100009,'abc') |
| [100,101,102,103] | {'ip':100000} | (100010,'abc') |
+-------------------+---------------+----------------+
2 rows in set
Time: 0.705s
MySQL root@127.0.0.1:default> select * from tt where a[1] >= 200;
+---+---+---+
| a | b | c |
+---+---+---+
+---+---+---+
0 rows in set
Time: 0.118s
MySQL root@127.0.0.1:default> select * from tt where b['ip'] >= 99999;
+-------------------+---------------+----------------+
| a                 | b             | c              |
+-------------------+---------------+----------------+
| [100,101,102,103] | {'ip':99999}  | (100009,'abc') |
| [100,101,102,103] | {'ip':100000} | (100010,'abc') |
+-------------------+---------------+----------------+
2 rows in set
Time: 0.792s
MySQL root@127.0.0.1:default> select * from tt where b['ip'] > 100000;
+---+---+---+
| a | b | c |
+---+---+---+
+---+---+---+
0 rows in set
Time: 0.108s
MySQL root@127.0.0.1:default> select * from tt where c.1 >= 100009;
+-------------------+---------------+----------------+
| a                 | b             | c              |
+-------------------+---------------+----------------+
| [100,101,102,103] | {'ip':99999}  | (100009,'abc') |
| [100,101,102,103] | {'ip':100000} | (100010,'abc') |
+-------------------+---------------+----------------+
2 rows in set
Time: 0.763s
MySQL root@127.0.0.1:default> select * from tt where c.1 >= 200009;
+---+---+---+
| a | b | c |
+---+---+---+
+---+---+---+
0 rows in set
Time: 0.116s

Closes #10487

b41sh requested review from andylokandy and sundy-li March 14, 2023 09:19
mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) Mar 14, 2023
b41sh force-pushed the feat-nested-domain branch 2 times, most recently from f4480a0 to 238cee6, March 17, 2023 02:35
b41sh force-pushed the feat-nested-domain branch from 238cee6 to 3f637cc, March 17, 2023 08:20
b41sh force-pushed the feat-nested-domain branch from 3f637cc to aac0363, March 17, 2023 11:15
BohuTANG merged commit 2441484 into databendlabs:main Mar 18, 2023
Linked issue: Feature: generate the Domain for the nested data types
