There have been several requests to enable row operators on nested types. This issue tracks all of the related issues as a single story.
There are three types of row operators we need to support (equality comparison `==`, lexicographic comparison `<`, and hashing `#`) on two different nested types (`LIST` and `STRUCT`).
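To make the three operators concrete, here is a minimal sketch in plain Python, not libcudf code. It assumes a `STRUCT` row is modeled as a tuple, a `LIST` row as a list, both rows share the same schema, and nulls are ignored entirely. The names `row_equal`, `row_less`, and `row_hash` are hypothetical and chosen only for this illustration.

```python
def _is_nested(value):
    # In this sketch, tuples stand in for STRUCT rows and lists for LIST rows.
    return isinstance(value, (list, tuple))


def row_equal(a, b):
    """Equality (==): same length and every nested element equal."""
    if _is_nested(a) and _is_nested(b):
        return len(a) == len(b) and all(row_equal(x, y) for x, y in zip(a, b))
    return a == b


def row_less(a, b):
    """Lexicographic comparison (<): the first differing element decides,
    and a strict prefix sorts before the longer row."""
    if _is_nested(a) and _is_nested(b):
        for x, y in zip(a, b):
            if row_less(x, y):
                return True
            if row_less(y, x):
                return False
        return len(a) < len(b)
    return a < b


def row_hash(a):
    """Hashing (#): combine element hashes in order. This is a toy combiner,
    not Murmur3; the only guarantee is that equal rows hash to the same value."""
    if _is_nested(a):
        h = len(a)
        for x in a:
            h = (h * 31 + row_hash(x)) & 0xFFFFFFFF
        return h
    return hash(a) & 0xFFFFFFFF
```

Hash groupby needs `row_equal` together with `row_hash`, while sort groupby needs `row_equal` together with `row_less`, which is why several issues below list the operators in pairs.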
| Status | Issue | Type | Operation | Notes |
|---|---|---|---|---|
| ✔️ | #8683 | STRUCT | `==` | Solved by flattening |
| `NULL_MIN` and `NULL_MAX` still outstanding, see #11520 | #8964 | STRUCT | `==`, `<` | Proposed solution by flattening #9452 |
| waiting on cuDF list index support, see #8039 | #8039 | LIST | (`==` + `#`) / (`==` + `<`) | Requires `==` + `#` for hash groupby or `==` + `<` for sort groupby. Also a Spark req #10181 |
| waiting on min/max struct in hash-groupby | #8974 | STRUCT | `<` | Individual action items being solved with flattening |
| ✔️ | #5890 | LIST | `<` | Plain list sorting. Spark req #10184 |
| awaiting cuDF troubleshooting, see #6784 | #6784 | LIST | (`==` + `#`) / (`==` + `<`) | `drop_duplicates` uses (`==` + `<`) right now but will be optimized to use (`==` + `#`) in #10030 |
| ✔️ | #9119 | STRUCT | `#` | Needs a `struct_device_view` |
| ✔️ | #10378 | LIST | `#` | We have list hashing, Spark-compatible Murmur3 hashing for lists |
| Proposal in #11222 | #10408 | LIST of STRUCT | `<` | Different from groupby on list because here the `list<struct>` column is values, not keys |
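Several rows above mention flattening as the approach for `STRUCT` columns. The sketch below shows one way to read that idea in plain Python. It is an illustration under stated assumptions, not the libcudf implementation: struct rows are modeled as nested tuples, null handling (the part still outstanding for `NULL_MIN` and `NULL_MAX`) is left out, and the helper names `flatten_struct_column` and `flat_row_less` are hypothetical.

```python
def flatten_struct_column(rows):
    """Turn a column of (possibly nested) struct rows into a list of leaf columns."""

    def leaves(value):
        # Depth-first walk: nested tuples are structs, anything else is a leaf.
        if isinstance(value, tuple):
            for child in value:
                yield from leaves(child)
        else:
            yield value

    flat_rows = [tuple(leaves(row)) for row in rows]
    # Transpose row-major leaves into one Python list per leaf column.
    return [list(column) for column in zip(*flat_rows)]


def flat_row_less(columns, i, j):
    """Compare rows i and j lexicographically across the flattened columns."""
    for column in columns:
        if column[i] != column[j]:
            return column[i] < column[j]
    return False  # rows are equal in every column


# Three struct rows of shape (int, (int, str)).
rows = [(1, (2, "b")), (1, (2, "a")), (0, (9, "z"))]
columns = flatten_struct_column(rows)
```

Once the struct is flattened, an existing comparator for flat tables can be reused column by column, which is the appeal of this approach for `==` and `<` on structs.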
The plan
This will be supported using multiple PRs, first covering 1-table row comparators and hashing for nested types, then extending the row comparators with 2-table versions:
| PR | Column Type | Operation | Number of tables | Dependencies | Notes |
|---|---|---|---|---|---|
| #10164 | Struct | `<` | 1 | | Refactor of existing functionality. Introduces new owning operator API |
| #10289 | List + struct (arbitrary nesting) | `==` | 1 | #10164, #10291 | Works only for "sanitized list" and structs with nulls pushed down |
| #10641 | List + struct (arbitrary nesting) | `#` | 1 | #10289 | Same requirements as #10289 |
| #10883 | List + struct (arbitrary nesting) | `==` | 2 | #10289 | Same requirements as #10289, also see #10508 |
| #10730 | Struct | `<` | 2 | #10164 | #10508 |
| #11129 | List | `<` | 1 | #10164, #10291 | Should work for struct of list but won't work for list of structs |
| #11292 | List | Spark-`#` | 1 | | |
| | List of struct | `<` | 1 | | Proposal in #11222 |
| | Struct of list | `<` | 1 | | We expected #11129 would cover these types, but there is an unresolved issue |