
[FEA] Story - Supporting row operators on nested types #10186

@devavret

Description


There have been several requests to enable row operators on nested types. This issue tracks all related issues as a story.

There are three types of row operators we need to support (equality comparison ==, lexicographic comparison <, and hashing #) on two different nested types (LIST and STRUCT).
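To make the three operators concrete, here is a minimal host-side sketch over a hypothetical in-memory nested value. `Node`, `row_equal`, `row_compare`, `row_less`, and `row_hash` are illustrative names, not libcudf API, and libcudf stores nested data as columns (offsets, children, validity masks) rather than as a tree. The sketch assumes that nulls compare equal to each other and sort before non-null values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of one nested value (not libcudf's columnar layout).
enum class Kind { Scalar, List, Struct };

struct Node {
  Kind kind = Kind::Scalar;
  bool valid = true;           // false means this value is null
  std::int64_t scalar = 0;     // payload when kind == Kind::Scalar
  std::vector<Node> children;  // list elements or struct fields
};

Node scalar(std::int64_t v) { return Node{Kind::Scalar, true, v, {}}; }
Node make_list(std::vector<Node> elems) { return Node{Kind::List, true, 0, std::move(elems)}; }
Node make_struct(std::vector<Node> fields) { return Node{Kind::Struct, true, 0, std::move(fields)}; }
Node null_value(Kind k) { return Node{k, false, 0, {}}; }

// ==: a null equals a null (assumed policy); otherwise compare recursively.
bool row_equal(Node const& a, Node const& b) {
  if (a.valid != b.valid) return false;
  if (!a.valid) return true;
  if (a.kind != b.kind || a.children.size() != b.children.size()) return false;
  if (a.kind == Kind::Scalar) return a.scalar == b.scalar;
  for (std::size_t i = 0; i < a.children.size(); ++i)
    if (!row_equal(a.children[i], b.children[i])) return false;
  return true;
}

// <: three-way lexicographic comparison, nulls first (assumed policy).
int row_compare(Node const& a, Node const& b) {
  if (!a.valid || !b.valid) return int(a.valid) - int(b.valid);
  assert(a.kind == b.kind);  // both values come from the same column
  if (a.kind == Kind::Scalar) return (a.scalar > b.scalar) - (a.scalar < b.scalar);
  std::size_t const n = std::min(a.children.size(), b.children.size());
  for (std::size_t i = 0; i < n; ++i)
    if (int r = row_compare(a.children[i], b.children[i]); r != 0) return r;
  // A list that is a prefix of the other sorts first; struct rows have equal widths.
  return (a.children.size() > b.children.size()) - (a.children.size() < b.children.size());
}

bool row_less(Node const& a, Node const& b) { return row_compare(a, b) < 0; }

// Boost-style mixing step.
std::size_t hash_combine(std::size_t seed, std::size_t h) {
  return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// #: must agree with row_equal, i.e. equal values produce equal hashes.
std::size_t row_hash(Node const& n) {
  if (!n.valid) return 0;  // every null hashes to the same value
  if (n.kind == Kind::Scalar) return std::hash<std::int64_t>{}(n.scalar);
  std::size_t seed = hash_combine(static_cast<std::size_t>(n.kind), n.children.size());
  for (Node const& c : n.children) seed = hash_combine(seed, row_hash(c));
  return seed;
}
```

The hash folds in the same structure that `row_equal` inspects, so equal values always hash equally. That consistency is what lets an (== + #) pair replace an (== + <) pair in hash-based algorithms such as groupby or duplicate removal.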

| Status | Issue | Type | Operation | Notes |
|---|---|---|---|---|
| ✔️ | #8683 | STRUCT | == | Solved by flattening |
| `NULL_MIN` and `NULL_MAX` still outstanding, see #11520 | #8964 | STRUCT | ==, < | Proposed solution by flattening #9452 |
| waiting on cuDF list index support, see #8039 | #8039 | LIST | (== + #) / (== + <) | Requires == + # for hash groupby or == + < for sort groupby. Also a Spark req #10181 |
| waiting on min/max struct in hash-groupby | #8974 | STRUCT | < | Individual action items being solved with flattening |
| ✔️ | #5890 | LIST | < | Plain list sorting. Spark req #10184 |
| awaiting cuDF troubleshooting, see #6784 | #6784 | LIST | (== + #) / (== + <) | `drop_duplicates` uses (== + <) right now but will be optimized to use (== + #) in #10030 |
| ✔️ | #9119 | STRUCT | # | Needs a `struct_device_view` |
| ✔️ | #10378 | LIST | # | We have list hashing, including Spark-compatible Murmur3 hashing for lists |
| Proposal in #11222 | #10408 | LIST of STRUCT | < | Different from groupby on list because here the `list<struct>` column is values, not keys |
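Several STRUCT rows above are resolved "by flattening". As a rough illustration of that idea (a simplified host-side model, not libcudf's implementation), a STRUCT row can be flattened pre-order into one cell per struct level and per leaf field. Each struct contributes an extra validity-only cell, and a null struct forces its descendants to null, i.e. its nulls are pushed down. Lexicographic comparison of the flattened cells then yields struct == and <. `Value`, `Cell`, `flatten`, `flat`, `struct_equal`, and `struct_less` are hypothetical names, and the sketch assumes ascending order with nulls first.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model of one STRUCT row. As in a columnar struct, every row
// carries all fields of the schema, even when the struct itself is null.
struct Value {
  bool valid = true;          // false means this struct or leaf is null
  bool is_struct = false;     // true: nested struct, false: int64 leaf
  std::int64_t leaf = 0;      // payload of a leaf
  std::vector<Value> fields;  // fields of a nested struct
};

Value leaf(std::int64_t x) { return Value{true, false, x, {}}; }
Value null_leaf() { return Value{false, false, 0, {}}; }
Value make_struct(std::vector<Value> fields, bool valid = true) {
  return Value{valid, true, 0, std::move(fields)};
}

// One cell of a flattened column; std::nullopt represents null.
using Cell = std::optional<std::int64_t>;

// Pre-order flattening: a struct contributes a validity-only cell (0 when valid)
// followed by its fields; a null parent forces all of its descendants to null.
void flatten(Value const& v, bool parent_valid, std::vector<Cell>& out) {
  bool const valid = parent_valid && v.valid;
  if (v.is_struct) {
    out.push_back(valid ? Cell{0} : Cell{});
    for (Value const& f : v.fields) flatten(f, valid, out);
  } else {
    out.push_back(valid ? Cell{v.leaf} : Cell{});
  }
}

std::vector<Cell> flat(Value const& row) {
  std::vector<Cell> out;
  flatten(row, true, out);
  return out;
}

// std::optional orders nullopt before any value, so std::vector's == and <
// compare flattened rows lexicographically with nulls first.
bool struct_equal(Value const& a, Value const& b) { return flat(a) == flat(b); }

bool struct_less(Value const& a, Value const& b) {
  std::vector<Cell> const fa = flat(a);
  std::vector<Cell> const fb = flat(b);
  assert(fa.size() == fb.size());  // same schema => same number of flattened columns
  return fa < fb;
}
```

Because the struct's own validity cell comes before its fields, a null struct sorts before any non-null struct, and two null structs compare equal since all of their pushed-down cells are null. Other orderings (descending, nulls last) would need per-column handling on top of this.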

The plan

Support will land in multiple PRs: first 1-table row comparators and hashing for nested types (operating on rows within a single table), then 2-table versions of the row comparators (comparing rows of one table against rows of another):

| PR | Column Type | Operation | Number of tables | Dependencies | Notes |
|---|---|---|---|---|---|
| #10164 | Struct | < | 1 | | Refactor of existing functionality. Introduces new owning operator API |
| #10289 | List + struct (arbitrary nesting) | == | 1 | #10164, #10291 | Works only for "sanitized list" and structs with nulls pushed down |
| #10641 | List + struct (arbitrary nesting) | # | 1 | #10289 | Same requirements as #10289 |
| #10883 | List + struct (arbitrary nesting) | == | 2 | #10289, | Same requirements as #10289, also see #10508 |
| #10730 | Struct | < | 2 | #10164 | #10508 |
| #11129 | List | < | 1 | #10164, #10291 | Should work for struct of list but won't work for list of structs |
| #11292 | List | Spark-# | 1 | | |
| | List of struct | < | 1 | | Proposal in #11222 |
| | Struct of list | < | 1 | | We expected #11129 would cover these types, but there is an unresolved issue |
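The plan distinguishes 1-table comparators, where both row indices refer to the same table (as when sorting), from 2-table comparators, where one index refers to a left table and the other to a right table (as when searching for rows of one table in another). The sketch below shows only that distinction, on flat `int64_t` columns without nesting or nulls. `Table`, `SelfLess`, `TwoTableLess`, `LhsIndex`, `RhsIndex`, `sorted_order`, and `lower_bounds` are hypothetical names; the separate index types are an assumption used to keep left and right indices from being mixed up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical flat table: equally sized int64 columns (no nesting, no nulls).
using Column = std::vector<std::int64_t>;
using Table = std::vector<Column>;

std::size_t num_rows(Table const& t) { return t.empty() ? 0 : t.front().size(); }

// Lexicographic comparison of row i of `lhs` against row j of `rhs`.
bool row_less(Table const& lhs, std::size_t i, Table const& rhs, std::size_t j) {
  assert(lhs.size() == rhs.size());  // both tables must have the same schema
  for (std::size_t c = 0; c < lhs.size(); ++c) {
    if (lhs[c][i] < rhs[c][j]) return true;
    if (rhs[c][j] < lhs[c][i]) return false;
  }
  return false;
}

// 1-table comparator: both indices refer to the same table (e.g. sorting).
struct SelfLess {
  Table const* t;
  bool operator()(std::size_t i, std::size_t j) const { return row_less(*t, i, *t, j); }
};

// 2-table comparator: distinct index types record which table each index
// belongs to, so a left row can be compared with a right row in either order.
struct LhsIndex { std::size_t value; };
struct RhsIndex { std::size_t value; };

struct TwoTableLess {
  Table const* lhs;
  Table const* rhs;
  bool operator()(LhsIndex i, RhsIndex j) const { return row_less(*lhs, i.value, *rhs, j.value); }
  bool operator()(RhsIndex j, LhsIndex i) const { return row_less(*rhs, j.value, *lhs, i.value); }
};

// Row order that sorts `t`, using only the 1-table comparator.
std::vector<std::size_t> sorted_order(Table const& t) {
  std::vector<std::size_t> order(num_rows(t));
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), SelfLess{&t});
  return order;
}

// For each row of `lhs`, the first position in sorted `rhs` whose row is not
// less than it, using the 2-table comparator.
std::vector<std::size_t> lower_bounds(Table const& lhs, Table const& rhs) {
  std::vector<RhsIndex> sorted_rhs;
  for (std::size_t r : sorted_order(rhs)) sorted_rhs.push_back(RhsIndex{r});
  TwoTableLess const cmp{&lhs, &rhs};
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < num_rows(lhs); ++i) {
    auto it = std::lower_bound(sorted_rhs.begin(), sorted_rhs.end(), LhsIndex{i}, cmp);
    positions.push_back(static_cast<std::size_t>(it - sorted_rhs.begin()));
  }
  return positions;
}
```

Sorting a single table only ever needs the 1-table form. The 2-table form is what operations relating two different tables need, such as the search in `lower_bounds`, where `std::lower_bound` calls the comparator with a right-table index on one side and a left-table index on the other.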

Metadata


Assignees: No one assigned

Labels: feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code.)
