[FEA] Filter join probe table rows that contain nulls when nulls are not equal #9151

@jrhemstad

Is your feature request related to a problem? Please describe.

In the hash join implementation, we allow controlling whether two null elements are considered equal. If nulls are not equal, then a row that contains a null can never be considered equal to any other row. We exploit this in the build phase of the hash join by constructing a bitmask by ANDing the bitmasks of all the input columns and using that "row bitmask" to filter out any rows that contain a null element from being inserted into the hash table.
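To make the "row bitmask" idea concrete, here is a minimal host-side sketch of the ANDing step. It is not the libcudf code: the names `bitmask_word`, `bitmask`, `row_bitmask_and`, and `row_is_valid` are hypothetical, and it assumes every column carries a validity mask packed into 32-bit words where a set bit means "not null".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one validity bit per row, packed into 32-bit words; 1 == not null.
using bitmask_word = std::uint32_t;
using bitmask      = std::vector<bitmask_word>;

// AND the validity masks of all columns. A row's bit survives only if the row
// is valid in every column, i.e. the row contains no null element.
bitmask row_bitmask_and(const std::vector<bitmask>& column_masks, std::size_t num_rows)
{
  const std::size_t num_words = (num_rows + 31) / 32;
  bitmask result(num_words, ~bitmask_word{0});  // start with every row valid
  for (const auto& mask : column_masks) {
    assert(mask.size() == num_words);  // all masks must cover the same rows
    for (std::size_t w = 0; w < num_words; ++w) { result[w] &= mask[w]; }
  }
  return result;
}

// True if `row` contains no nulls according to the combined mask.
bool row_is_valid(const bitmask& mask, std::size_t row)
{
  return ((mask[row / 32] >> (row % 32)) & 1u) != 0;
}
```

In the build phase, only rows for which `row_is_valid` returns true would be inserted into the hash table.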

When it comes time to probe the hash table, we do not currently take advantage of this same optimization.

Describe the solution you'd like

When probing the hash map and nulls are considered not equal, we should build a bitmask by ANDing all of the bitmasks of the probe table and only probe the map when a row does not contain any nulls.
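A rough host-side illustration of the proposed behavior, under simplifying assumptions: a single nullable key column, `std::optional<int>` standing in for an element plus its validity bit, and `std::unordered_multimap` standing in for the GPU hash table. The function name `inner_join_nulls_unequal` is hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: std::nullopt stands in for a null element.
using key_type = std::optional<int>;

// Inner join on one nullable key column with nulls treated as unequal.
// Returns (probe_row, build_row) index pairs.
std::vector<std::pair<std::size_t, std::size_t>> inner_join_nulls_unequal(
  const std::vector<key_type>& build, const std::vector<key_type>& probe)
{
  // Build phase: rows with a null key are never inserted.
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t i = 0; i < build.size(); ++i) {
    if (build[i].has_value()) { table.emplace(*build[i], i); }
  }

  // Probe phase: a null probe row cannot match anything, so skip the lookup.
  std::vector<std::pair<std::size_t, std::size_t>> matches;
  for (std::size_t j = 0; j < probe.size(); ++j) {
    if (!probe[j].has_value()) { continue; }
    auto range = table.equal_range(*probe[j]);
    for (auto it = range.first; it != range.second; ++it) {
      matches.emplace_back(j, it->second);
    }
  }
  return matches;
}
```

The join result is the same with or without the probe-side check, since null build rows are already absent from the table; the check only avoids hashing and probing for rows that can never match.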

Additional context

The hash join implementation is currently undergoing a complete refactor to use the cuCollections static_multimap. To support the filtered insert, we added an insert_if function. I think we can also add a retrieve_if function to support the filtered probing.
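As a sketch of what a stencil-and-predicate probe interface could look like, here is a hypothetical host-side analogue. It is not the cuCollections API: the name `retrieve_if_sketch`, its parameter order, and its return value are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a stencil-filtered probe.
// For the i-th key in [first, last), the map is probed only if
// pred(*(stencil + i)) is true. Every match is appended to `out` as
// (key index, mapped value). Returns the number of probes performed.
template <typename KeyIt, typename StencilIt, typename Predicate, typename Map>
std::size_t retrieve_if_sketch(
  KeyIt first, KeyIt last, StencilIt stencil, Predicate pred, const Map& map,
  std::vector<std::pair<std::size_t, typename Map::mapped_type>>& out)
{
  std::size_t probes = 0;
  std::size_t i      = 0;
  for (auto it = first; it != last; ++it, ++stencil, ++i) {
    if (!pred(*stencil)) { continue; }  // filtered out: no lookup at all
    ++probes;
    auto range = map.equal_range(*it);
    for (auto m = range.first; m != range.second; ++m) {
      out.emplace_back(i, m->second);
    }
  }
  return probes;
}
```

For the join, the stencil would be the per-row validity derived from the probe table's row bitmask, and the predicate would simply test that the row is valid.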

Labels

0 - Backlog: In queue waiting for assignment
Performance: Performance related issue
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code
