List element Equality comparator#10289
List element Equality comparator#10289rapids-bot[bot] merged 129 commits intorapidsai:branch-22.06from
Conversation
Many iterations already happened. I just realized late that I should commit
sliced no longer works
New verticalization code that includes list all the way up, skipping structs that were included
vyasr
left a comment
There was a problem hiding this comment.
I'm pretty satisfied with this PR at this stage. My remaining suggestions and questions aren't really blocking, so I'll probably approve this in the next round of review, but some of my questions may spawn work for future PRs (e.g. the [lists|structs]_column_device_view question) if we think they're worthwhile.
cpp/benchmarks/reduction/rank.cpp
Outdated
|
|
||
| data_profile table_data_profile; | ||
| table_data_profile.set_distribution_params(dtype, distribution_id::UNIFORM, 0, 5); | ||
| table_data_profile.set_null_frequency((include_nulls) ? 0.1 : 0.0); |
There was a problem hiding this comment.
It might balloon benchmark times too much, so feel free to decline, but this seems like a benchmark where it might be quite interesting to see how performance changes with different amounts of nulls (i.e. parametrizing this).
There was a problem hiding this comment.
Changes the axis from include_nulls to null_frequency. Here's the results:


There's a big jump from no nulls to 10% nulls because for 0 nulls, a portion of the code is inactive. Adding nulls seems to help flat column because it doesn't have to load and check the actual value.
For list, I think we don't see the early return benefits until after 70% nulls.
There was a problem hiding this comment.
Right, the 0 nulls case is qualitatively different. For the rest I'm guessing that it's some balance between less work and more divergence? The cost of a thread idling for a list is worse than for scalars and should get worse the larger the lists since in theory the amount of idle time is potentially only bounded by the largest list. I don't think there's anything actionable here, but good to see
vyasr
left a comment
There was a problem hiding this comment.
I can't seem to comment on the start indexes discussion anymore, but the commit you linked shows a good example of what you're aiming for so I'm going to go ahead and resolve that discussion. There are a couple of small outstanding tasks (documenting the safe template parameter and making a decision on the curr_col/temp_col/prev_col naming) but otherwise LGTM! Really awesome work here.
| void const* data, | ||
| bitmask_type const* null_mask, | ||
| size_type offset) | ||
| CUDF_HOST_DEVICE column_device_view_base(data_type type, |
There was a problem hiding this comment.
strictly speaking, this macro is really only necessary in a .hpp file that is expected to work in both host and device code TUs.
There was a problem hiding this comment.
Change made as part of this feedback #10289 (comment). I can revert the macro changes made to all .cuh headers.
There was a problem hiding this comment.
strictly speaking, this macro is really only necessary in a .hpp file that is expected to work in both host and device code TUs.
I agree that it's not necessary in cuh. I made this suggestion because not everyone will think about that and people will invariably copy-paste, so uniformly using CUDF_HOST_DEVICE helps reduce the chance of future errors and leaves one less thing for developers to think about. It's a minor suggestion though, if you prefer to use it more surgically that's OK with me.
|
rerun tests |
|
@gpucibot merge |
|
The merge is blocked by needing an ops review because of the one-line change to the conda recipe. I've made that request. |
This PR implements equality comparator for LIST columns. This only supports "self" comparison for now, meaning the two rows to be compared should belong to the same table. A comparator that works on rows of two different tables will be implemented in another PR.
This works only on "sanitized" list columns. See #10291 for details.
This will partially support #10186.