List element Equality comparator by devavret · Pull Request #10289 · rapidsai/cudf

devavret · 2022-02-14T23:02:28Z

This PR implements an equality comparator for LIST columns. For now it only supports "self" comparison, meaning the two rows being compared must belong to the same table. A comparator that works on rows from two different tables will be implemented in a separate PR.
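The sketch below shows what "self" row equality means for a flat LIST<INT32> column, written on the host for clarity. It is illustrative only: `host_list_column` and `rows_equal` are hypothetical names, not libcudf API, and the policy that two nulls compare equal is an assumption. Row `i` spans child elements `[offsets[i], offsets[i+1])`, and both row indices refer to the same column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a non-nested LIST<INT32> column.
// Row i spans child elements [offsets[i], offsets[i + 1]).
struct host_list_column {
  std::vector<std::int32_t> offsets;  // size = num_rows + 1
  std::vector<std::int32_t> child;    // flattened list elements
  std::vector<bool> valid;            // row validity; true = not null
  std::vector<bool> child_valid;      // element validity
};

// "Self" comparison: lhs and rhs are row indices into the same column.
// Assumed policy: null == null for both rows and elements.
bool rows_equal(host_list_column const& col, std::size_t lhs, std::size_t rhs)
{
  bool const lhs_valid = col.valid[lhs];
  bool const rhs_valid = col.valid[rhs];
  if (!lhs_valid || !rhs_valid) { return lhs_valid == rhs_valid; }

  auto const lhs_begin = col.offsets[lhs];
  auto const rhs_begin = col.offsets[rhs];
  auto const size      = col.offsets[lhs + 1] - lhs_begin;
  if (size != col.offsets[rhs + 1] - rhs_begin) { return false; }  // different lengths

  for (std::int32_t k = 0; k < size; ++k) {
    bool const lv = col.child_valid[lhs_begin + k];
    bool const rv = col.child_valid[rhs_begin + k];
    if (lv != rv) { return false; }
    // Early return on the first unequal leaf element.
    if (lv && col.child[lhs_begin + k] != col.child[rhs_begin + k]) { return false; }
  }
  return true;
}
```

A null row and an empty, valid row both have zero extent in this model; only the row validity distinguishes them, which is why the null check comes before the length check.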

This works only on "sanitized" list columns. See #10291 for details.
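The description defers the meaning of "sanitized" to #10291. Assuming it refers to the property that every null list row spans zero child elements (an assumption here; the issue is the authoritative source), a host-side check over the offsets and row validity could look like this. `is_sanitized` is a hypothetical helper, not a libcudf function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check (assumed definition of "sanitized"): every null row
// covers an empty range of the child column, i.e. offsets[i] == offsets[i + 1].
bool is_sanitized(std::vector<std::int32_t> const& offsets, std::vector<bool> const& valid)
{
  for (std::size_t i = 0; i < valid.size(); ++i) {
    if (!valid[i] && offsets[i + 1] != offsets[i]) { return false; }
  }
  return true;
}
```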

This will partially support #10186.


vyasr

I'm pretty satisfied with this PR at this stage. My remaining suggestions and questions aren't really blocking, so I'll probably approve this in the next round of review, but some of my questions may spawn work for future PRs (e.g. the [lists|structs]_column_device_view question) if we think they're worthwhile.

cpp/include/cudf/detail/utilities/column.hpp

cpp/include/cudf/detail/iterator.cuh

cpp/include/cudf/lists/lists_column_device_view.cuh

cpp/src/table/row_operators.cu

cpp/include/cudf/structs/structs_column_device_view.cuh

vyasr · 2022-03-31T00:56:31Z

cpp/benchmarks/reduction/rank.cpp

+
+  data_profile table_data_profile;
+  table_data_profile.set_distribution_params(dtype, distribution_id::UNIFORM, 0, 5);
+  table_data_profile.set_null_frequency((include_nulls) ? 0.1 : 0.0);


It might balloon benchmark times too much, so feel free to decline, but this seems like a benchmark where it might be quite interesting to see how performance changes with different amounts of nulls (i.e. parametrizing this).
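Parametrizing over the amount of nulls means generating input validity at several densities instead of toggling a boolean. A minimal sketch using only the C++ standard library follows; `make_validity` and `observed_null_fraction` are hypothetical helpers and are independent of cudf's `data_profile` benchmark utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: each row is null with probability null_frequency.
std::vector<bool> make_validity(std::size_t num_rows, double null_frequency, unsigned seed)
{
  std::mt19937 gen(seed);
  std::bernoulli_distribution is_null(null_frequency);
  std::vector<bool> valid(num_rows);
  for (std::size_t i = 0; i < num_rows; ++i) { valid[i] = !is_null(gen); }
  return valid;
}

// Fraction of null rows actually produced, to sanity-check an axis value.
double observed_null_fraction(std::vector<bool> const& valid)
{
  std::size_t nulls = 0;
  for (bool v : valid) { nulls += v ? 0 : 1; }
  return valid.empty() ? 0.0 : static_cast<double>(nulls) / valid.size();
}
```

Sweeping the frequency over values such as 0.0, 0.1, ..., 0.9 then replaces a single on/off switch for nulls.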

Changed the axis from include_nulls to null_frequency. Here are the results:

There's a big jump from no nulls to 10% nulls because with 0 nulls, a portion of the code is inactive. Adding nulls seems to help the flat column because it doesn't have to load and check the actual value.
For lists, I think we don't see the early-return benefits until after 70% nulls.
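The flat-column effect comes from checking validity before touching the data. The simplified element comparator below has a compile-time `HasNulls` switch: with nulls possible, a null on either side decides the result without loading the values; with no nulls, the validity check is compiled out. `element_equal`, `HasNulls`, and the `loads` counter are hypothetical; libcudf uses its own `nullate` mechanism for this, whose details may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HasNulls = false is only correct when the column is known to have no nulls.
template <bool HasNulls>
bool element_equal(std::vector<std::int32_t> const& data,
                   std::vector<bool> const& valid,
                   std::size_t lhs,
                   std::size_t rhs,
                   std::size_t& loads)  // counts value loads, to make the saving visible
{
  if constexpr (HasNulls) {
    bool const lv = valid[lhs];
    bool const rv = valid[rhs];
    // Early return: at least one side is null, so the values are never read.
    if (!lv || !rv) { return lv == rv; }
  }
  loads += 2;
  return data[lhs] == data[rhs];
}
```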

Right, the 0-nulls case is qualitatively different. For the rest, I'm guessing it's some balance between less work and more divergence? The cost of a thread idling for a list is worse than for scalars and should get worse the larger the lists are, since in theory the amount of idle time is potentially only bounded by the largest list. I don't think there's anything actionable here, but it's good to see.
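The divergence argument can be made concrete with a deliberately simplified cost model. It assumes one row comparison per lane, lanes of a warp advancing in lockstep, and cost proportional to the number of list elements compared. A warp is then busy until its longest comparison finishes, so idle lane-steps are `lanes * max_len - sum(len)`. The model (and the names `warp_cost` and `model_warp`) is hypothetical and ignores memory latency and scheduling; it only illustrates why idle time grows with the largest list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct warp_cost {
  std::int64_t busy_steps;       // steps until the slowest lane finishes
  std::int64_t idle_lane_steps;  // lane-steps spent waiting on other lanes
};

// lens[i] = number of elements lane i must compare (0 for a row that returns early).
warp_cost model_warp(std::vector<std::int64_t> const& lens)
{
  std::int64_t const longest = lens.empty() ? 0 : *std::max_element(lens.begin(), lens.end());
  std::int64_t const useful  = std::accumulate(lens.begin(), lens.end(), std::int64_t{0});
  return {longest, longest * static_cast<std::int64_t>(lens.size()) - useful};
}
```

With 32 lanes each comparing 4 elements there is no idle time; if 31 lanes return early on nulls and one lane compares a 100-element list, the warp still takes 100 steps and 3100 lane-steps sit idle.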

cpp/include/cudf/table/experimental/row_operators.cuh

vyasr

I can't seem to comment on the start indexes discussion anymore, but the commit you linked shows a good example of what you're aiming for, so I'm going to go ahead and resolve that discussion. There are a couple of small outstanding tasks (documenting the safe template parameter and deciding on the curr_col/temp_col/prev_col naming), but otherwise LGTM! Really awesome work here.
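The review mentions documenting a `safe` template parameter without spelling it out. One plausible shape, assumed here rather than taken from the PR, is a validity accessor where `Safe = true` tolerates a column without a null mask (every row reads as valid) and `Safe = false` insists on a mask up front so the per-row read can skip the nullability test. `host_column` and `validity_accessor` below are a hypothetical host-side model of that trade-off.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host model of a column whose null mask may be absent.
struct host_column {
  std::vector<bool> mask;  // empty = column has no null mask (not nullable)
  bool nullable() const { return !mask.empty(); }
};

// Safe = true: tolerate a missing mask and report every row as valid.
// Safe = false: require a mask at construction, then read bits without re-checking.
template <bool Safe>
struct validity_accessor {
  host_column const& col;
  explicit validity_accessor(host_column const& c) : col{c}
  {
    if constexpr (!Safe) {
      if (!c.nullable()) { throw std::logic_error("unexpected non-nullable column"); }
    }
  }
  bool operator()(std::size_t i) const
  {
    if constexpr (Safe) {
      return !col.nullable() || col.mask[i];
    } else {
      return col.mask[i];
    }
  }
};
```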

cpp/include/cudf/detail/iterator.cuh

jrhemstad · 2022-04-01T14:14:28Z

cpp/include/cudf/column/column_device_view.cuh

-                          void const* data,
-                          bitmask_type const* null_mask,
-                          size_type offset)
+  CUDF_HOST_DEVICE column_device_view_base(data_type type,


Strictly speaking, this macro is only really necessary in a .hpp file that is expected to work in both host and device code TUs.

This change was made in response to this feedback: #10289 (comment). I can revert the macro changes to all the .cuh headers.

I agree that it's not necessary in .cuh files. I made this suggestion because not everyone will think about that and people will invariably copy-paste, so using CUDF_HOST_DEVICE uniformly reduces the chance of future errors and leaves one less thing for developers to think about. It's a minor suggestion though; if you prefer to use it more surgically, that's OK with me.
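A macro like CUDF_HOST_DEVICE lets one header be included from both nvcc-compiled and host-only translation units. The sketch shows the usual pattern, assuming it keys off `__CUDACC__`; the name `EXAMPLE_HOST_DEVICE` and the function `clamp_index` are hypothetical, and cudf's own definition may differ in detail. Under a host compiler such as g++ the annotation expands to nothing; under nvcc it expands to `__host__ __device__`.

```cpp
#include <cassert>

// Hypothetical macro in the style of CUDF_HOST_DEVICE.
#ifdef __CUDACC__
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Callable from host code everywhere, and from device code when compiled by nvcc.
EXAMPLE_HOST_DEVICE constexpr int clamp_index(int i, int size)
{
  return i < 0 ? 0 : (i >= size ? size - 1 : i);
}
```

This is why the annotation is strictly needed only in .hpp headers: a .cuh header is normally compiled only by nvcc, where `__host__ __device__` is always understood.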

cpp/include/cudf/column/column_device_view.cuh

devavret · 2022-04-12T19:33:28Z

rerun tests

vyasr · 2022-04-12T23:24:31Z

@gpucibot merge

vyasr · 2022-04-12T23:26:46Z

The merge is blocked by needing an ops review because of the one-line change to the conda recipe. I've made that request.

devavret added 21 commits August 26, 2021 19:28

First commit

933c974

Many iterations already happened. I just realized late that I should commit

testing and profiling deep single hierarchy struct

a1636e5

Merge branch 'branch-22.02' into struct-row-comp

d59f54c

Merge branch 'branch-22.02' into struct-row-comp

765dd8d

Make the sandboxed test compile again

3d21daf

Update my row_comparator with nullate

9f32e6b

Merge branch 'branch-22.02' into struct-row-comp

53d3c90

Basic verticalization utility and experimental namespace

022e2a4

clean up most of row operators that I didn't change.

7fef643

Sliced column test

930d8de

column order and null precedence support

0ecc4f8

Manually managed stack

ff36d2d

New depth based method to avoid superimpose nulls

cd0f938

sliced no longer works

Put sort2 impl in separate TU

7b8e060

Merge branch 'branch-22.04' into struct-row-comp

25eb237

Basic working list == comp

c8e527e

Merge branch 'branch-22.04' into list-row-eq

eb87ed7

deeper list test

cc1584d

benchmark list ==

925481a

small cleanups

b2b41c7

Merge branch 'branch-22.04' into struct-row-comp

d2937cf

devavret added the DO NOT MERGE Hold off on merging; see PR for details label Feb 14, 2022

github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Feb 14, 2022

devavret mentioned this pull request Feb 15, 2022

[FEA] Nulls pushdowns for LIST columns #10291

Closed

devavret added 5 commits February 16, 2022 03:13

Move verticalization code to row_comparator.cpp

d55c9c7

Merge branch 'struct-row-comp' into list-row-eq

b7cdfe0

Use regular type dispatcher with new id type map

8309151

Early return from unequal leaf elements

8717b9c

Combined struct and list equality operator

21df6cf

New verticalization code that includes list all the way up, skipping structs that were included

spell check

0f768ac

vyasr requested changes Mar 31, 2022


devavret added 8 commits March 31, 2022 14:36

Change composition to private inheritance

92c1ff5

Replace __host__ __device__ with macro

4c0e7fa

Add more null frequencies to benchmark

75104bb

Templatize make_validity_iterator

1e1053b

Increase testing for null frequency

bcfe91b

curr_col -> temp_col

981438d

element_range_comparator -> column_comparator

5bbf18e

cleaner column_view conversion

8e18d66

vyasr approved these changes Mar 31, 2022


cpp/include/cudf/detail/iterator.cuh

devavret added 2 commits April 1, 2022 19:02

delete copy ctor and assignment operator

75eaed4

iterator docs

be98357

jrhemstad reviewed Apr 1, 2022


hyperbolic2346 approved these changes Apr 1, 2022


devavret added 4 commits April 8, 2022 18:59

Handle empty struct in list equality

f4c509a

Handle empty list (without offsets)

d1386cf

Merge branch 'branch-22.06' into list-row-eq

6aef29f

Merge branch 'branch-22.06' into list-row-eq

3cc1159

jrhemstad reviewed Apr 11, 2022


cpp/include/cudf/column/column_device_view.cuh (outdated)

Column_device_view review changes

8078e3c

jrhemstad approved these changes Apr 12, 2022


vyasr requested a review from a team April 12, 2022 23:26

vyasr mentioned this pull request Apr 12, 2022

Add row hasher with nested column support #10641

Merged

jjacobelli approved these changes Apr 13, 2022


rapids-bot bot merged commit 0ea6f8e into rapidsai:branch-22.06 Apr 13, 2022

GregoryKimball mentioned this pull request Oct 3, 2022

[FEA] Implement full support for nested types #11844

Closed


7 participants
