Update `struct_minmax_util` to experimental row comparator by divyegala · Pull Request #13069 · rapidsai/cudf

divyegala · 2023-04-05T18:15:03Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

ttnghia · 2023-04-05T18:29:27Z

I've already worked on it: #10811.

It was stalled because of performance regression. Still need to wait for more performance optimization before continuing.

divyegala · 2023-04-06T15:30:38Z

Still need to wait for more performance optimization before continuing.

@ttnghia

@GregoryKimball and I discussed this, and the performance regressions don't look significant to us. If you prefer, could you bring your PR up to date? We can then run a set of microbenchmarks on it to determine whether the minor cost is incurred only on a single run and amortized over multiple runs.

ttnghia · 2023-04-06T15:39:07Z

I would prefer to do it after all the optimizations mentioned above have come in. That way, any performance regression will be minimized.

vyasr · 2023-04-13T00:15:29Z

It looks like I asked about moving forward on Nghia's PR right around when this one was opened: #10811 (comment). Can we discuss a concrete plan for moving forward here?

What benchmarks do we need to see? The other PR already has some reported.
What performance numbers do we need to hit?
Have we done any investigation at all to understand the causes of the regression?

cpp/src/reductions/struct_minmax_util.cuh

ttnghia · 2023-05-19T21:52:33Z

cpp/src/reductions/struct_minmax_util.cuh

+    : input_tview{cudf::table_view{{input_}}},
+      is_min_op(is_min_op_),
+      has_nulls{cudf::has_nested_nulls(cudf::table_view{{input_}})},
+      null_orders{std::vector<null_order>{DEFAULT_NULL_ORDER}},


I think this is the reason many tests fail in this PR. For now, I don't have a clear way to fix it and need to think about it further.

Only this test is failing in this PR now. Any ideas why?

StructReductionTest.StructReductionMinMaxWithNulls

I'm not sure. Can you print out the output of that test to see what it is?

[ RUN ] StructReductionTest.StructReductionMinMaxWithNulls
/opt/conda/conda-bld/work/cpp/tests/utilities/column_utilities.cu:289: Failure
Expected equality of these values:
  lhs_null_count
    Which is: 1
  rhs_null_count
    Which is: 0
Google Test trace:
/opt/conda/conda-bld/work/cpp/tests/reductions/reduction_tests.cpp:2565: <-- line of failure

That is the log, not the output. Add this to line 2565:

cudf::test::print(struct_result->view());

It looks like this was already a precondition for min, where top-level nulls were pushed to AFTER.

Yes. So the correct output is {null, null}, which contains nulls only at the children level, not at the top level.
In other words, in the failing test, nulls at the children level are being excluded (treated as larger than non-nulls).
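For illustration, the expected semantics can be reproduced with a plain C++ sketch that does not use cudf at all. It assumes what this thread describes: a null top-level struct is excluded from min (equivalent to treating it as larger than every valid struct), while a null child compares smaller than any valid child (nulls BEFORE). `Row`, `compare_child`, `children_less`, and `struct_min` are hypothetical names used only for this example.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for one row of a STRUCT<INT32, INT32> column.
struct Row {
  bool valid;            // top-level (struct) validity
  std::optional<int> a;  // first child; std::nullopt means a null child
  std::optional<int> b;  // second child
};

// Compare two child values with nulls ordered BEFORE valid values.
// Returns -1 if l < r, 0 if equal, 1 if l > r.
int compare_child(std::optional<int> const& l, std::optional<int> const& r)
{
  if (!l && !r) { return 0; }
  if (!l) { return -1; }  // a null child is smaller than any valid child
  if (!r) { return 1; }
  return (*l < *r) ? -1 : (*l > *r ? 1 : 0);
}

// Lexicographic "less than" over the children of two valid structs.
bool children_less(Row const& l, Row const& r)
{
  int const c = compare_child(l.a, r.a);
  if (c != 0) { return c < 0; }
  return compare_child(l.b, r.b) < 0;
}

// Struct min: null structs are skipped; returns std::nullopt if every row is null.
std::optional<Row> struct_min(std::vector<Row> const& rows)
{
  std::optional<Row> best;
  for (auto const& row : rows) {
    if (!row.valid) { continue; }  // top-level nulls never win the min
    if (!best || children_less(row, *best)) { best = row; }
  }
  return best;
}
```

For example, `struct_min` over the rows `{1, 2}`, a null struct, and `{null, null}` returns the valid struct whose two children are null, matching the expected `{null, null}` output rather than a top-level null.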

Ah, I see. I don't understand why this is happening, though. Is it a fault in the code I wrote or in the lexicographic comparator?

The previous approach flattens the input structs column into a table. The null order vector there has one entry per column of that table:

if (is_min_op) { null_orders = flattened_input->null_orders();

So we could simply change the first null order element, making the top-level null order different from the null order used for all children levels.
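A minimal sketch of that adjustment, assuming a hypothetical flattened layout where entry 0 of the null-order vector belongs to the top-level structs column and the remaining entries belong to its flattened children, and assuming the default order places nulls BEFORE. The local `null_order` enum only mirrors the role of `cudf::null_order`, and `make_null_orders` is a hypothetical helper, not a cudf function.

```cpp
#include <cstddef>
#include <vector>

// Local stand-in for the role played by cudf::null_order.
enum class null_order { BEFORE, AFTER };

// Build one null order per flattened column. Every column starts with the assumed
// default (nulls BEFORE, i.e. nulls compare smaller). For min, when the top-level
// column has nulls, only the first entry is flipped to AFTER so that null structs
// compare larger and are never selected; the children keep BEFORE.
std::vector<null_order> make_null_orders(std::size_t num_flattened_columns,
                                         bool is_min_op,
                                         bool top_level_has_nulls)
{
  std::vector<null_order> orders(num_flattened_columns, null_order::BEFORE);
  if (is_min_op && top_level_has_nulls && !orders.empty()) {
    orders.front() = null_order::AFTER;
  }
  return orders;
}
```

For a min over a column with top-level nulls and three flattened columns, this yields `{AFTER, BEFORE, BEFORE}`; for max, every entry stays `BEFORE`.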

In the new approach, there is always just one input column, so the null order array always has exactly one element. The row comparator then uses that single null order for the top level and for all children levels.

I can think of only one solution to overcome this: continue flattening the input and modify the null order vector, then pass everything into the row comparator.

This is only needed for min. For max, all null order values are the same, so we don't need to modify the null order array.
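The difference between per-level null orders and a single shared one can be checked with a small C++ sketch over a hypothetical flattened row: element 0 stands in for the top-level struct (a placeholder value when valid, `std::nullopt` when the struct is null) and the remaining elements are its children. `less_than`, `argmin`, and `argmax` are illustrative helpers, not the cudf row comparator, and the local `null_order` enum only mirrors the idea of `cudf::null_order`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

enum class null_order { BEFORE, AFTER };  // local stand-in for cudf::null_order

// Hypothetical flattened row: [0] = top-level struct, [1..] = children; nullopt = null.
using flat_row = std::vector<std::optional<int>>;

// Lexicographic lhs < rhs, using one null order per flattened column.
bool less_than(flat_row const& lhs, flat_row const& rhs, std::vector<null_order> const& orders)
{
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    bool const lnull = !lhs[i].has_value();
    bool const rnull = !rhs[i].has_value();
    if (lnull && rnull) { continue; }  // both null: equal at this level
    if (lnull != rnull) {
      // BEFORE: a null compares smaller than any valid value; AFTER: larger.
      bool const null_is_smaller = (orders[i] == null_order::BEFORE);
      return lnull ? null_is_smaller : !null_is_smaller;
    }
    if (*lhs[i] != *rhs[i]) { return *lhs[i] < *rhs[i]; }
  }
  return false;  // rows compare equal
}

// Index of the minimum row (first one wins on ties). Assumes rows is non-empty.
std::size_t argmin(std::vector<flat_row> const& rows, std::vector<null_order> const& orders)
{
  std::size_t best = 0;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (less_than(rows[i], rows[best], orders)) { best = i; }
  }
  return best;
}

// Index of the maximum row (first one wins on ties). Assumes rows is non-empty.
std::size_t argmax(std::vector<flat_row> const& rows, std::vector<null_order> const& orders)
{
  std::size_t best = 0;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (less_than(rows[best], rows[i], orders)) { best = i; }
  }
  return best;
}
```

With rows `{0, 5}` (valid struct, child 5), `{null, null}` (null struct), and `{0, null}` (valid struct, null child), `argmin` with orders `{AFTER, BEFORE}` returns index 2, the struct with a null child, while a single `AFTER` for every level returns index 0. `argmax` with all `BEFORE` returns index 0, so top-level nulls are excluded for max without any adjustment.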

ttnghia · 2023-05-19T21:56:56Z

@vyasr The potential optimization I mentioned earlier didn't work out, so I'm no longer blocking this.

What benchmarks do we need to see? The other PR already has some reported.

Since this touches struct min/max, we should run the struct min/max benchmarks for groupby scan and reduction. I ran such benchmarks in #10811, so this PR just needs to rerun them.

What performance numbers do we need to hit?

There should be no significant performance regression in the benchmarks above.

Have we done any investigation at all to understand the causes of the regression?

From my earlier investigation, the previous regression was probably caused by register usage: #10811 (comment).

vyasr · 2023-05-23T22:19:53Z

@ttnghia can we close #10811 and use this PR instead, or would you prefer the reverse? Is there any difference, or code that needs to be ported from one to the other?

@divyegala let's retarget this PR to 23.08; I don't think there's anything urgent to get in for this release here.

ttnghia · 2023-05-24T00:26:22Z

@vyasr Sure, this PR should close #10811.

vyasr · 2023-05-31T19:13:45Z

@divyegala I've retargeted this PR and merged in the latest 23.08. There are a few comments to address; after that, I think this PR should be close to ready.


divyegala · 2023-06-27T16:36:02Z

@vyasr @ttnghia this PR is up to date now. There are failing tests for scans/reductions when NULLs are present, although I'm not really sure how to diagnose why they are occurring. Any pointers?


cpp/src/groupby/sort/group_scan_util.cuh

cpp/src/reductions/scan/scan_inclusive.cu

cpp/src/reductions/struct_minmax_util.cuh

vyasr

Perhaps I'm missing something, but I see that you removed the old tests verifying that this code path failed loudly. Shouldn't we add new tests verifying the now-supported behavior?

vyasr

Approving after some offline discussions. The main new feature added by this PR is support for nesting lists with structs (whereas previously only pure struct nesting was supported), but in other such PRs we have not added tests for the list case either. We're relying on existing tests of the comparator to validate those specific cases.

divyegala · 2023-06-29T23:14:50Z

/merge

update to experimental

8c4d32e

divyegala requested a review from a team as a code owner April 5, 2023 18:15

divyegala requested review from robertmaynard and ttnghia April 5, 2023 18:15

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 5, 2023

divyegala added feature request New feature or request non-breaking Non-breaking change labels Apr 5, 2023

divyegala added 2 commits April 5, 2023 14:15

Merge branch 'branch-23.06' into struct-minmax-row-operators

dd4f29b

fix all failing tests except one

c2bc287

GregoryKimball mentioned this pull request Apr 5, 2023

[FEA] Implement full support for nested types #11844

Closed

bdice changed the title ~~Update struct_minmax_uutil to experimental row comparator~~ Update struct_minmax_util to experimental row comparator May 12, 2023

ttnghia reviewed May 19, 2023

cpp/src/reductions/struct_minmax_util.cuh (resolved)

ttnghia reviewed May 19, 2023

cpp/src/reductions/struct_minmax_util.cuh (outdated, resolved)

ttnghia reviewed May 19, 2023

cpp/src/reductions/struct_minmax_util.cuh (outdated, resolved)

ttnghia reviewed May 19, 2023


vyasr mentioned this pull request May 31, 2023

Adopt experimental row comparator for struct min and max operators #10811

Closed

vyasr changed the base branch from branch-23.06 to branch-23.08 May 31, 2023 19:12

Merge branch 'branch-23.08' into struct-minmax-row-operators

15a15be

ttnghia mentioned this pull request Jun 12, 2023

[FEA] Fully support nested types in Spark SQL functions NVIDIA/spark-rapids#8550

Open

divyegala added 3 commits June 27, 2023 06:59

Merge branch 'struct-minmax-row-operators' of github.com:divyegala/cudf into struct-minmax-row-operators

b053bce

Merge remote-tracking branch 'upstream/branch-23.08' into struct-minmax-row-operators

9c11bb0

address review comments

5fc3090

divyegala added 3 commits June 29, 2023 07:53

flatten input if is_min_op

5392ff0

Merge remote-tracking branch 'upstream/branch-23.08' into struct-minmax-row-operators

cfda793

remove structs tests for unsupported list flattening

dc3c1a8

ttnghia reviewed Jun 29, 2023

cpp/src/groupby/sort/group_scan_util.cuh (resolved)

ttnghia reviewed Jun 29, 2023

cpp/src/reductions/scan/scan_inclusive.cu (resolved)

ttnghia reviewed Jun 29, 2023

cpp/src/reductions/struct_minmax_util.cuh (outdated, resolved)

divyegala added 2 commits June 29, 2023 11:40

address review

23e7e35

review comment

3eca02b

ttnghia approved these changes Jun 29, 2023


ttnghia mentioned this pull request Jun 29, 2023

Support nested structs for min/max aggregations in groupby and reduction NVIDIA/spark-rapids#8638

Merged

vyasr requested changes Jun 29, 2023


vyasr approved these changes Jun 29, 2023


rapids-bot bot merged commit e7a1448 into rapidsai:branch-23.08 Jun 29, 2023

GregoryKimball assigned divyegala Jul 17, 2023
