Update contains_table to experimental row hasher and equality comparator#13119
Conversation
benchmarks |
cpp/src/search/contains_table.cu
Outdated
| // | ||
| // TODO: We should unify these code paths in the future when performance regression is no longer | ||
| // happening. | ||
| return contains_impl(haystack, needles, compare_nulls, compare_nans, stream, mr); |
There was a problem hiding this comment.
If that function is called just once, please move its content here and remove it completely.
There was a problem hiding this comment.
Agreed, I don't see a need for a separate impl function here.
|
Thank you @divyegala for suggesting this change. We can just barely see a performance difference in the automated microbenchmarks - about 1.5% slower in the worst case. The difference is close to the noise level and acceptable in my opinion. |
|
@divyegala can the 1.5% worst-case regression that @GregoryKimball found be attributed to any of the code changes? |
@abellina Yes, I would say that if it's not just noise and we consistently see a 1.5% worst-case regression then it's because of the code changes. That's where your help with the NDS query would be a good insight to have. |
|
@divyegala my question was if you could explain why there would be a regression. e.g. would you expect a regression if there are nulls, certain data types, specific cuDF expressions. |
|
@divyegala I do not see regressions with this patch vs the 23.06 nightly. I ran it 5 times each, and then compared both sets of results:
|
|
Based on what I see above, this PR should be ready for review then? No real performance concerns here IIUC. |
|
@vyasr yes, it's ready for review |
vyasr
left a comment
There was a problem hiding this comment.
Approving assuming Nghia's comment about combining functions is addressed.
cpp/src/search/contains_table.cu
Outdated
| // | ||
| // TODO: We should unify these code paths in the future when performance regression is no longer | ||
| // happening. | ||
| return contains_impl(haystack, needles, compare_nulls, compare_nans, stream, mr); |
There was a problem hiding this comment.
Agreed, I don't see a need for a separate impl function here.
ttnghia
left a comment
There was a problem hiding this comment.
Since some code was removed, there are some headers such as struct utilities are no longer needed. Please clean them up too.
karthikeyann
left a comment
There was a problem hiding this comment.
are there any need for additional unit tests for this change?
@karthikeyann this functionality already existed, this PR is just essentially a refactor |
|
/merge |


This is a part of #11844
Benchmarks: #13119 (comment)
Checklist