Skip to content

Refactor mixed_semi_join using cuco::static_set#16230

Merged
rapids-bot[bot] merged 36 commits intorapidsai:branch-24.10from
srinivasyadav18:mixed_semi_join_refactor
Sep 18, 2024
Merged

Refactor mixed_semi_join using cuco::static_set#16230
rapids-bot[bot] merged 36 commits intorapidsai:branch-24.10from
srinivasyadav18:mixed_semi_join_refactor

Conversation

@srinivasyadav18
Copy link
Copy Markdown
Contributor

@srinivasyadav18 srinivasyadav18 commented Jul 9, 2024

Description

This PR refactors mixed_semi_join by replacing cuco legacy static_map with latest static_set.
Contributes to #12261.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 9, 2024
@srinivasyadav18 srinivasyadav18 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Jul 9, 2024
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 11, 2024
@srinivasyadav18 srinivasyadav18 marked this pull request as ready for review July 18, 2024 02:05
@srinivasyadav18 srinivasyadav18 requested a review from a team as a code owner July 18, 2024 02:05
experimental::row::equality::preprocessed_table::create(probe, stream);
auto const row_comparator =
cudf::experimental::row::equality::two_table_comparator{preprocessed_probe, preprocessed_build};
cudf::experimental::row::equality::two_table_comparator{preprocessed_build, preprocessed_probe};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for this order change build and probe?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig into why but without this change the gtests start failing. Maybe @srinivasyadav18 can provide us a quick insight.

Copy link
Copy Markdown
Member

@mhaseeb123 mhaseeb123 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: We changed this order and added swap_tables=true at mixed_join_kernels_semi.cu:+72 (tagged you there) to fix the left and right directionality in the equality in single_expression_equality::operator(). Specifically, the single_expression_equality expects indices in {build, probe} order and we have also reset the hasher and equality functors for the cuco::static_set which swaps the internal left and right directions for {build, probe} hence us using swap_table=true.

I know it's quite confusing. 😅

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly due to a breaking change in NVIDIA/cuCollections#479 where we always put query keys on the left-hand side for comparison.

@ttnghia
Copy link
Copy Markdown
Contributor

ttnghia commented Aug 15, 2024

This PR refactors mixed_semi_join by replacing cuco legacy static_map with latest static_set.

Please provide benchmark showing what benefit this PR will bring to us.

@srinivasyadav18 srinivasyadav18 requested a review from a team as a code owner August 16, 2024 19:25
@mhaseeb123
Copy link
Copy Markdown
Member

/ok to test

@mhaseeb123 mhaseeb123 requested review from a team and mhaseeb123 and removed request for JayjeetAtGithub, brandon-b-miller, galipremsagar, jameslamb and mhaseeb123 September 10, 2024 23:39
Copy link
Copy Markdown
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor improvement otherwise looks great to me.

@mhaseeb123 Thank you for stepping in to help resolve the issue and finalize the PR! 🔥 🔥 🔥

experimental::row::equality::preprocessed_table::create(probe, stream);
auto const row_comparator =
cudf::experimental::row::equality::two_table_comparator{preprocessed_probe, preprocessed_build};
cudf::experimental::row::equality::two_table_comparator{preprocessed_build, preprocessed_probe};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly due to a breaking change in NVIDIA/cuCollections#479 where we always put query keys on the left-hand side for comparison.

@mhaseeb123 mhaseeb123 requested review from a team and JayjeetAtGithub September 16, 2024 17:53
@rapidsai rapidsai deleted a comment from srinivasyadav18 Sep 16, 2024
@mhaseeb123 mhaseeb123 added DO NOT MERGE Hold off on merging; see PR for details 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs Review Waiting for reviewer to review or respond DO NOT MERGE Hold off on merging; see PR for details labels Sep 17, 2024
@mhaseeb123
Copy link
Copy Markdown
Member

mhaseeb123 commented Sep 17, 2024

Performance results from Spark (spark-h). Thanks to @zpuller for running this. Queries of interest: q16 and q94 that use mixed_semi_join. @ttnghia @GregoryKimball

Regression alerts
-----------------

Speedup results
-----------------
query1: Previous (1881.6 ms) vs Current (1774.4 ms) Diff 107 E2E 1.06x
query2: Previous (1651.8 ms) vs Current (1693.4 ms) Diff -41 E2E 0.98x
query3: Previous (441.0 ms) vs Current (440.2 ms) Diff 0 E2E 1.00x
query4: Previous (8464.8 ms) vs Current (8578.4 ms) Diff -113 E2E 0.99x
query5: Previous (1917.6 ms) vs Current (2090.8 ms) Diff -173 E2E 0.92x
query6: Previous (1026.4 ms) vs Current (946.8 ms) Diff 79 E2E 1.08x
query7: Previous (2728.8 ms) vs Current (2871.4 ms) Diff -142 E2E 0.95x
query8: Previous (1070.8 ms) vs Current (1062.4 ms) Diff 8 E2E 1.01x
query9: Previous (3184.8 ms) vs Current (3115.6 ms) Diff 69 E2E 1.02x
query10: Previous (1529.4 ms) vs Current (1480.2 ms) Diff 49 E2E 1.03x
query11: Previous (4687.8 ms) vs Current (4840.4 ms) Diff -152 E2E 0.97x
query12: Previous (552.2 ms) vs Current (565.8 ms) Diff -13 E2E 0.98x
query13: Previous (1100.6 ms) vs Current (1098.0 ms) Diff 2 E2E 1.00x
query14_part1: Previous (5230.6 ms) vs Current (5320.0 ms) Diff -89 E2E 0.98x
query14_part2: Previous (4355.2 ms) vs Current (4310.8 ms) Diff 44 E2E 1.01x
query15: Previous (1063.8 ms) vs Current (1098.4 ms) Diff -34 E2E 0.97x
query16: Previous (3262.6 ms) vs Current (3335.4 ms) Diff -72 E2E 0.98x
query17: Previous (1844.0 ms) vs Current (1744.4 ms) Diff 99 E2E 1.06x
query18: Previous (3182.2 ms) vs Current (3178.6 ms) Diff 3 E2E 1.00x
query19: Previous (1488.0 ms) vs Current (1500.8 ms) Diff -12 E2E 0.99x
query20: Previous (591.6 ms) vs Current (629.8 ms) Diff -38 E2E 0.94x
query21: Previous (548.4 ms) vs Current (635.4 ms) Diff -87 E2E 0.86x
query22: Previous (962.0 ms) vs Current (950.6 ms) Diff 11 E2E 1.01x
query23_part1: Previous (10493.8 ms) vs Current (10460.0 ms) Diff 33 E2E 1.00x
query23_part2: Previous (14016.4 ms) vs Current (14153.8 ms) Diff -137 E2E 0.99x
query24_part1: Previous (6507.8 ms) vs Current (6394.2 ms) Diff 113 E2E 1.02x
query24_part2: Previous (6494.6 ms) vs Current (6488.6 ms) Diff 6 E2E 1.00x
query25: Previous (1638.2 ms) vs Current (1599.6 ms) Diff 38 E2E 1.02x
query26: Previous (849.2 ms) vs Current (851.0 ms) Diff -1 E2E 1.00x
query27: Previous (1346.0 ms) vs Current (1222.6 ms) Diff 123 E2E 1.10x
query28: Previous (6245.0 ms) vs Current (5135.8 ms) Diff 1109 E2E 1.22x
query29: Previous (2420.2 ms) vs Current (2526.8 ms) Diff -106 E2E 0.96x
query30: Previous (1756.0 ms) vs Current (1744.6 ms) Diff 11 E2E 1.01x
query31: Previous (2492.2 ms) vs Current (2522.8 ms) Diff -30 E2E 0.99x
query32: Previous (833.0 ms) vs Current (874.2 ms) Diff -41 E2E 0.95x
query33: Previous (1374.6 ms) vs Current (1313.0 ms) Diff 61 E2E 1.05x
query34: Previous (1649.6 ms) vs Current (1662.8 ms) Diff -13 E2E 0.99x
query35: Previous (1948.6 ms) vs Current (1984.4 ms) Diff -35 E2E 0.98x
query36: Previous (965.0 ms) vs Current (1033.8 ms) Diff -68 E2E 0.93x
query37: Previous (1196.2 ms) vs Current (1344.4 ms) Diff -148 E2E 0.89x
query38: Previous (3036.8 ms) vs Current (3376.4 ms) Diff -339 E2E 0.90x
query39_part1: Previous (2122.8 ms) vs Current (2113.8 ms) Diff 9 E2E 1.00x
query39_part2: Previous (1412.2 ms) vs Current (1311.0 ms) Diff 101 E2E 1.08x
query40: Previous (1165.6 ms) vs Current (1141.0 ms) Diff 24 E2E 1.02x
query41: Previous (326.4 ms) vs Current (306.0 ms) Diff 20 E2E 1.07x
query42: Previous (318.2 ms) vs Current (312.2 ms) Diff 6 E2E 1.02x
query43: Previous (726.8 ms) vs Current (739.4 ms) Diff -12 E2E 0.98x
query44: Previous (624.8 ms) vs Current (637.8 ms) Diff -13 E2E 0.98x
query45: Previous (1137.0 ms) vs Current (1100.4 ms) Diff 36 E2E 1.03x
query46: Previous (1616.2 ms) vs Current (1655.2 ms) Diff -39 E2E 0.98x
query47: Previous (1465.8 ms) vs Current (1507.8 ms) Diff -42 E2E 0.97x
query48: Previous (839.0 ms) vs Current (825.4 ms) Diff 13 E2E 1.02x
query49: Previous (2236.4 ms) vs Current (2186.2 ms) Diff 50 E2E 1.02x
query50: Previous (6857.4 ms) vs Current (7052.4 ms) Diff -195 E2E 0.97x
query51: Previous (2435.4 ms) vs Current (2233.2 ms) Diff 202 E2E 1.09x
query52: Previous (393.0 ms) vs Current (421.4 ms) Diff -28 E2E 0.93x
query53: Previous (732.2 ms) vs Current (562.8 ms) Diff 169 E2E 1.30x
query54: Previous (1319.2 ms) vs Current (1375.2 ms) Diff -56 E2E 0.96x
query55: Previous (363.8 ms) vs Current (368.8 ms) Diff -5 E2E 0.99x
query56: Previous (969.4 ms) vs Current (927.2 ms) Diff 42 E2E 1.05x
query57: Previous (1292.4 ms) vs Current (1290.6 ms) Diff 1 E2E 1.00x
query58: Previous (1091.0 ms) vs Current (1014.8 ms) Diff 76 E2E 1.08x
query59: Previous (1601.6 ms) vs Current (1399.4 ms) Diff 202 E2E 1.14x
query60: Previous (1569.8 ms) vs Current (1518.2 ms) Diff 51 E2E 1.03x
query61: Previous (1357.6 ms) vs Current (1431.0 ms) Diff -73 E2E 0.95x
query62: Previous (901.8 ms) vs Current (921.0 ms) Diff -19 E2E 0.98x
query63: Previous (903.8 ms) vs Current (777.2 ms) Diff 126 E2E 1.16x
query64: Previous (13170.4 ms) vs Current (13072.8 ms) Diff 97 E2E 1.01x
query65: Previous (3076.6 ms) vs Current (3032.6 ms) Diff 44 E2E 1.01x
query66: Previous (2326.6 ms) vs Current (2304.2 ms) Diff 22 E2E 1.01x
query67: Previous (18200.0 ms) vs Current (18346.8 ms) Diff -146 E2E 0.99x
query68: Previous (1091.6 ms) vs Current (1146.6 ms) Diff -55 E2E 0.95x
query69: Previous (1468.6 ms) vs Current (1471.8 ms) Diff -3 E2E 1.00x
query70: Previous (1663.4 ms) vs Current (1521.0 ms) Diff 142 E2E 1.09x
query71: Previous (2684.6 ms) vs Current (2529.8 ms) Diff 154 E2E 1.06x
query72: Previous (2697.6 ms) vs Current (2676.8 ms) Diff 20 E2E 1.01x
query73: Previous (809.2 ms) vs Current (879.2 ms) Diff -70 E2E 0.92x
query74: Previous (3871.2 ms) vs Current (3882.0 ms) Diff -10 E2E 1.00x
query75: Previous (6280.0 ms) vs Current (6336.0 ms) Diff -56 E2E 0.99x
query76: Previous (2696.4 ms) vs Current (1886.0 ms) Diff 810 E2E 1.43x
query77: Previous (780.2 ms) vs Current (773.0 ms) Diff 7 E2E 1.01x
query78: Previous (7196.6 ms) vs Current (7067.0 ms) Diff 129 E2E 1.02x
query79: Previous (1078.2 ms) vs Current (1164.4 ms) Diff -86 E2E 0.93x
query80: Previous (3464.8 ms) vs Current (3399.6 ms) Diff 65 E2E 1.02x
query81: Previous (2228.0 ms) vs Current (2266.0 ms) Diff -38 E2E 0.98x
query82: Previous (2196.4 ms) vs Current (1903.0 ms) Diff 293 E2E 1.15x
query83: Previous (651.6 ms) vs Current (654.0 ms) Diff -2 E2E 1.00x
query84: Previous (1128.2 ms) vs Current (1030.4 ms) Diff 97 E2E 1.09x
query85: Previous (1634.2 ms) vs Current (1914.6 ms) Diff -280 E2E 0.85x
query86: Previous (897.4 ms) vs Current (922.0 ms) Diff -24 E2E 0.97x
query87: Previous (3044.8 ms) vs Current (2964.0 ms) Diff 80 E2E 1.03x
query88: Previous (3685.4 ms) vs Current (3625.6 ms) Diff 59 E2E 1.02x
query89: Previous (795.6 ms) vs Current (806.6 ms) Diff -11 E2E 0.99x
query90: Previous (986.8 ms) vs Current (844.8 ms) Diff 142 E2E 1.17x
query91: Previous (1151.2 ms) vs Current (1164.8 ms) Diff -13 E2E 0.99x
query92: Previous (456.2 ms) vs Current (512.0 ms) Diff -55 E2E 0.89x
query93: Previous (9548.6 ms) vs Current (9570.0 ms) Diff -21 E2E 1.00x
query94: Previous (3812.8 ms) vs Current (3756.2 ms) Diff 56 E2E 1.02x
query95: Previous (5580.8 ms) vs Current (5419.2 ms) Diff 161 E2E 1.03x
query96: Previous (3956.8 ms) vs Current (4010.6 ms) Diff -53 E2E 0.99x
query97: Previous (1694.6 ms) vs Current (1641.8 ms) Diff 52 E2E 1.03x
query98: Previous (1284.0 ms) vs Current (1339.4 ms) Diff -55 E2E 0.96x
query99: Previous (1526.4 ms) vs Current (1535.2 ms) Diff -8 E2E 0.99x
benchmark: Previous (273600.0 ms) vs Current (271000.0 ms) Diff 2600 E2E 1.01x

@ttnghia
Copy link
Copy Markdown
Contributor

ttnghia commented Sep 18, 2024

Shorter for easy reading:

Queries of interest: q16 and q94 that use mixed_semi_join.

Speedup results
-----------------
query16: Previous (3262.6 ms) vs Current (3335.4 ms) Diff -72 E2E 0.98x
query94: Previous (3812.8 ms) vs Current (3756.2 ms) Diff 56 E2E 1.02x

@mhaseeb123
Copy link
Copy Markdown
Member

/merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants