Refactor distinct with hashset-based algorithms#15984
Refactor distinct with hashset-based algorithms#15984rapids-bot[bot] merged 17 commits intorapidsai:branch-24.08from
Conversation
|
Please post a benchmark showing how the performance is changed. Thanks. |
|
Benchmark comparision of distinct algorithm with ['static_map.json', 'static_set.json'] distinct[0] Tesla T4
distinct_list[0] Tesla T4
Summary
|
ttnghia
left a comment
There was a problem hiding this comment.
reduce_by_row is instantiated of hash_reduce_by_row which is shared across multiple TUs. You should not separate them into multiple functions taking different sets of parameters. That is very difficult to join them again after refactor is done.
PointKernel
left a comment
There was a problem hiding this comment.
Nice work!
Good to see the build time also reduced.
|
I thought that we agreed to keep the signature of |
I'm not sure what you mean by keeping the signature of |
ttnghia
left a comment
There was a problem hiding this comment.
Approval and excited waiting for the next refactor work.
|
Further performance improvements using distinctRef Time = With
|
| Type | NumRows | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|
| bool | 10000 | 169.117 us | 32.86% | 203.653 us | 1.88% | 34.536 us | 20.42% | FAIL |
| bool | 100000 | 190.318 us | 28.92% | 232.877 us | 5.05% | 42.559 us | 22.36% | FAIL |
| bool | 1000000 | 294.805 us | 16.77% | 357.525 us | 7.37% | 62.720 us | 21.28% | FAIL |
| bool | 10000000 | 1.414 ms | 6.62% | 1.676 ms | 9.63% | 261.566 us | 18.49% | FAIL |
| I8 | 10000 | 150.012 us | 2.80% | 187.305 us | 29.75% | 37.293 us | 24.86% | FAIL |
| I8 | 100000 | 161.144 us | 4.72% | 195.348 us | 5.06% | 34.204 us | 21.23% | FAIL |
| I8 | 1000000 | 283.225 us | 1.66% | 330.493 us | 8.80% | 47.268 us | 16.69% | FAIL |
| I8 | 10000000 | 1.418 ms | 2.14% | 1.671 ms | 2.87% | 252.325 us | 17.79% | FAIL |
| I32 | 10000 | 149.916 us | 2.75% | 177.923 us | 2.51% | 28.006 us | 18.68% | FAIL |
| I32 | 100000 | 165.040 us | 14.86% | 193.024 us | 2.05% | 27.985 us | 16.96% | FAIL |
| I32 | 1000000 | 283.267 us | 1.57% | 327.740 us | 1.13% | 44.473 us | 15.70% | FAIL |
| I32 | 10000000 | 1.466 ms | 2.07% | 1.721 ms | 1.94% | 255.151 us | 17.41% | FAIL |
| I64 | 10000 | 150.541 us | 2.75% | 178.182 us | 4.02% | 27.642 us | 18.36% | FAIL |
| I64 | 100000 | 158.679 us | 2.61% | 193.019 us | 2.71% | 34.340 us | 21.64% | FAIL |
| I64 | 1000000 | 287.044 us | 2.76% | 345.719 us | 16.33% | 58.675 us | 20.44% | FAIL |
| I64 | 10000000 | 1.551 ms | 1.90% | 1.830 ms | 5.21% | 278.500 us | 17.95% | FAIL |
| F32 | 10000 | 151.368 us | 5.03% | 179.430 us | 5.58% | 28.062 us | 18.54% | FAIL |
| F32 | 100000 | 169.741 us | 0.93% | 210.323 us | 4.70% | 40.582 us | 23.91% | FAIL |
| F32 | 1000000 | 531.013 us | 4.36% | 590.247 us | 10.19% | 59.235 us | 11.16% | FAIL |
| F32 | 10000000 | 7.422 ms | 0.16% | 7.640 ms | 0.29% | 217.326 us | 2.93% | FAIL |
| cudf::timestamp_ms | 10000 | 150.855 us | 7.18% | 179.409 us | 4.51% | 28.554 us | 18.93% | FAIL |
| cudf::timestamp_ms | 100000 | 158.212 us | 2.58% | 193.409 us | 2.16% | 35.198 us | 22.25% | FAIL |
| cudf::timestamp_ms | 1000000 | 289.823 us | 2.29% | 336.280 us | 1.06% | 46.457 us | 16.03% | FAIL |
| cudf::timestamp_ms | 10000000 | 1.581 ms | 1.64% | 1.823 ms | 1.33% | 242.283 us | 15.32% | FAIL |
distinct_list
[0] Tesla T4
| Type | null_probability | ColumnSize | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|
| I32 | 0 | 100000000 | 3.183 ms | 1.17% | 3.666 ms | 0.48% | 483.897 us | 15.20% | FAIL |
| I32 | 0.1 | 100000000 | 3.632 ms | 1.40% | 4.199 ms | 0.99% | 567.874 us | 15.64% | FAIL |
| cudf::list_view | 0 | 100000000 | 3.512 ms | 0.88% | 3.584 ms | 0.68% | 72.714 us | 2.07% | FAIL |
| cudf::list_view | 0.1 | 100000000 | 4.124 ms | 0.91% | 4.206 ms | 1.78% | 81.445 us | 1.97% | FAIL |
Summary
- Total Matches: 28
- Pass (diff <= min_noise): 0
- Unknown (infinite noise): 0
- Failure (diff > min_noise): 28
PointKernel
left a comment
There was a problem hiding this comment.
Excellent! 🔥 🔥 🔥
Thank you!
|
/merge |
Description
Refactor distinct algorithm to use
cuco::static_set.Checklist