[FEA] Add prefetching to join build object's internal map/set storage #20073
Description
Is your feature request related to a problem? Please describe.
When evaluating Velox-cuDF's performance on larger-than-memory joins, we observed significant page faults in inner_join. These were mitigated to a fair extent by prefetching the hash_join object's internal map/set storage.
Describe the solution you'd like
cuDF should prefetch the storage used internally by cuco just before a join, either by calling prefetch on cuco's internal storage through its public APIs, like so:
--- a/cpp/src/join/hash_join.cu
+++ b/cpp/src/join/hash_join.cu
@@ -278,6 +278,8 @@ probe_join_hash_table(
auto right_indices = std::make_unique<rmm::device_uvector<size_type>>(join_size, stream, mr);
cudf::prefetch::detail::prefetch(*left_indices, stream);
cudf::prefetch::detail::prefetch(*right_indices, stream);
+ cudf::experimental::prefetch::detail::prefetch(
+ hash_table.data(), hash_table.capacity() * sizeof(*hash_table.data()), stream);
auto const probe_table_num_rows = probe_table.num_rows();
auto const out_probe_begin =
Or by libcudf owning the map/set's underlying storage directly and using cuco's non-owning types in conjunction with it so that it can confidently prefetch it.
Prefetching the build table would further reduce the page faults. That could also be done in the application layer, since the hash_join object does not take ownership of the build table, but it would be convenient if cuDF did it automatically.
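To make the second option more concrete, here is a minimal host-only sketch of the ownership pattern: the join object owns the slot storage, hands out a non-owning view for insertion and lookup, and can therefore prefetch the whole allocation before probing because it knows the pointer and size. Every name here (`prefetch_log`, `slot_set_view`, `owning_join_build`, `empty_slot`) is hypothetical and not part of cuDF or cuco. The prefetch only records the request instead of issuing a device prefetch, and the linear-probing set is a stand-in for cuco's non-owning types, not their actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a device prefetch call. It only records each
// request so that this sketch runs on the host without a GPU.
struct prefetch_log {
  struct request { void const* ptr; std::size_t bytes; };
  std::vector<request> requests;
  void prefetch(void const* ptr, std::size_t bytes) { requests.push_back({ptr, bytes}); }
};

// Sentinel marking an unused slot (assumption: keys are non-negative).
constexpr std::int32_t empty_slot = -1;

// Hypothetical non-owning set view: it operates on storage owned elsewhere.
struct slot_set_view {
  std::int32_t* slots;
  std::size_t capacity;

  // Linear-probing insert; returns true only if the key was newly stored.
  bool insert(std::int32_t key) {
    assert(key >= 0);
    for (std::size_t i = 0; i < capacity; ++i) {
      auto const idx = (static_cast<std::size_t>(key) + i) % capacity;
      if (slots[idx] == key) { return false; }
      if (slots[idx] == empty_slot) { slots[idx] = key; return true; }
    }
    return false;  // table is full
  }

  bool contains(std::int32_t key) const {
    if (key < 0) { return false; }
    for (std::size_t i = 0; i < capacity; ++i) {
      auto const idx = (static_cast<std::size_t>(key) + i) % capacity;
      if (slots[idx] == key) { return true; }
      if (slots[idx] == empty_slot) { return false; }
    }
    return false;
  }
};

// Hypothetical join build object that owns the slot storage directly.
class owning_join_build {
 public:
  owning_join_build(std::vector<std::int32_t> const& build_keys, std::size_t capacity)
    : storage_(capacity, empty_slot)
  {
    auto v = view();
    for (auto const k : build_keys) { v.insert(k); }
  }

  slot_set_view view() { return {storage_.data(), storage_.size()}; }

  // Because the storage is owned here, its address and byte size are known.
  void prefetch_storage(prefetch_log& log) const
  {
    log.prefetch(storage_.data(), storage_.size() * sizeof(std::int32_t));
  }

  // Prefetch the owned storage, then count the probe keys present in the set.
  std::size_t count_matches(std::vector<std::int32_t> const& probe_keys, prefetch_log& log)
  {
    prefetch_storage(log);
    auto const v = view();
    std::size_t matches = 0;
    for (auto const k : probe_keys) { matches += v.contains(k) ? 1 : 0; }
    return matches;
  }

 private:
  std::vector<std::int32_t> storage_;
};
```

In this arrangement the prefetch does not depend on the container exposing its internals, which is the "confidently prefetch" property requested above. In a real implementation the recorded request would presumably be replaced by an asynchronous prefetch on the join's stream.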

