[FEA] Improve joins benchmarking #19280
Description
Is your feature request related to a problem? Please describe.
The joins benchmark measures the performance of join operations using a single integer-type column as key.
However, to better capture real-world performance, the benchmark needs additional axes, such as the data type of the key columns, the number of key columns participating in the join, and the join output size.
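To illustrate how these axes combine, here is a minimal plain C++ sketch of the parameter grid. It does not use cuDF or any benchmarking library. The struct, the function `make_axis_grid`, and the idea of expressing output size as a per-row match multiplicity are all hypothetical, chosen only to show that every combination of the axes becomes its own benchmark case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one benchmark configuration, i.e. a single point in the axis grid.
struct JoinBenchConfig {
  std::string key_type;     // data type of the key columns (e.g. "int32", "string")
  int num_key_columns;      // number of key columns participating in the join
  int output_multiplicity;  // assumed: matches per probe row, scaling the join output size
};

// Hypothetical helper: build the cartesian product of all axis values so that
// every combination of key type, key-column count, and output size is measured.
std::vector<JoinBenchConfig> make_axis_grid(const std::vector<std::string>& key_types,
                                            const std::vector<int>& num_key_columns,
                                            const std::vector<int>& multiplicities) {
  std::vector<JoinBenchConfig> grid;
  grid.reserve(key_types.size() * num_key_columns.size() * multiplicities.size());
  for (const auto& type : key_types) {        // outermost axis: key data type
    for (int ncols : num_key_columns) {       // middle axis: number of key columns
      for (int mult : multiplicities) {       // innermost axis: output multiplicity
        grid.push_back({type, ncols, mult});
      }
    }
  }
  return grid;
}
```

The grid grows multiplicatively: six key types, three key-column counts, and three output sizes already yield 54 configurations, so the axis values likely need to be chosen with care to keep total benchmark runtime manageable.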
Such finer-grained measurements will help drive heuristics for choosing a join algorithm based on data characteristics and distributions. For instance, the current high-multiplicity benchmark indicates that populating the join indices is the bottleneck in sort-merge join. Adding other key data types, such as strings, lists, dictionaries, and structs, will help us understand how much the sort step contributes to the overall runtime.