[FEA] Improve joins benchmarking #19280

@shrshi

Description

Is your feature request related to a problem? Please describe.

The joins benchmark currently measures the performance of join operations using a single integer-type column as the key.
To better capture real-world performance, additional benchmark axes are needed: the data type of the key columns, the number of key columns participating in the join, and the join output size.
Such finer-grained measurements will help drive heuristics for choosing a join algorithm based on data characteristics and distributions. For instance, the current high-multiplicity benchmark indicates that populating the join indices is the bottleneck in sort-merge join; adding other key data types such as strings, lists, dictionaries, and structs will help us understand how much the sort step contributes to the overall runtime.
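
To make the proposed axes concrete, here is a rough sketch of the parameter grid in plain Python. It is not the existing benchmark code and does not use any cuDF API: `make_keys`, `hash_inner_join`, and `run_grid` are hypothetical names, the join is a naive dictionary-based stand-in, and the axis values are placeholders. It only illustrates how key type, number of key columns, and multiplicity (which controls the join output size) could be swept together.

```python
import itertools
import time


def make_keys(num_rows, num_key_cols, key_type, multiplicity):
    """Build one key tuple per row; each distinct key repeats `multiplicity` times."""
    rows = []
    for i in range(num_rows):
        k = i // multiplicity
        value = k if key_type == "int" else f"key_{k:08d}"
        # Every key column carries the same value, so the tuple width is the only
        # thing that changes along the "number of key columns" axis.
        rows.append((value,) * num_key_cols)
    return rows


def hash_inner_join(left, right):
    """Naive inner join (stand-in only): returns (left_index, right_index) pairs."""
    table = {}
    for ri, key in enumerate(right):
        table.setdefault(key, []).append(ri)
    return [(li, ri) for li, key in enumerate(left) for ri in table.get(key, ())]


def run_grid(num_rows=1000):
    """Sweep the three proposed axes and record output size and wall time."""
    key_types = ("int", "str")   # assumed placeholder values for the key type axis
    key_col_counts = (1, 2, 4)   # assumed placeholder values for the key count axis
    multiplicities = (1, 10)     # higher multiplicity gives a larger join output
    results = []
    for key_type, num_key_cols, multiplicity in itertools.product(
        key_types, key_col_counts, multiplicities
    ):
        left = make_keys(num_rows, num_key_cols, key_type, multiplicity)
        right = make_keys(num_rows, num_key_cols, key_type, multiplicity)
        start = time.perf_counter()
        joined = hash_inner_join(left, right)
        elapsed = time.perf_counter() - start
        results.append(
            {
                "key_type": key_type,
                "num_key_cols": num_key_cols,
                "multiplicity": multiplicity,
                "output_rows": len(joined),
                "seconds": elapsed,
            }
        )
    return results


if __name__ == "__main__":
    for row in run_grid():
        print(row)
```

In this sketch both tables use the same multiplicity, so when the multiplicity divides the row count, the output has `num_rows * multiplicity` rows. That lets output size be varied independently of input size, which is the point of the third axis.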
