[FEA] Improve occupancy during hash table build #15502

@tgujar

Is your feature request related to a problem? Please describe.
The cuco insert kernel has poor occupancy due to high register usage during the hash table build operation executed by cuDF. If I disable some of the code paths for complex types (commenting out dict, string, list, struct, decimal) in the type dispatcher

CUDF_HOST_DEVICE __forceinline__ constexpr decltype(auto) type_dispatcher(cudf::data_type dtype,

then the register usage per thread drops from 75 to 46, which leads to a significant occupancy bump. It seems that the insert kernel pays the cost of high register usage even for simpler types, since the compiler has to account for all code paths when allocating registers.
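
To make the idea concrete, here is a minimal host-only sketch of a dispatcher whose complex-type branches can be compiled out. It is not libcudf code: `type_id`, `element_op`, and `dispatch` are hypothetical stand-ins, and the `EnableComplex` flag is an assumption about how such a switch could be exposed.

```cpp
#include <cstdint>

// Hypothetical stand-in for cudf::type_id; only a few values are modeled.
enum class type_id : std::int32_t { INT32, FLOAT64, STRING, LIST, STRUCT };

// Hypothetical element operator (think: row hasher or comparator).
// Each method returns a tag so the selected branch is observable.
struct element_op {
  constexpr int simple(type_id) const { return 1; }    // fixed-width path
  constexpr int complex_(type_id) const { return 2; }  // nested/string path
};

// Dispatcher whose complex branches are emitted only when EnableComplex is true.
// With EnableComplex == false the call to the complex path is discarded at
// compile time, so that code is not part of the generated function at all.
template <bool EnableComplex, typename F>
constexpr int dispatch(type_id id, F f)
{
  switch (id) {
    case type_id::INT32:
    case type_id::FLOAT64: return f.simple(id);
    case type_id::STRING:
    case type_id::LIST:
    case type_id::STRUCT:
      if constexpr (EnableComplex) {
        return f.complex_(id);
      } else {
        return -1;  // type not supported by this instantiation
      }
  }
  return -1;  // unknown type id
}
```

In a device kernel, the `dispatch<false, ...>` instantiation would presumably need fewer registers because the compiler never sees the complex branches, which is the effect observed above when those branches are commented out by hand.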

I ran some experiments disabling different subsets of types. Each entry below lists the disabled types -> the resulting register count for the insert kernel:

  • decimal -> 72
  • struct -> 73
  • list -> 73
  • string -> 73
  • dict -> 68
  • struct, list -> 64
  • list, decimal, struct -> 63
  • dict, string, list, struct -> 58
  • string, dict, struct, list, decimal -> 46
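
For a rough sense of what these register counts mean for occupancy, the sketch below estimates the register-limited number of resident warps per SM. The hardware limits in it (64K registers per SM, 64 resident warps per SM, registers allocated per warp in units of 256) are assumptions that match several recent NVIDIA architectures but should be checked against the target GPU. The estimate also ignores block size, shared memory, and other limits.

```cpp
// Assumed hardware limits; verify against the target GPU's compute capability.
constexpr int warp_size             = 32;
constexpr int regs_per_sm           = 65536;  // assumed register file size per SM
constexpr int reg_alloc_granularity = 256;    // assumed per-warp allocation unit
constexpr int max_warps_per_sm      = 64;     // assumed resident warp limit

// Register-limited upper bound on resident warps per SM.
constexpr int resident_warps(int regs_per_thread)
{
  int per_warp = regs_per_thread * warp_size;
  // Round the per-warp register demand up to the allocation granularity.
  int rounded = (per_warp + reg_alloc_granularity - 1) / reg_alloc_granularity *
                reg_alloc_granularity;
  int limit = regs_per_sm / rounded;
  return limit < max_warps_per_sm ? limit : max_warps_per_sm;
}

static_assert(resident_warps(75) == 25, "all type paths enabled");
static_assert(resident_warps(46) == 42, "complex type paths disabled");
```

Under these assumptions, going from 75 to 46 registers per thread raises the register-limited bound from 25 to 42 of 64 warps, roughly 39% to 66% theoretical occupancy.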

Here is the speedup I see on the mixed semi join kernel for int32 keys when occupancy is improved by disabling complex types:
[image: speedup chart]

Describe the solution you'd like
Improve occupancy by disabling codepaths for complex types.

Describe alternatives you've considered

  1. Add more template params to the hasher/comparator which allow us to separate codepaths for complex types and simpler types, or
  2. Add JIT compilation to only consider the types necessary for hasher/comparator for a row
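
Alternative 1 could look roughly like the host-only sketch below: inspect the key column types once, then select between two template instantiations so that tables with only simple keys never run the variant compiled with complex-type paths. All names here (`type_id`, `is_complex`, `has_complex`, `launch_insert`, `build_hash_table`, `path`) are hypothetical and are not part of cuDF or cuco.

```cpp
#include <cstddef>

// Hypothetical stand-in for cudf::type_id; only a few values are modeled.
enum class type_id { INT32, FLOAT64, STRING, LIST, STRUCT, DECIMAL128, DICTIONARY32 };

// Tag identifying which instantiation was selected.
enum class path { simple, full };

// Types treated as "complex" in this sketch (the ones disabled in the experiments).
constexpr bool is_complex(type_id id)
{
  switch (id) {
    case type_id::STRING:
    case type_id::LIST:
    case type_id::STRUCT:
    case type_id::DECIMAL128:
    case type_id::DICTIONARY32: return true;
    default: return false;
  }
}

// True if any key column has a complex type.
template <std::size_t N>
constexpr bool has_complex(const type_id (&types)[N])
{
  for (std::size_t i = 0; i < N; ++i) {
    if (is_complex(types[i])) { return true; }
  }
  return false;
}

// Placeholder for the real kernel launch; the flag would be forwarded to the
// hasher/comparator so the simple variant omits complex-type code entirely.
template <bool HasComplexTypes>
constexpr path launch_insert()
{
  return HasComplexTypes ? path::full : path::simple;
}

// Runtime check on the schema, compile-time choice of the code path.
template <std::size_t N>
constexpr path build_hash_table(const type_id (&key_types)[N])
{
  return has_complex(key_types) ? launch_insert<true>() : launch_insert<false>();
}
```

The trade-off is a second instantiation of the kernel (longer compile times, larger binary) in exchange for lower register pressure on the simple-type path; the JIT option avoids the extra instantiations but adds runtime compilation cost.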


Metadata

Assignees

No one assigned

    Labels

    Performance: Performance related issue; feature request: New feature or request; libcudf: Affects libcudf (C++/CUDA) code.

    Type

    No type

    Projects

    Status

    In progress

    Milestone

    No milestone
