[FEA] Support for NVIDIA_TF32_OVERRIDE environment variable + handle #1393

@ahendriksen

Is your feature request related to a problem? Please describe.
I recently ran some brute-force KNN benchmarks with @tfeher. We compared the performance of brute-force KNN using 1 x tf32 versus 3 x tf32. On a representative benchmark, using 1 x tf32 resulted in a 2.5x speedup (5 seconds -> 2 seconds). This can be significant for certain workloads, but it cannot be made the default because the effects of the reduced numerical accuracy are unknown.
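For readers unfamiliar with the two modes, here is a rough pure-Python emulation of the accuracy trade-off. It is only a sketch under stated assumptions: TF32 is modelled by truncating a float32 to 10 mantissa bits (hardware conversion may round instead), the products are accumulated in Python doubles rather than fp32, and the 3 x tf32 variant uses the common big/small split in which the small-times-small term is dropped. The function names are made up for illustration and are not RAFT or cuBLAS APIs.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 by keeping only the top 10 mantissa bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of the 23 float32 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_1xtf32(a, b):
    """One TF32 product per element pair: fast, about 10 bits of mantissa."""
    return sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))


def dot_3xtf32(a, b):
    """Three TF32 products per element pair, recovering most of fp32 accuracy."""
    total = 0.0
    for x, y in zip(a, b):
        x_big, y_big = to_tf32(x), to_tf32(y)
        # The residuals hold the bits that the first truncation discarded.
        x_small, y_small = to_tf32(x - x_big), to_tf32(y - y_big)
        # The small * small term is dropped because it is negligible.
        total += x_big * y_big + x_big * y_small + x_small * y_big
    return total
```

Comparing both functions against a plain double-precision dot product on the same inputs shows the gap in error that the 2.5x speedup is traded against.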

We ran into the problem of how to support this use case in our current pairwise distance API. We already have two distance types for the L2 distance (expanded and unexpanded). Adding variants for every possible way of speeding up the computation could become prohibitive. cuBLAS supports the NVIDIA_TF32_OVERRIDE environment variable, which can force fp32 computations to be performed in tfloat32 precision.
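As a reminder of what the two existing L2 variants compute, here is a small sketch for a single pair of vectors (squared distance, for simplicity). The function names are illustrative and not the RAFT API. The expanded form is the relevant one here: over many points its inner-product term becomes a matrix multiplication, which is the part a TF32 setting would affect.

```python
def l2_unexpanded(x, y):
    """Squared L2 distance computed directly from coordinate differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def l2_expanded(x, y):
    """Squared L2 distance via ||x||^2 + ||y||^2 - 2 <x, y>."""
    x_norm = sum(xi * xi for xi in x)
    y_norm = sum(yi * yi for yi in y)
    # For whole datasets this inner product is computed as a GEMM.
    inner = sum(xi * yi for xi, yi in zip(x, y))
    return x_norm + y_norm - 2.0 * inner
```

Both functions agree in exact arithmetic; they differ only in rounding behaviour and in which hardware paths they can use.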

Describe the solution you'd like
Add support for the NVIDIA_TF32_OVERRIDE environment variable in the RAFT handle. This way, algorithms can query this option without having to continuously inspect the environment.

In addition, make it possible to set the tf32 override programmatically. For instance, PyTorch supports the following:

import torch

# The flag below controls whether to allow TF32 on matmul.
torch.backends.cuda.matmul.allow_tf32 = True
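To make the request concrete, here is a minimal sketch of what such a handle could look like. Everything in it is hypothetical: `Handle`, `set_allow_tf32`, and `allow_tf32` are illustrative names, not existing RAFT APIs. It also makes three assumptions that would need to be settled in a real design: any value other than "0" enables TF32 (NVIDIA's documentation may only define the "0" case), the environment variable wins over the programmatic setting when both are present, and TF32 is off when neither is given.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "NVIDIA_TF32_OVERRIDE"


def parse_tf32_override(value: Optional[str]) -> Optional[bool]:
    """Map the raw variable to None (unset), False ("0"), or True (otherwise)."""
    if value is None or value.strip() == "":
        return None
    # Assumption: every value other than "0" is treated as "enable TF32".
    return value.strip() != "0"


class Handle:
    """Hypothetical stand-in for a RAFT handle; not the real RAFT API."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        env = os.environ if environ is None else environ
        # Read the environment once, so algorithms never have to re-inspect it.
        self._tf32_from_env = parse_tf32_override(env.get(ENV_VAR))
        self._tf32_from_user: Optional[bool] = None

    def set_allow_tf32(self, allow: Optional[bool]) -> None:
        """Programmatic setting, in the spirit of the PyTorch flag."""
        self._tf32_from_user = allow

    @property
    def allow_tf32(self) -> bool:
        # Assumption: the environment variable acts as an override.
        if self._tf32_from_env is not None:
            return self._tf32_from_env
        if self._tf32_from_user is not None:
            return self._tf32_from_user
        return False  # default: full fp32 accuracy
```

A pairwise distance routine would then check `handle.allow_tf32` when choosing its compute path, so no new distance type or boolean API parameter is needed.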

Describe alternatives you've considered
Adding another L2 distance type, which I think is unwise (and would not help in the case of cosine distance). Also, adding boolean flags to the pairwise distance API is going to be a mess.
