Description
@ejmeitz recently reported this weird behavior: the power function on 32-bit integers doesn't produce correct results:
> LEGATE_TEST=1 LEGATE_CONFIG="--gpus 1 --auto-config 0" python
Python 3.13.7 [...]
>>> import cupynumeric as np
>>> a = np.array([117] * 10, dtype="int32")
>>> b = np.array([1] * 10, dtype="int32")
>>> print(a ** b)
[116 116 116 116 116 116 116 116 116 116] # Oops
This happens only when the code goes through the GPU code path. As it turns out, the correctness issue is due to the GPU's rounding behavior: the power function on integer values is implemented using the power function on doubles plus type conversions, and the float-to-integer conversions can produce wrong results.
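To make the mechanism concrete, here is a minimal CPU-side sketch in plain Python. It does not reproduce the GPU behavior; it only assumes that a double-precision result can land one representable value away from the exact integer, and shows what a directed float-to-integer conversion does in that case. The variable names are illustrative only.

```python
import math

exact = 117.0

# Hypothetical results that are off by one representable double (one ulp).
just_below = math.nextafter(exact, 0.0)       # largest double below 117.0
just_above = math.nextafter(exact, math.inf)  # smallest double above 117.0

# Rounding down (truncation) loses a whole unit when the value is slightly low.
truncated = int(just_below)

# Rounding up overshoots by a whole unit when the value is slightly high.
rounded_up = math.ceil(just_above)

print(truncated, rounded_up)
```

Whichever direction is chosen, an error on the other side of the exact integer turns into an off-by-one result, which matches the `116` seen in the report for `117 ** 1`.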
Unfortunately, there's no single rounding mode we can apply blindly to all values. Here's a CUDA program that checks the correctness of the roundtrip conversion (int32->double->int32) for values up to 1024. It reports that, for some of the values, only one of the two rounding modes (round-up vs. round-down) is correct, and it's not always the same one.
I can't think of a reasonable solution here other than implementing a power function on integer values that doesn't do type conversions. Here's a CuPy implementation we can copy from: https://github.com/cupy/cupy/blob/39dcf763dd339afcd49e51f4e5a0311370baa61c/cupy/_core/_routines_math.pyx#L983-L997
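For reference, a conversion-free integer power can be sketched with exponentiation by squaring. The Python version below is only an illustration of the technique, not the CuPy code linked above and not a proposed cuPyNumeric kernel. The names `wrap_int32` and `int_pow32` are hypothetical, and two choices are assumptions: overflow wraps around as in two's-complement int32 arithmetic, and negative exponents are rejected.

```python
def wrap_int32(x):
    """Reduce a Python int to the signed 32-bit range (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def int_pow32(base, exp):
    """Compute base ** exp using integer multiplications only."""
    if exp < 0:
        # Assumption: negative exponents are an error for integer inputs.
        raise ValueError("negative exponents are not supported")
    result = 1
    while exp > 0:
        if exp & 1:
            # The current bit of the exponent is set: fold this power of base in.
            result = wrap_int32(result * base)
        # Square the base for the next bit of the exponent.
        base = wrap_int32(base * base)
        exp >>= 1
    return result


print(int_pow32(117, 1))  # the failing case from the report
```

Because no floating-point value is ever produced, there is no rounding step to get wrong, and the loop runs once per bit of the exponent.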