The x86 psABI specifies that f16 goes on the stack (but is returned in xmm0). Our emulation for compilers that do not support _Float16 did not handle this properly. Fixes #61072.
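For readers who want to see what the emulation is supposed to compute, here is a minimal C sketch, assuming the usual software approach of widening binary16 to float, doing the arithmetic there, and rounding back to nearest-even. The helper names (`half_to_float`, `float_to_half`, `half_mul_emulated`) are hypothetical and are not the functions in Julia's runtime. The sketch works on raw `uint16_t` bit patterns and does not reproduce the calling-convention bug itself, which is about where the f16 argument and return value live.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to float (exact for every input). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                 /* Inf or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalize first */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow a float to binary16, rounding to nearest, ties to even. */
static uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t mag = x & 0x7FFFFFFFu;
    uint32_t half, rem, halfway;
    if (mag > 0x7F800000u) return (uint16_t)(sign | 0x7E00u); /* NaN */
    if (mag >= 0x47800000u) return (uint16_t)(sign | 0x7C00u); /* >= 2^16 or Inf */
    if (mag < 0x33000000u) return (uint16_t)sign;              /* < 2^-25 -> 0 */
    if (mag < 0x38800000u) {            /* result is subnormal in binary16 */
        uint32_t shift = 126u - (mag >> 23);
        uint32_t m = (mag & 0x7FFFFFu) | 0x800000u;
        half = m >> shift;
        rem = m & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    } else {                            /* normal: rebias 127 -> 15 */
        half = (((mag >> 23) - 112u) << 10) | ((mag & 0x7FFFFFu) >> 13);
        rem = mag & 0x1FFFu;
        halfway = 0x1000u;
    }
    /* A carry out of the mantissa correctly bumps the exponent (up to Inf). */
    if (rem > halfway || (rem == halfway && (half & 1u))) half++;
    return (uint16_t)(sign | half);
}

/* Emulated binary16 multiply: widen, multiply in float, narrow. */
static uint16_t half_mul_emulated(uint16_t a, uint16_t b) {
    return float_to_half(half_to_float(a) * half_to_float(b));
}
```

With these helpers, `half_mul_emulated(0x4200, 0x4000)` (3.0 times 2.0) gives `0x4600`, the bit pattern of 6.0. The `Float16(4.0)` seen in the report is `0x4400`.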
|
Okay, CI passed on Windows, but neither of the Windows jobs ran on |
|
I suspect it depends on it because it changes whether the CPU has native support or not |
|
I suppose someone with access to |
$ julia +nightly~x86 --cpu-target=pentium4 -E 'Float16(3.0) * 2'
Float16(4.0)
$ julia +nightly~x86 -E 'Float16(3.0) * 2'
Float16(6.0)
$ ./julia-46c8113b1a/bin/julia --cpu-target=pentium4 -E 'Float16(3.0) * 2'
Float16(6.0)
$ ./julia-46c8113b1a/bin/julia -E 'Float16(3.0) * 2'
Float16(6.0)
$ julia +nightly~x86 -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.14.0-DEV.1761
Commit 9aff288d6de (2026-02-20 13:53 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (i686-linux-gnu)
CPU: 224 × Intel(R) Xeon(R) CPU Max 9480
WORD_SIZE: 32
LLVM: libLLVM-20.1.8 (ORCJIT, sapphirerapids)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 224 virtual cores)
$ ./julia-46c8113b1a/bin/julia -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.14.0-DEV.1760
Commit 46c8113b1a4 (2026-02-20 07:23 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (i686-linux-gnu)
CPU: 224 × Intel(R) Xeon(R) CPU Max 9480
WORD_SIZE: 32
LLVM: libLLVM-20.1.8 (ORCJIT, sapphirerapids)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 224 virtual cores)
Seems to work for me (not on |
I don't understand this. On demeter (no AVX512FP16): So the bug at least does not depend on having AVX512FP16 support. |
|
What's the microarchitecture of demeter? I couldn't reproduce on a bunch of machines I tried (including znver4, which has avx512, but not avx512-fp16), only on sapphirerapids. |
|
This needs backporting to all active release branches, I assume? |
|
Ah, checking the history of the znver4 machine I tested this on, I had tried |
The x86 psABI specifies that f16 goes on the stack (but is returned in xmm0). Our emulation for compilers that do not support _Float16 did not handle this properly. Fixes #61072.
I think this fixes it at least - I'm too tired to actually go check at this point, so I'll just let CI try it ;). My remaining concern is that I don't quite understand why seeing this behavior is AVX512 dependent.
(cherry picked from commit 2ba9b37)