Fix @fastmath x^2 inlining regression for Float32 and Float16 #60640

oscardssmith merged 3 commits into JuliaLang:master

Conversation
@Yashagarwal9798 please be aware that all uses of AI must be disclosed.
force-pushed from 0d4ccb4 to c79d1c7
force-pushed from c79d1c7 to 102b4a1
force-pushed from 102b4a1 to 1664287
Build failure looks real?
force-pushed from 6a69d79 to bd9c0e9
@oscardssmith CI is all green now.
@Yashagarwal9798 thanks for the PR!
Thanks! Really glad I could help improve the project.
This is a candidate for backporting. Although this concerns only an optimization, generating inefficient code for such a common operation is a real regression.
Agreed. If we backport, we need to be careful not to backport the Float16 version of this to 1.12 (only recently has LLVM stopped producing garbage on x86 for this intrinsic). 1.13 should receive the backport unmodified, though. I'll put up the 1.12 PR.
@oscardssmith did you open a PR to backport this to 1.12?
I did, but I linked to https://github.com/JuliaLang/julia/pull/60640/changes so it didn't tag.
## Summary

This PR fixes the performance regression where `@fastmath x^2` for `Float32` was not being inlined to efficient LLVM code, unlike `Float64`.
## Problem

As reported in #60639, `@fastmath x^2` for `Float32` was falling back to `power_by_squaring` instead of using the LLVM `powi` intrinsic. This resulted in:

- Unnecessary function calls instead of inline multiplication
- Potential type promotion to `Float64`
- Suboptimal generated code compared to `Float64`

Before this fix, `@code_llvm @fastmath Float32(1.5)^2` would show calls to `power_by_squaring`, while `Float64` correctly used the `llvm.powi` intrinsic.
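As an illustration (not part of the PR itself), one way to observe the regression is to inspect the LLVM IR of a small wrapper function; `f32sq` is a hypothetical name:

```julia
using InteractiveUtils  # provides @code_llvm

f32sq(x::Float32) = @fastmath x^2

# Before this fix: the IR contains a call to power_by_squaring.
# After this fix: the body reduces to a single inline fmul, since the
# powi intrinsic with a constant exponent of 2 folds to one multiply.
@code_llvm f32sq(1.5f0)
```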
## Solution

Added the missing `pow_fast` methods for `Float32` and `Float16`:

- `pow_fast(::Float32, ::Int32)` - uses the `llvm.powi.f32.i32` intrinsic directly
- `pow_fast(::Float32, ::Integer)` - a wrapper that converts to `Int32` when safe, matching the `Float64` pattern
- `pow_fast(::Float16, ::Integer)` - converts to `Float32`, computes, and converts back

This mirrors the existing implementation for `Float64`, which already used `llvm.powi.f64.i32`.
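For reference, here is a minimal sketch of what such methods can look like, modeled on the existing `Float64` pattern in `base/fastmath.jl`; the PR's actual definitions live in `Base.FastMath` and may differ in detail:

```julia
# Sketch only: call LLVM's powi intrinsic directly for an Int32 exponent.
pow_fast(x::Float32, y::Int32) =
    ccall("llvm.powi.f32.i32", llvmcall, Float32, (Float32, Int32), x, y)

# Use the intrinsic when the exponent fits in Int32; otherwise fall back
# to the generic power implementation.
pow_fast(x::Float32, y::Integer) =
    typemin(Int32) <= y <= typemax(Int32) ? pow_fast(x, Int32(y)) : x^y

# Float16: widen to Float32, compute, then narrow back.
pow_fast(x::Float16, y::Integer) = Float16(pow_fast(Float32(x), y))
```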
## Testing

Added a regression test that verifies `@fastmath x^2` generates inline `fmul` instructions (not `power_by_squaring` calls) for `Float16`, `Float32`, and `Float64`.
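An illustrative version of such a check (a sketch, not necessarily the literal test added by the PR) can grep the optimized IR:

```julia
using InteractiveUtils, Test

sq(x) = @fastmath x^2

# The optimized IR should contain an inline fmul and no call to
# power_by_squaring for every floating-point width.
for T in (Float16, Float32, Float64)
    ir = sprint(code_llvm, sq, (T,))
    @test occursin("fmul", ir)
    @test !occursin("power_by_squaring", ir)
end
```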
Fixes #60639

---------

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>

(cherry picked from commit f34d5f2)