perf(solve): reduce memory allocations in solve system hot paths #85

Merged
kylebeggs merged 3 commits into main from perf/reduce-allocations-solve-hotpaths on Feb 17, 2026

@kylebeggs kylebeggs commented Feb 17, 2026

Summary

  • forward_cache.jl: Pre-allocate A_full and b workspace outside the loop, reuse with fill!; replace [data[i] for i in neighbors] array comprehension with view(data, neighbors); replace manual lower-triangle copy with copytri!; eliminate intermediate w = λ[1:k, :] slice
  • assembly.jl: Add in-place _build_stencil!(λ, A, b, ...) variant using ldiv!/bunchkaufman! to write solution into pre-allocated buffer and return a view instead of allocating a new array + slice
  • execution.jl: Pre-allocate λ solve buffer alongside A and b in weight_kernel; explicitly reset A_full and b per stencil iteration; call new in-place _build_stencil! variant
  • interpolation.jl: Replace scalar accumulation loop with dot() for polynomial evaluation; add @inbounds to RBF accumulation loop
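
The workspace-reuse pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the package's code: `solve_stencils!`, its arguments, and the toy saddle-point system are all hypothetical, and `bunchkaufman!` itself may still allocate small pivot and work arrays internally.

```julia
using LinearAlgebra

# Hypothetical stand-in for a per-stencil solve loop (not the package's _build_stencil!).
# Each stencil solves a small symmetric system A * λ = b and keeps the first k entries.
function solve_stencils!(weights::Matrix{Float64}, data::Vector{Float64},
                         stencils::Vector{Vector{Int}})
    k = length(first(stencils))
    n = k + 1                          # k point rows plus one constraint row (toy choice)
    A = Matrix{Float64}(undef, n, n)   # workspaces allocated once, outside the loop
    b = Vector{Float64}(undef, n)
    λ = Vector{Float64}(undef, n)
    for (s, neighbors) in enumerate(stencils)
        fill!(A, 0.0)                  # reset instead of reallocating
        fill!(b, 0.0)
        local_data = view(data, neighbors)   # no copy, unlike [data[i] for i in neighbors]
        for j in 1:k
            A[j, j] = 2.0              # toy diagonal block
            A[j, n] = 1.0              # constraint column (upper triangle only)
            b[j] = local_data[j]
        end
        F = bunchkaufman!(Symmetric(A, :U))  # factorize in place, overwriting A
        copyto!(λ, b)
        ldiv!(F, λ)                    # solve in place into the reused buffer
        weights[:, s] .= view(λ, 1:k)  # read the result through a view, no slice copy
    end
    return weights
end

data = [1.0, 2.0, 3.0, 4.0]
stencils = [[1, 2, 3], [2, 3, 4]]
weights = solve_stencils!(zeros(3, 2), data, stencils)
println(weights)
```

Because `A` is overwritten by the factorization, it has to be refilled on every iteration, which is why the reset with `fill!` sits at the top of the loop.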

Closes #73

Test plan

  • Full test suite passes (julia --project=. -e "using Pkg; Pkg.test()")
  • AD extension tests pass (Enzyme + Mooncake) since _forward_with_cache output format is preserved
  • Benchmark with @benchmark update_weights!(lap) shows reduced allocations
  • Benchmark Interpolator evaluation shows reduced allocations
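
The test plan measures allocations with `@benchmark` from BenchmarkTools. As a dependency-free sketch of the same kind of check, Base's `@allocated` can compare a copying gather against a view-based one. The helper names below are hypothetical, and the exact byte counts will vary by Julia version.

```julia
# Hypothetical helpers comparing a copying gather with a view-based one.
gather_sum_copy(data, idx) = sum([data[i] for i in idx])
gather_sum_view(data, idx) = sum(view(data, idx))

function compare_allocations(data, idx)
    # Warm up first so that compilation is not counted.
    gather_sum_copy(data, idx)
    gather_sum_view(data, idx)
    bytes_copy = @allocated gather_sum_copy(data, idx)
    bytes_view = @allocated gather_sum_view(data, idx)
    return bytes_copy, bytes_view
end

data = rand(10_000)
idx = collect(1:100:10_000)
bytes_copy, bytes_view = compare_allocations(data, idx)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```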

Benchmark results (10k points, 2D, PHS3 poly_deg=2)

update_weights! (Laplacian)

| Metric | main | this PR | Change |
|---|---|---|---|
| Allocations | 358,703 | 300,703 | -16.2% |
| Memory | 139.37 MiB | 110.48 MiB | -20.7% |

Interpolator evaluation (500 pts)

| Metric | main | this PR | Notes |
|---|---|---|---|
| Single-point allocs | 2 | 2 | Already minimal; dot() + @inbounds improve speed |
| Multi-point allocs | 202 | 202 | Already minimal |
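
The `dot()` replacement noted in the table can be illustrated with a small sketch. The function names and the three-term polynomial below are hypothetical and only show that a scalar accumulation loop and `dot` compute the same sum.

```julia
using LinearAlgebra

# Hypothetical polynomial tail: coefficients times monomial values at a point.
function poly_eval_loop(coeffs, monomials)
    acc = zero(eltype(coeffs))
    # eachindex with two arguments checks that both arrays share the same indices,
    # so skipping bounds checks inside the loop is safe.
    @inbounds for i in eachindex(coeffs, monomials)
        acc += coeffs[i] * monomials[i]
    end
    return acc
end

poly_eval_dot(coeffs, monomials) = dot(coeffs, monomials)

coeffs = [1.0, 2.0, 3.0]
x = 0.5
monomials = [1.0, x, x^2]     # 1, x, x² evaluated at x = 0.5
println(poly_eval_loop(coeffs, monomials))  # 1 + 2*0.5 + 3*0.25
println(poly_eval_dot(coeffs, monomials))
```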

- Pre-allocate λ buffer and use in-place _build_stencil! with ldiv!/bunchkaufman!
- Use view() for local_data instead of allocating new vectors
- Replace polynomial evaluation allocation with dot() in Interpolator
- Add copytri! for efficient symmetric matrix caching in forward_cache
- Add _weight_view dispatch for Vector vs Matrix result slicing
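
Two of the items above can be sketched briefly. The `weight_view` pair below is a hypothetical illustration of dispatching on `Vector` vs `Matrix` results; the PR's actual `_weight_view` signature is not shown in this conversation. `LinearAlgebra.copytri!` is an unexported helper, so its availability may vary across Julia versions.

```julia
using LinearAlgebra

# Hypothetical dispatch pair; the PR's _weight_view may differ in signature.
weight_view(λ::AbstractVector, k::Int) = view(λ, 1:k)
weight_view(λ::AbstractMatrix, k::Int) = view(λ, 1:k, :)

# Fill only the upper triangle, then mirror it into the lower triangle.
A = zeros(3, 3)
for j in 1:3, i in 1:j
    A[i, j] = i + 10j
end
LinearAlgebra.copytri!(A, 'U')   # unexported helper: copies upper triangle into lower

λ_vec = [1.0, 2.0, 3.0, 4.0]
λ_mat = [1.0 5.0; 2.0 6.0; 3.0 7.0; 4.0 8.0]
println(issymmetric(A))
println(weight_view(λ_vec, 2))
println(weight_view(λ_mat, 2))
```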
@kylebeggs kylebeggs force-pushed the perf/reduce-allocations-solve-hotpaths branch from 5548b87 to 571e263 on February 17, 2026 15:11

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| src/interpolation.jl | 100.00% <100.00%> (ø) |
| src/solve/assembly.jl | 97.38% <100.00%> (-2.62%) ⬇️ |
| src/solve/execution.jl | 99.03% <100.00%> (+0.03%) ⬆️ |
| src/solve/forward_cache.jl | 100.00% <100.00%> (ø) |

github-actions bot commented Feb 17, 2026

Benchmark Results

| | main | 7a8035d... | main / 7a8035d... |
|---|---|---|---|
| Directional | 2.46 ± 0.14 ms | 2.45 ± 0.14 ms | 1 ± 0.082 |
| Directional (per point) | 2.4 ± 0.13 ms | 2.49 ± 0.14 ms | 0.964 ± 0.075 |
| Gradient | 8.33 ± 0.38 ms | 8.29 ± 0.39 ms | 1 ± 0.066 |
| MonomialBasis/dim=1/deg=0 | 0.0356 ± 0.014 μs | 0.0463 ± 0.013 μs | 0.77 ± 0.37 |
| MonomialBasis/dim=1/deg=1 | 0.0753 ± 0.022 μs | 0.0767 ± 0.022 μs | 0.981 ± 0.4 |
| MonomialBasis/dim=1/deg=2 | 0.0693 ± 0.022 μs | 0.0878 ± 0.024 μs | 0.789 ± 0.33 |
| MonomialBasis/dim=2/deg=0 | 0.0351 ± 0.013 μs | 0.0345 ± 0.001 μs | 1.02 ± 0.37 |
| MonomialBasis/dim=2/deg=1 | 25.3 ± 13 ns | 0.0353 ± 0.013 μs | 0.717 ± 0.46 |
| MonomialBasis/dim=2/deg=2 | 30 ± 13 ns | 0.0413 ± 0.014 μs | 0.726 ± 0.41 |
| MonomialBasis/dim=3/deg=0 | 29.2 ± 14 ns | 0.0361 ± 0.014 μs | 0.81 ± 0.49 |
| MonomialBasis/dim=3/deg=1 | 0.0367 ± 0.015 μs | 0.0429 ± 0.014 μs | 0.855 ± 0.44 |
| MonomialBasis/dim=3/deg=2 | 0.038 ± 0.014 μs | 0.0489 ± 0.013 μs | 0.779 ± 0.36 |
| Partial | 2.43 ± 0.14 ms | 2.68 ± 0.12 ms | 0.909 ± 0.069 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∂ | 9.68 ± 0.079 ns | 9.84 ± 0.11 ns | 0.984 ± 0.014 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∂² | 10.1 ± 0.17 ns | 10.2 ± 0.13 ns | 0.99 ± 0.021 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∇ | 17.1 ± 0.12 ns | 17.1 ± 0.07 ns | 0.999 ± 0.0081 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∇² | 18.6 ± 0.04 ns | 18.6 ± 0.031 ns | 1 ± 0.0027 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∂ | 9.68 ± 0.08 ns | 9.81 ± 0.09 ns | 0.987 ± 0.012 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∂² | 10.2 ± 0.08 ns | 10.1 ± 0.16 ns | 1.01 ± 0.018 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∇ | 17.1 ± 0.15 ns | 17.1 ± 0.07 ns | 0.999 ± 0.0097 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∇² | 18.6 ± 0.04 ns | 18.6 ± 0.041 ns | 1 ± 0.0031 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∂ | 9.68 ± 0.06 ns | 9.89 ± 0.1 ns | 0.979 ± 0.012 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∂² | 10 ± 0.21 ns | 10.1 ± 0.18 ns | 0.995 ± 0.027 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∇ | 17.1 ± 0.15 ns | 17.1 ± 0.069 ns | 0.999 ± 0.0096 |
| RBF/Gaussian, exp(-(ε*r)²) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∇² | 18.6 ± 0.031 ns | 18.6 ± 0.061 ns | 1 ± 0.0037 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∂ | 6.32 ± 0.08 ns | 6.14 ± 0.33 ns | 1.03 ± 0.057 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∂² | 14.2 ± 0.02 ns | 14.2 ± 0.031 ns | 1 ± 0.0026 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∇ | 8.53 ± 0.23 ns | 8.67 ± 0.24 ns | 0.984 ± 0.038 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 0/0/∇² | 16 ± 0.16 ns | 15.8 ± 0.11 ns | 1.01 ± 0.012 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∂ | 6.32 ± 0.01 ns | 6.5 ± 0.37 ns | 0.972 ± 0.055 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∂² | 14.2 ± 0.08 ns | 14.2 ± 0.071 ns | 1 ± 0.0075 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∇ | 8.54 ± 0.13 ns | 8.63 ± 0.29 ns | 0.99 ± 0.037 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 1/1/∇² | 16 ± 0.16 ns | 16.1 ± 0.09 ns | 0.991 ± 0.011 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∂ | 6.32 ± 0.08 ns | 6.14 ± 0.27 ns | 1.03 ± 0.047 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∂² | 14.2 ± 0.08 ns | 14.2 ± 0.031 ns | 1 ± 0.006 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∇ | 8.6 ± 0.08 ns | 8.62 ± 0.33 ns | 0.997 ± 0.039 |
| RBF/Inverse Multiquadrics, 1/sqrt((r*ε)²+1) ├─Shape factor: ε = 1 └─Polynomial augmentation: degree 2/2/∇² | 16 ± 0.099 ns | 15.8 ± 0.13 ns | 1.01 ± 0.01 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 0/0/∂ | 3.42 ± 0.001 ns | 3.72 ± 0.01 ns | 0.919 ± 0.0025 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 0/0/∂² | 4.7 ± 0.01 ns | 4.7 ± 0.01 ns | 1 ± 0.003 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 0/0/∇ | 5.62 ± 0.039 ns | 5.69 ± 0.011 ns | 0.988 ± 0.0071 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 0/0/∇² | 3.11 ± 0 ns | 3.11 ± 0 ns | 1 ± 0 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 1/1/∂ | 3.42 ± 0.001 ns | 3.72 ± 0.01 ns | 0.919 ± 0.0025 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 1/1/∂² | 4.7 ± 0.01 ns | 4.7 ± 0.011 ns | 1 ± 0.0032 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 1/1/∇ | 5.61 ± 0.041 ns | 5.7 ± 0.02 ns | 0.984 ± 0.008 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 1/1/∇² | 3.11 ± 0 ns | 3.11 ± 0 ns | 1 ± 0 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 2/2/∂ | 3.42 ± 0.01 ns | 3.72 ± 0.01 ns | 0.919 ± 0.0037 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 2/2/∂² | 4.7 ± 0.01 ns | 4.7 ± 0.01 ns | 1 ± 0.003 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 2/2/∇ | 5.62 ± 0.049 ns | 5.69 ± 0.02 ns | 0.988 ± 0.0093 |
| RBF/Polyharmonic spline (r³) └─Polynomial augmentation: degree 2/2/∇² | 3.11 ± 0 ns | 3.11 ± 0 ns | 1 ± 0 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 0/0/∂ | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 0/0/∂² | 5.58 ± 0.08 ns | 5.54 ± 0.021 ns | 1.01 ± 0.015 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 0/0/∇ | 8.6 ± 0.1 ns | 6.85 ± 0.01 ns | 1.26 ± 0.015 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 0/0/∇² | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 1/1/∂ | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 1/1/∂² | 5.58 ± 0.06 ns | 5.52 ± 0.03 ns | 1.01 ± 0.012 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 1/1/∇ | 7.04 ± 0.27 ns | 6.85 ± 0.01 ns | 1.03 ± 0.039 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 1/1/∇² | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 2/2/∂ | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 2/2/∂² | 5.58 ± 0.08 ns | 5.54 ± 0.03 ns | 1.01 ± 0.015 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 2/2/∇ | 6.97 ± 0.27 ns | 6.85 ± 0.01 ns | 1.02 ± 0.04 |
| RBF/Polyharmonic spline (r¹) └─Polynomial augmentation: degree 2/2/∇² | 4.27 ± 0.01 ns | 4.27 ± 0.01 ns | 1 ± 0.0033 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 0/0/∂ | 4.96 ± 0.001 ns | 5.26 ± 0.01 ns | 0.943 ± 0.0018 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 0/0/∂² | 4.65 ± 0.01 ns | 4.65 ± 0.01 ns | 1 ± 0.003 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 0/0/∇ | 6.19 ± 0.011 ns | 6.11 ± 0.079 ns | 1.01 ± 0.013 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 0/0/∇² | 3.42 ± 0.001 ns | 3.42 ± 0.001 ns | 1 ± 0.00041 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 1/1/∂ | 4.96 ± 0.001 ns | 5.26 ± 0.01 ns | 0.943 ± 0.0018 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 1/1/∂² | 4.65 ± 0.01 ns | 4.65 ± 0.01 ns | 1 ± 0.003 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 1/1/∇ | 6.19 ± 0.011 ns | 6.1 ± 0.071 ns | 1.01 ± 0.012 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 1/1/∇² | 3.42 ± 0.001 ns | 3.42 ± 0.001 ns | 1 ± 0.00041 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 2/2/∂ | 4.96 ± 0.001 ns | 5.26 ± 0.01 ns | 0.943 ± 0.0018 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 2/2/∂² | 4.65 ± 0.01 ns | 4.65 ± 0.01 ns | 1 ± 0.003 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 2/2/∇ | 6.19 ± 0.011 ns | 6.11 ± 0.07 ns | 1.01 ± 0.012 |
| RBF/Polyharmonic spline (r⁵) └─Polynomial augmentation: degree 2/2/∇² | 3.42 ± 0.01 ns | 3.42 ± 0.001 ns | 1 ± 0.0029 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 0/0/∂ | 10.4 ± 0.081 ns | 10.3 ± 0.11 ns | 1.01 ± 0.013 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 0/0/∂² | 4.96 ± 0.01 ns | 4.96 ± 0.01 ns | 1 ± 0.0029 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 0/0/∇ | 12.5 ± 0.071 ns | 12.5 ± 0.081 ns | 1 ± 0.0086 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 0/0/∇² | 8.08 ± 0.09 ns | 8.05 ± 0.071 ns | 1 ± 0.014 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 1/1/∂ | 10.4 ± 0.089 ns | 10.3 ± 0.11 ns | 1.01 ± 0.014 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 1/1/∂² | 4.96 ± 0.001 ns | 4.96 ± 0.01 ns | 1 ± 0.002 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 1/1/∇ | 12.5 ± 0.091 ns | 12.6 ± 0.091 ns | 0.995 ± 0.01 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 1/1/∇² | 8.08 ± 0.09 ns | 8.06 ± 0.08 ns | 1 ± 0.015 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 2/2/∂ | 10.4 ± 0.09 ns | 10.2 ± 0.16 ns | 1.01 ± 0.018 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 2/2/∂² | 4.96 ± 0.001 ns | 4.96 ± 0.01 ns | 1 ± 0.002 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 2/2/∇ | 12.5 ± 0.1 ns | 12.5 ± 0.1 ns | 1 ± 0.011 |
| RBF/Polyharmonic spline (r⁷) └─Polynomial augmentation: degree 2/2/∇² | 8.08 ± 0.07 ns | 8.05 ± 0.062 ns | 1 ± 0.012 |
| time_to_load | 0.646 ± 0.0014 s | 0.806 ± 0.0054 s | 0.802 ± 0.0056 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@kylebeggs kylebeggs merged commit 0089df9 into main Feb 17, 2026
25 of 26 checks passed
@kylebeggs kylebeggs deleted the perf/reduce-allocations-solve-hotpaths branch February 20, 2026 20:02
Development

Successfully merging this pull request may close these issues.

perf: optimize memory allocations in solve system hot paths