
perf: optimize memory allocations in solve system hot paths #73

@kylebeggs


Summary

Performance profiling of the solve system identified several hot paths with unnecessary memory allocations that could be optimized.

Optimization Opportunities

1. src/solve/forward_cache.jl - Per-Stencil Allocations (Lines 54-64)

local_data = [data[i] for i in neighbors]  # Allocates vector each iteration
A_full = zeros(TD, n, n)                   # Allocates fresh matrix
b = zeros(TD, n, num_ops)                  # Allocates fresh RHS

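Fix: Pre-allocate local_data, A_full, and b once, outside the evaluation loop, and reuse them for every stencil. Copy neighbor values in place (e.g. local_data .= view(data, neighbors)) and reset the matrices with fill!(A_full, zero(TD)) and fill!(b, zero(TD)) instead of calling zeros on each iteration.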
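A minimal sketch of the buffer-reuse pattern in plain Julia. It assumes scalar data, the same stencil size n for every stencil, and a single operator so b is a vector; solve_stencils, kernel, and rhs are hypothetical names, not the package's API.

```julia
using LinearAlgebra

# Hypothetical sketch: scalar data, equal stencil sizes, one operator (b is a vector).
function solve_stencils(data::Vector{Float64}, stencils::Vector{Vector{Int}}, kernel, rhs)
    n = length(first(stencils))
    local_data = Vector{Float64}(undef, n)  # allocated once, not per stencil
    A = zeros(n, n)                         # allocated once
    b = zeros(n)                            # allocated once
    results = Vector{Vector{Float64}}(undef, length(stencils))
    for (s, neighbors) in enumerate(stencils)
        local_data .= view(data, neighbors) # copy neighbor values into existing storage
        fill!(A, 0.0)                       # reset reused buffers instead of calling zeros
        fill!(b, 0.0)
        for j in 1:n, i in 1:j              # assemble the upper triangle only
            A[i, j] = kernel(local_data[i], local_data[j])
        end
        for i in 1:n
            b[i] = rhs(local_data[i])
        end
        results[s] = Symmetric(A, :U) \ b   # the solution is a new vector, so no aliasing
    end
    return results
end
```

Because the solve returns a fresh vector, stored results never alias the reused buffers; anything a cache keeps from A or b would likewise have to be copied out.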

2. src/solve/execution.jl - Kernel Matrix Allocation (Lines 235-238)

@kernel function weight_kernel(...)
    for eval_idx in start_idx:end_idx
        n = k + nmon
        A = Symmetric(zeros(TD, n, n), :U)  # Allocates per eval point!
        b = _prepare_buffer(ℒrbf, TD, n)

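Fix: Hoist the A and b allocations out of the eval_idx loop so that each work-item allocates once and reuses the same buffers for every evaluation point in its start_idx:end_idx range. Symmetric(..., :U) only wraps its argument and does not copy it, so re-wrapping a reused buffer at each point costs nothing extra.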
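A sketch of the hoisted loop, with a plain Julia function standing in for the body of one work-item (no KernelAbstractions dependency). It assumes every stencil has k points, no polynomial terms (nmon = 0), and a positive-definite radial kernel, so an in-place Cholesky factorization applies; weights_chunk!, x, xe, stencils, and phi are hypothetical names.

```julia
using LinearAlgebra

# Hypothetical stand-in for one work-item of weight_kernel: k points per stencil,
# nmon = 0, and a positive-definite radial kernel phi.
function weights_chunk!(W::Matrix{Float64}, x::Vector{Float64}, xe::Vector{Float64},
                        stencils::Vector{Vector{Int}}, phi, start_idx::Int, end_idx::Int)
    k = size(W, 1)
    A = zeros(k, k)   # hoisted: one matrix per work-item, reused for every eval point
    b = zeros(k)      # hoisted RHS buffer
    for eval_idx in start_idx:end_idx
        nb = stencils[eval_idx]
        for j in 1:k, i in 1:j                 # overwrite the upper triangle that is read below
            A[i, j] = phi(abs(x[nb[i]] - x[nb[j]]))
        end
        for i in 1:k
            b[i] = phi(abs(xe[eval_idx] - x[nb[i]]))
        end
        F = cholesky!(Symmetric(A, :U))        # factor in place; overwrites A's upper triangle
        ldiv!(F, b)                            # solve in place; b now holds the weights
        W[:, eval_idx] .= b
    end
    return W
end
```

With polynomial augmentation (nmon > 0) the system is no longer positive definite, so a symmetric-indefinite factorization such as bunchkaufman! would be needed in place of cholesky!.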

3. src/solve/forward_cache.jl - Dense Matrix Copies (Lines 81-87)

# Explicitly filling lower triangle
for j in 1:n
    for i in (j + 1):n
        A_full_symmetric[i, j] = A_full[j, i]  # Redundant O(n²) copy
    end
end
stencil_caches[eval_idx] = StencilForwardCache(copy(λ), A_full_symmetric, k, nmon)

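Fix: Drop the manual mirroring loop and store Symmetric(A_full, :U), which reads the lower triangle from the upper one without copying. If A_full becomes a reused buffer (fix 1), the cache still needs its own storage, so store Symmetric(copy(A_full), :U) instead: this removes the O(n²) mirroring loop but not the per-stencil allocation. Also consider whether the full matrix needs to be cached at all.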
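A sketch of caching the symmetric wrapper, assuming A_full is a buffer that gets reused for the next stencil. StencilCacheSketch and make_cache are hypothetical stand-ins for StencilForwardCache and the code that constructs it.

```julia
using LinearAlgebra

# Hypothetical stand-in for StencilForwardCache; only the fields needed here.
struct StencilCacheSketch
    λ::Vector{Float64}
    A::Symmetric{Float64, Matrix{Float64}}  # upper triangle stored; lower read by symmetry
end

# A_buf is assumed to be reused for the next stencil, so the cache takes one copy.
# No loop mirrors the lower triangle.
make_cache(λ::Vector{Float64}, A_buf::Matrix{Float64}) =
    StencilCacheSketch(copy(λ), Symmetric(copy(A_buf), :U))

A_buf = [4.0 1.0 2.0; 0.0 3.0 0.5; 0.0 0.0 5.0]  # only the upper triangle is meaningful
cache = make_cache([1.0, 2.0, 3.0], A_buf)
cache.A[3, 1]        # 2.0, read from the upper triangle
A_buf[1, 3] = 99.0   # reusing the buffer for the next stencil leaves the cache intact
```

Dropping the copy would make every cache entry point at the same buffer, which is why buffer reuse and zero-copy caching cannot both apply to the same matrix.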

4. src/interpolation.jl - Scalar Loop Instead of BLAS (Lines 36-50)

for i in eachindex(rbfi.rbf_weights)
    rbf += rbfi.rbf_weights[i] * rbfi.rbf_basis(x, rbfi.x[i])  # Scalar accumulation
end

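Fix: Pre-compute the basis evaluations into a vector, then call dot(rbfi.rbf_weights, basis_vals) so the reduction runs through BLAS. The basis_vals buffer should itself be reused across calls; otherwise this trades an allocation-free scalar loop for a new allocation per evaluation. The larger win comes when evaluating many points at once, where a basis matrix times the weight vector is a single BLAS matrix-vector product.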
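A sketch of the three variants on 1-D data with a Gaussian basis. gauss_basis, interp_loop, interp_dot!, and interp_many are hypothetical names, and any polynomial terms the real interpolant carries are omitted.

```julia
using LinearAlgebra

gauss_basis(x, c) = exp(-(x - c)^2)   # hypothetical basis function

# Current style: scalar accumulation, allocation-free but no BLAS.
function interp_loop(w, centers, x)
    s = 0.0
    for i in eachindex(w)
        s += w[i] * gauss_basis(x, centers[i])
    end
    return s
end

# Proposed style: fill a reused buffer, then reduce with dot (BLAS ddot for Vector{Float64}).
function interp_dot!(basis_vals, w, centers, x)
    basis_vals .= gauss_basis.(x, centers)   # fused broadcast writes in place
    return dot(w, basis_vals)
end

# Many evaluation points: basis matrix (points × centers) times weights is one BLAS gemv.
interp_many(w, centers, xs) = gauss_basis.(xs, permutedims(centers)) * w
```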

Expected Impact

  • Reduced GC pressure during weight computation
  • Better cache locality from buffer reuse
  • Potential 2-5x speedup for large stencil sizes (k > 30)
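One rough way to check the GC-pressure claim is to compare allocated bytes with Base's @allocated. fresh_alloc and reused_alloc are hypothetical micro-benchmarks, not package code; timing the real hot paths would call for a tool such as BenchmarkTools.jl.

```julia
# Hypothetical micro-benchmarks contrasting per-iteration allocation with buffer reuse.
function fresh_alloc(n::Int, iters::Int)
    acc = 0.0
    for it in 1:iters
        A = zeros(n, n)        # new matrix every iteration, as in the current hot paths
        A[1, 1] = it
        acc += sum(A)
    end
    return acc
end

function reused_alloc(n::Int, iters::Int)
    acc = 0.0
    A = zeros(n, n)            # one matrix for the whole loop
    for it in 1:iters
        fill!(A, 0.0)          # reset instead of reallocating
        A[1, 1] = it
        acc += sum(A)
    end
    return acc
end

fresh_alloc(32, 10); reused_alloc(32, 10)   # warm up so compilation is not counted
bytes_fresh = @allocated fresh_alloc(32, 1000)
bytes_reused = @allocated reused_alloc(32, 1000)
```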

Related

This follows the BLAS optimizations added in commit 4c64025 for the backward pass.
