Conversation

@zatkins2 (Contributor) commented:

Currently, if an alm is single-precision, then alm2cl does all summation and accumulation at single precision. For reasonable data and lmax this should be OK, but it can begin to fail due to floating-point errors in edge cases (see the slack thread here for more gory detail). I can reproduce the issue for a particular edge case:

import numpy as np
import matplotlib.pyplot as plt
from pixell import curvedsky

lmax = 300_000
nalm = (lmax + 1) * (lmax + 2) // 2   # number of alm coefficients in triangular order
a = np.zeros(nalm, dtype=np.complex64)
a.real[:] = 1.2345
c = curvedsky.alm2cl(a)               # run with only 1 thread
plt.plot(c / 1.2345**2)               # should be exactly 1 for every ell
plt.show()

which gives:

[Figure: alm2cl output divided by the expected value 1.2345**2; instead of being flat at 1, the ratio drifts away from 1 by up to O(0.5%) at high ell.]
In other words, for large lmax and with only 1 thread, the sums over many terms can begin to accumulate O(0.5%)-level errors. In reality, lmax is much lower, and when we use N threads the number of terms in each sum is further reduced by a factor of N, but this is the kind of thing I would be paranoid about getting wrong if there are weird features in the data that are hard to mock up in a test.

A brute-force workaround is to cast the alm to double precision before calling alm2cl, but this has large computational disadvantages. Since the data is actually single-precision, it's unnecessary, and doing so (see the sketch after this list):

  1. for every map in a job at map2alm-time will, in general, double the memory required to hold those maps
  2. for every calculated spectrum at alm2cl-time avoids the memory cost, but adds substantial time for the casting itself (for a simulation ensemble this can grow very large, up to O(100) node-hours spent on casting alone)
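
For concreteness, the brute-force cast looks like this (a sketch continuing the reproduction above, so a and the imports are already defined):

# Brute-force workaround: copy the alm to double precision before alm2cl.
# The astype call copies the whole array, which is exactly the memory/time
# cost described above.
a64 = a.astype(np.complex128)
c = curvedsky.alm2cl(a64)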

Fortunately, there is a simple, near-zero-cost workaround that eliminates this possibility: accumulate the terms at double precision (I could also have tried Kahan summation, but this is simpler). In that case the alm remains single-precision while the math occurs at double precision, achieved just by changing the type of the accumulation buffer here. This relies on the type promotion to double happening in C, which is far faster than copying whole arrays in numpy.
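
The effect of the accumulator type is easy to see in plain numpy (a toy sketch, not the pixell code; np.cumsum forms its partial sums sequentially, so it mimics a naive accumulation loop):

import numpy as np

n = 10_000_000
x = np.full(n, 1.2345, dtype=np.float32)
exact = float(x[0]) * n  # true sum of the float32-rounded value

# Partial sums with a single-precision accumulator...
s32 = np.cumsum(x, dtype=np.float32)[-1]
# ...versus the same float32 terms accumulated at double precision.
s64 = np.cumsum(x, dtype=np.float64)[-1]

print(abs(s32 - exact) / exact)  # drifts well above float32 epsilon
print(abs(s64 - exact) / exact)  # orders of magnitude smaller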

In this implementation, one passes double_accum=True to alm2cl. The above floating-point error is eliminated:
[Figure: the same test with double_accum=True; the ratio c / 1.2345**2 is now flat at 1, with the floating-point error eliminated.]
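
For reference, the reproduction above then just becomes (continuing from the earlier snippet, so a and the imports are already defined):

# Same call as before, but with the new flag: the alm stays complex64,
# while the sums inside alm2cl accumulate at double precision.
c = curvedsky.alm2cl(a, double_accum=True)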

The runtime penalty is between 5 and 10% per alm2cl call, which is negligible in most scripts, where other computation is far more lengthy. For a typical single-precision alm, the change in the result from the current code to this one is, as expected, at the level of single precision:
[Figure: relative difference between the current result and the double_accum=True result for a typical single-precision alm, at the single-precision rounding level.]
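
A quick way to check this on one's own alm (a sketch continuing the reproduction above; assumes a spectrum with no zero entries so the relative difference is well-defined):

c_old = curvedsky.alm2cl(a)                     # single-precision accumulation
c_new = curvedsky.alm2cl(a, double_accum=True)  # double-precision accumulation

# Maximum relative difference; expected to sit at the single-precision
# rounding level for typical alms.
print(np.max(np.abs(c_old - c_new) / np.abs(c_new)))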

zatkins2 requested review from amaurea and msyriac on December 15, 2025.

codecov bot commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.60%. Comparing base (73d3f2b) to head (895bc5d).

Files with missing lines   Patch %   Lines
pixell/curvedsky.py        50.00%    2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #324   +/-   ##
=======================================
  Coverage   37.60%   37.60%           
=======================================
  Files          37       37           
  Lines       11540    11540           
=======================================
  Hits         4340     4340           
  Misses       7200     7200           

☔ View full report in Codecov by Sentry.


