Skip to content

Retire AtomData range/labels caches for eager computation#70

Merged
bjmorgan merged 12 commits intomainfrom
retire-atom-data-caches
Apr 11, 2026
Merged

Retire AtomData range/labels caches for eager computation#70
bjmorgan merged 12 commits intomainfrom
retire-atom-data-caches

Conversation

@bjmorgan
Copy link
Copy Markdown
Owner

@bjmorgan bjmorgan commented Apr 11, 2026

Summary

Three user-visible changes in 0.18.0, all in the AtomData / colour area:

  • AtomData now computes global_range / global_labels eagerly in __setitem__ rather than lazily with cache invalidation. Safe because Freeze AtomData entries on insertion #64 already freezes stored arrays on assignment, so a value computed at set time stays valid for the lifetime of the stored array. The global_range(key) and global_labels(key) methods are replaced by read-only ad.ranges and ad.labels mapping attributes, backed by MappingProxyType and typed as collections.abc.Mapping. Callers migrate with a direct substitution: ad.global_range(key) becomes ad.ranges[key], and ad.global_labels(key) becomes ad.labels[key]. Closes Retire AtomData range/labels caches in favour of eager computation #68.

  • AtomData.__setitem__ now rejects unsupported dtypes at assignment time with a clear error. Supported dtypes are bool, integer, float, string, and object; complex, datetime, bytes, and other dtypes now raise ValueError at assignment rather than failing later in the rendering pipeline.

  • resolve_atom_colours is no longer part of the public API. Colour resolution goes through the StructureScene rendering methods.

Breaking API changes, version bumped to 0.18.0.

Copilot AI review requested due to automatic review settings April 11, 2026 06:40

This comment was marked as resolved.

- Expose ranges/labels via read-only @Property backed by a stored
  MappingProxyType, preventing attribute rebinding.
- Return labels as tuple[str, ...] so the contained sequence is
  immutable.
- Guard _compute_global_range against empty 2-D arrays to avoid a
  latent ValueError when n_atoms or n_frames is zero.
- Compute derived metadata into locals before mutating state in
  __setitem__ so a raising helper cannot leave the invariant broken.
- Tests: add empty-2D guard, attribute rebinding refusal, proxy
  identity stability, live view from captured reference, tighter
  TypeError match on the read-only checks.
Restrict _compute_global_range to bool, integer, and floating-point
kinds. Complex, datetime, bytes, and other dtypes that np.nanmin or
float() cannot handle are now stored successfully with
ranges[key] = None, matching the behaviour for 1-D and categorical
arrays. Previously these would raise at __setitem__ because the
eager compute tried to reduce them.
Copilot AI review requested due to automatic review settings April 11, 2026 06:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace the permissive handling of non-numeric dtypes with explicit
validation in AtomData.__setitem__. Only bool, integer, float,
string, and object dtypes are supported; complex, datetime, bytes,
and other dtypes now raise ValueError at assignment with a clear
message, rather than quietly storing an array that would later fail
in the rendering pipeline. _compute_global_range is restored to its
simpler form that trusts its caller on dtype.
- _compute_global_labels docstring now pins the first-encountered
  insertion order as load-bearing for colourmap assignment in
  _resolve_categorical, warning future maintainers against sorting.
- test_ranges_empty_2d_returns_none now uses _make_atom_data instead
  of a local frames list with a type: ignore.
- Drop match="mappingproxy" from the read-only tests; the TypeError
  on write is the meaningful assertion and match pinned a CPython
  internal type name.
Copilot AI review requested due to automatic review settings April 11, 2026 07:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

resolve_atom_colours is re-exported as a public API and accepts a
raw dict[str, np.ndarray], so the AtomData.__setitem__ dtype check
does not cover direct callers who bypass the AtomData container.
Without validation at the colouring boundary, a complex array
silently drops its imaginary part via values.astype(float) in
_resolve_numerical; datetime arrays cast to an epoch-day gradient;
bytes arrays fail with an opaque numpy error from deep in the cast.

Add the same dtype whitelist at the top of _resolve_single_layer,
right after looking up the array. Matches the AtomData boundary
rule and names the key in the error.

Also:
- Document the ranges/labels partition invariant in the class
  docstring: 2-D arrays populate exactly one attribute, 1-D arrays
  populate neither.
- Parametrise the __setitem__ dtype-rejection tests over the full
  {complex, datetime, timedelta, bytes, void} rejection set, using
  uniform 1-D shapes.
- Add a test pinning the new validation at resolve_atom_colours.
The public resolve_atom_colours API was exposed early in the
project's life, before StructureScene matured into a proper public
surface for colour resolution. It never developed a documented
workflow: no internal non-rendering caller, no complementary
scene.atom_colours() method, no worked example in the docs. It
reads as an accidental export that survived because nobody revisited
it. Rename to _resolve_atom_colours and remove from the public API;
colour resolution goes through the scene rendering methods.

Knock-on simplifications:

- Drop the duplicate dtype whitelist at the top of
  _resolve_single_layer. It was added to protect a public-API
  boundary that is no longer there; AtomData.__setitem__ still
  validates the realistic path. The rule lives in one place.

- Drop test_unsupported_dtype_raises from test_colour.py (paired
  with the above).

- Drop test_ranges_identity_stable_across_accesses from
  test_atom_data.py. It pinned the stored-proxy implementation
  choice rather than a user-facing contract, and a future switch to
  per-access proxies (equivalent user behaviour) would fail the
  test without meaning anything.

Changelog entry added under 0.18.0.
Copilot AI review requested due to automatic review settings April 11, 2026 08:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Add a precondition check at _resolve_numerical's unsafe
  values.astype(float) site. This is not a duplicate of
  AtomData.__setitem__'s rule: AtomData enforces what can be stored,
  _resolve_numerical enforces its own local requirement at the
  point of the cast. A future refactor that breaks the four-file
  invariant chain (AtomData validates -> precompute indexes ->
  _resolve_atom_colours dispatches -> _resolve_single_layer routes)
  would otherwise silently re-open the complex -> drop-imaginary-
  part bug. Four lines makes the latent defect non-latent.

- Test: _resolve_atom_colours raises ValueError for a complex
  array, exercising the new precondition through the dispatch
  chain.

- Fix stale :func:\`~hofmann.resolve_atom_colours\` Sphinx
  cross-reference in the historical 0.2.x changelog entry; the
  target no longer exists after the privatisation. Downgraded to
  a plain literal so the historical record is preserved without a
  broken reference warning.

- Correct the AtomData class docstring partition-invariant
  paragraph. The previous wording overclaimed "exactly one of
  ranges[key] and labels[key] holds a value" for 2-D arrays, which
  is false for three legitimate cases already documented in the
  Attributes: block (empty numeric, all-NaN numeric, all-missing
  categorical). Rewritten to match the per-attribute descriptions.
@bjmorgan bjmorgan merged commit bb79452 into main Apr 11, 2026
5 checks passed
@bjmorgan bjmorgan deleted the retire-atom-data-caches branch April 11, 2026 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retire AtomData range/labels caches in favour of eager computation

2 participants