Retire AtomData range/labels caches for eager computation#70
Merged
Conversation
- Expose ranges/labels via read-only @Property backed by a stored MappingProxyType, preventing attribute rebinding. - Return labels as tuple[str, ...] so the contained sequence is immutable. - Guard _compute_global_range against empty 2-D arrays to avoid a latent ValueError when n_atoms or n_frames is zero. - Compute derived metadata into locals before mutating state in __setitem__ so a raising helper cannot leave the invariant broken. - Tests: add empty-2D guard, attribute rebinding refusal, proxy identity stability, live view from captured reference, tighter TypeError match on the read-only checks.
Restrict _compute_global_range to bool, integer, and floating-point kinds. Complex, datetime, bytes, and other dtypes that np.nanmin or float() cannot handle are now stored successfully with ranges[key] = None, matching the behaviour for 1-D and categorical arrays. Previously these would raise at __setitem__ because the eager compute tried to reduce them.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace the permissive handling of non-numeric dtypes with explicit validation in AtomData.__setitem__. Only bool, integer, float, string, and object dtypes are supported; complex, datetime, bytes, and other dtypes now raise ValueError at assignment with a clear message, rather than quietly storing an array that would later fail in the rendering pipeline. _compute_global_range is restored to its simpler form that trusts its caller on dtype.
- _compute_global_labels docstring now pins the first-encountered insertion order as load-bearing for colourmap assignment in _resolve_categorical, warning future maintainers against sorting. - test_ranges_empty_2d_returns_none now uses _make_atom_data instead of a local frames list with a type: ignore. - Drop match="mappingproxy" from the read-only tests; the TypeError on write is the meaningful assertion and match pinned a CPython internal type name.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
resolve_atom_colours is re-exported as a public API and accepts a
raw dict[str, np.ndarray], so the AtomData.__setitem__ dtype check
does not cover direct callers who bypass the AtomData container.
Without validation at the colouring boundary, a complex array
silently drops its imaginary part via values.astype(float) in
_resolve_numerical; datetime arrays cast to an epoch-day gradient;
bytes arrays fail with an opaque numpy error from deep in the cast.
Add the same dtype whitelist at the top of _resolve_single_layer,
right after looking up the array. Matches the AtomData boundary
rule and names the key in the error.
Also:
- Document the ranges/labels partition invariant in the class
docstring: 2-D arrays populate exactly one attribute, 1-D arrays
populate neither.
- Parametrise the __setitem__ dtype-rejection tests over the full
{complex, datetime, timedelta, bytes, void} rejection set, using
uniform 1-D shapes.
- Add a test pinning the new validation at resolve_atom_colours.
The public resolve_atom_colours API was exposed early in the project's life, before StructureScene matured into a proper public surface for colour resolution. It never developed a documented workflow: no internal non-rendering caller, no complementary scene.atom_colours() method, no worked example in the docs. It reads as an accidental export that survived because nobody revisited it. Rename to _resolve_atom_colours and remove from the public API; colour resolution goes through the scene rendering methods. Knock-on simplifications: - Drop the duplicate dtype whitelist at the top of _resolve_single_layer. It was added to protect a public-API boundary that is no longer there; AtomData.__setitem__ still validates the realistic path. The rule lives in one place. - Drop test_unsupported_dtype_raises from test_colour.py (paired with the above). - Drop test_ranges_identity_stable_across_accesses from test_atom_data.py. It pinned the stored-proxy implementation choice rather than a user-facing contract, and a future switch to per-access proxies (equivalent user behaviour) would fail the test without meaning anything. Changelog entry added under 0.18.0.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add a precondition check at _resolve_numerical's unsafe values.astype(float) site. This is not a duplicate of AtomData.__setitem__'s rule: AtomData enforces what can be stored, _resolve_numerical enforces its own local requirement at the point of the cast. A future refactor that breaks the four-file invariant chain (AtomData validates -> precompute indexes -> _resolve_atom_colours dispatches -> _resolve_single_layer routes) would otherwise silently re-open the complex -> drop-imaginary- part bug. Four lines makes the latent defect non-latent. - Test: _resolve_atom_colours raises ValueError for a complex array, exercising the new precondition through the dispatch chain. - Fix stale :func:\`~hofmann.resolve_atom_colours\` Sphinx cross-reference in the historical 0.2.x changelog entry; the target no longer exists after the privatisation. Downgraded to a plain literal so the historical record is preserved without a broken reference warning. - Correct the AtomData class docstring partition-invariant paragraph. The previous wording overclaimed "exactly one of ranges[key] and labels[key] holds a value" for 2-D arrays, which is false for three legitimate cases already documented in the Attributes: block (empty numeric, all-NaN numeric, all-missing categorical). Rewritten to match the per-attribute descriptions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three user-visible changes in 0.18.0, all in the
AtomData/ colour area:AtomDatanow computesglobal_range/global_labelseagerly in__setitem__rather than lazily with cache invalidation. Safe because Freeze AtomData entries on insertion #64 already freezes stored arrays on assignment, so a value computed at set time stays valid for the lifetime of the stored array. Theglobal_range(key)andglobal_labels(key)methods are replaced by read-onlyad.rangesandad.labelsmapping attributes, backed byMappingProxyTypeand typed ascollections.abc.Mapping. Callers migrate with a direct substitution:ad.global_range(key)becomesad.ranges[key], andad.global_labels(key)becomesad.labels[key]. Closes Retire AtomData range/labels caches in favour of eager computation #68.AtomData.__setitem__now rejects unsupported dtypes at assignment time with a clear error. Supported dtypes are bool, integer, float, string, and object; complex, datetime, bytes, and other dtypes now raiseValueErrorat assignment rather than failing later in the rendering pipeline.resolve_atom_coloursis no longer part of the public API. Colour resolution goes through theStructureScenerendering methods.Breaking API changes, version bumped to 0.18.0.