Summary
StructureScene had a set_atom_data(key, values) method until commit 363d493, including a sparse form that accepted dict[int, value] and filled missing atoms with NaN (numeric) or "" (string). The sparse form was dropped when AtomData was introduced; users now have to build full-length arrays by hand, including NaN-filling atoms they do not care about.
Reintroduce set_atom_data() as a method on StructureScene, with sparse forms keyed by species label and/or atom index.
Proposed API
scene.set_atom_data(
    key,
    values=None,
    *,
    by_species=None,  # dict[str, scalar | 1-D | 2-D array]
    by_index=None,    # dict[int, scalar | 1-D array]
)
- values (positional): a full-length array-like. No longer accepts dict[int, value] -- use by_index= instead.
- by_species: maps species labels to values. Scalars broadcast across all atoms of that species; 1-D arrays give explicit per-atom values (length = count of atoms of that species).
- by_index: maps specific atom indices to values. A scalar sets a single static value for that atom; a 1-D array of shape (n_frames,) gives a per-frame value.
- Exactly one of: values, or any non-empty combination of by_species / by_index. Error if values is mixed with a sparse kwarg. Error if all three are omitted.
- Returns None.
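The "exactly one of" rule can be sketched as a small argument check. This is an illustration only: resolve_mode is a hypothetical helper name, treating only None as "omitted" is an assumption, and so is the choice of ValueError (the proposal just says "Error").

```python
def resolve_mode(values=None, *, by_species=None, by_index=None):
    """Return "full" or "sparse"; raise for the invalid combinations.

    Hypothetical helper, not part of the existing code base.
    """
    sparse_given = by_species is not None or by_index is not None
    if values is not None and sparse_given:
        # Full-length values cannot be mixed with a sparse kwarg.
        raise ValueError("pass either values or by_species/by_index, not both")
    if values is None and not sparse_given:
        # All three omitted.
        raise ValueError("one of values, by_species, or by_index is required")
    return "full" if values is not None else "sparse"
```

Passing by_species and by_index together resolves to "sparse", which matches the rule that any combination of the two sparse kwargs is allowed.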
Unspecified atoms
- Numeric data: fill with NaN.
- Categorical (string) data: fill with None, stored as an object-dtype array. A unicode (<U...) array cannot hold None (it coerces to the literal string "None"), so the implementation must explicitly build object-dtype arrays for categorical input. _is_categorical_missing already treats None as missing.
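The coercion problem is easy to reproduce with plain NumPy. The snippet below is a small demonstration, not project code; the label strings and the array length are made up for illustration.

```python
import numpy as np

# A unicode array silently turns None into the string "None".
labels = np.array(["octahedral", "tetrahedral"])  # fixed-width unicode dtype
labels[0] = None
coerced = labels[0]  # now the text "None", not a missing marker

# Building the array as object dtype keeps None as a real missing value.
n_atoms = 4  # illustrative atom count
sites = np.full(n_atoms, None, dtype=object)
sites[1] = "octahedral"
```

After this runs, coerced compares equal to the string "None", while sites[0] is still the None object, which is what a missing-value check needs to see.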
1-D vs 2-D inference
- Output is 1-D (n_atoms,) unless something promotes it to 2-D (n_frames, n_atoms).
- A by_species value with shape (n_frames, n_species_atoms) promotes, where n_species_atoms is the number of atoms of that species.
- A by_index value with shape (n_frames,) promotes.
- When promoted, scalar and 1-D by_species values broadcast across the frame axis.
Ambiguous case: if n_frames == n_species_atoms, a 1-D by_species value of that length could mean "static per-atom" or "per-frame trajectory shared across the species". Rule: 1-D by_species is always interpreted as per-atom static. Users wanting a shared per-frame trajectory across a species must pass an explicit 2-D array (for example via np.broadcast_to).
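A minimal sketch of the np.broadcast_to route for the ambiguous case, assuming three frames and three Mn atoms (the numbers are illustrative):

```python
import numpy as np

n_frames, n_mn = 3, 3                  # the ambiguous case: equal lengths
per_frame = np.array([2.0, 2.1, 2.2])  # one value per frame, shared by all Mn

# View the trajectory as a column (n_frames, 1) and broadcast it to
# (n_frames, n_mn): within each frame every Mn atom gets the same value.
shared = np.broadcast_to(per_frame[:, None], (n_frames, n_mn))
```

Under the proposed rule, passing per_frame itself under by_species would be read as three static per-atom values, whereas shared is unambiguously 2-D. np.broadcast_to returns a read-only view, so no per-atom copies are made.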
Precedence
If a species appears in by_species and an atom of that species also appears in by_index, the by_index value wins. That is the point of allowing both: "all Mn atoms get charge 2.0, except atom 3 which is a defect site at 1.9".
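One way to get this precedence is to apply the two mappings in order, species first and indices second. The sketch below covers only the static numeric 1-D case; fill_static and the species list are hypothetical and not taken from the code base.

```python
import numpy as np

def fill_static(species, by_species=None, by_index=None):
    """Build a 1-D numeric array: NaN fill, then species, then indices."""
    species = np.asarray(species)
    out = np.full(len(species), np.nan)      # unspecified atoms stay NaN
    for label, value in (by_species or {}).items():
        out[species == label] = value        # scalar or per-atom 1-D values
    for idx, value in (by_index or {}).items():
        out[idx] = value                     # applied last, so by_index wins
    return out

charges = fill_static(
    ["Mn", "O", "Mn", "Mn", "O"],
    by_species={"Mn": 2.0},
    by_index={3: 1.9},
)
```

Here atoms 0 and 2 end up at 2.0, atom 3 (the defect site) at 1.9, and the two O atoms remain NaN.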
Validation
- Unknown species labels in by_species raise ValueError.
- Out-of-range or negative indices in by_index raise ValueError.
- Shape mismatches raise ValueError with the expected shape in the message.
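These checks could look roughly like the following. validate_sparse is a hypothetical helper, the message wording is an assumption, and only the 1-D by_species shape check is shown.

```python
import numpy as np

def validate_sparse(species, by_species=None, by_index=None):
    """Raise ValueError for unknown labels, bad indices, or wrong 1-D lengths."""
    species = list(species)
    n_atoms = len(species)
    known = set(species)
    for label, value in (by_species or {}).items():
        if label not in known:
            raise ValueError(
                f"unknown species {label!r}; known species: {sorted(known)}"
            )
        count = species.count(label)
        shape = np.shape(value)              # () for scalars
        if len(shape) == 1 and shape != (count,):
            raise ValueError(
                f"by_species[{label!r}] has shape {shape}, expected ({count},)"
            )
    for idx in (by_index or {}):
        if not 0 <= idx < n_atoms:           # rejects negative indices too
            raise ValueError(
                f"atom index {idx} out of range for {n_atoms} atoms"
            )
```

Rejecting negative indices explicitly matters because NumPy would otherwise accept -1 as "last atom" without complaint.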
Constructor symmetry -- descoped
Original proposal: widen the constructor's atom_data parameter to accept the same sparse forms.
Descoped during design review: the two-step flow (StructureScene(...) then scene.set_atom_data(..., by_species=...)) is cleaner than any of the constructor-widening options considered (magic-key dicts, a new AtomDataSpec type, or dict-key-type sniffing). The constructor stays at dict[str, ArrayLike].
Changelog
This reintroduces a public API that existed until commit 363d493, so it wants a user-facing changelog entry.
Dependencies
Best landed after #64. The "build a full array and assign" pattern is cleaner when the resulting array is guaranteed immutable.