Summary
A common workflow when loading per-atom data from ASE (or any other full-atom source) is "I have a full-length array but I only care about the values for one or two species; fill the rest with the missing-value sentinel so the renderer falls back to species colouring". The raw expression is np.where plus a species comparison, which has a few sharp edges: the np.nan sentinel is wrong for string data, unknown species labels fail silently (the comparison matches no atoms and no error is raised), and unicode arrays coerce None to the literal string "None".
Add a small helper method on StructureScene that handles these correctly and gives a one-line call site.
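The sharp edges listed above can be reproduced with plain NumPy. This is a minimal sketch on made-up data; the array names and values are illustrative assumptions and are not taken from the library.

```python
import numpy as np

# Illustrative data only: one species label and one value per atom.
species = np.array(["O", "H", "H", "N"])
charges = np.array([-0.8, 0.4, 0.4, -0.3])

# 1. A misspelled label matches no atoms, and nothing is raised.
typo = np.where(species == "Ox", charges, np.nan)

# 2. NaN needs a float array: np.where promotes integer input to float.
counts = np.array([2, 1, 1, 3])
promoted = np.where(species == "O", counts, np.nan)

# 3. Writing None into a unicode array stores the text "None".
sites = np.array(["bridging", "terminal", "terminal", "apical"])
blanked = sites.copy()
blanked[species != "O"] = None

# Casting to object first keeps a real None.
safe = sites.astype(object)
safe[species != "O"] = None
```

In case 1 every entry comes back as NaN without any warning, which is why validating labels up front is useful.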
Proposed API
scene.select_by_species(arr, 'O') # keep only O atoms
scene.select_by_species(arr, ['O', 'N']) # keep multiple species
scene.select_by_species(arr, species='O') # kwarg form also allowed
- Second parameter (positional or species=) accepts a str or an iterable of str.
- Returns a copy of arr with entries for non-selected atoms replaced by the appropriate sentinel:
  - Numeric: np.nan (with dtype promotion to float if the input was integer).
  - Categorical: None, stored as an object-dtype array. Unicode input is cast to object first.
- Validates species labels against scene.species; unknown labels raise ValueError.
- Works for 1-D (n_atoms,) and 2-D (n_frames, n_atoms) inputs. The 1-D species mask broadcasts across the frame axis automatically, so the 2-D case needs no special handling.
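The behaviour described in the list above could be sketched roughly as follows. This is a standalone function, not the actual StructureScene method: the all_species parameter is a hypothetical stand-in for scene.species, and treating every non-integer, non-float dtype as categorical is an assumption of this sketch.

```python
import numpy as np

def select_by_species(arr, species, all_species):
    """Return a copy of arr with non-selected atoms set to a sentinel.

    all_species is a hypothetical stand-in for scene.species:
    one label per atom, in atom order.
    """
    all_species = np.asarray(all_species)
    # Accept a single label or an iterable of labels.
    wanted = [species] if isinstance(species, str) else list(species)

    # Reject labels that do not occur in the structure.
    unknown = sorted(set(wanted) - set(all_species.tolist()))
    if unknown:
        raise ValueError(f"unknown species: {unknown}")

    keep = np.isin(all_species, wanted)  # shape (n_atoms,)

    arr = np.asarray(arr)
    if arr.shape[-1] != keep.shape[0]:
        raise ValueError("last axis of arr must have length n_atoms")

    if arr.dtype.kind in "iuf":
        # Numeric: promote to float so NaN can be stored.
        return np.where(keep, arr.astype(float), np.nan)
    # Categorical (assumed: everything else): cast to object so None stays None.
    return np.where(keep, arr.astype(object), None)
```

Because keep has shape (n_atoms,), np.where broadcasts it against the last axis of a (n_frames, n_atoms) input, so the same code path covers the 1-D and 2-D cases.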
Naming
select_by_species was chosen to match the by_species= kwarg on set_atom_data() (#65), giving a consistent vocabulary across the API. mask_species was considered and rejected: "mask" conventionally implies "hide" in numpy and pandas (np.ma, pandas.DataFrame.mask), so it would read ambiguously (drop vs keep) at call sites.
Call-site example
scene.set_atom_data(
    'charge',
    scene.select_by_species(full_array, 'O'),
)
Documentation
The raw pattern
np.where(np.array(scene.species) == 'O', values, np.nan)
is still worth documenting in the narrative docs as a "how do I do X" recipe, so users understand the underlying mechanism. The helper is the recommended one-liner; the bare np.where form is the explanation of what it does underneath and remains the right answer when someone is working with per-atom arrays outside the atom_data pipeline.