
Add StructureScene.select_by_species helper #66

@bjmorgan

Summary

A common workflow when loading per-atom data from ASE (or any other full-atom source) is: "I have a full-length array, but I only care about the values for one or two species; fill the rest with the missing-value sentinel so the renderer falls back to species colouring". The raw expression is an np.where plus a species comparison, which has a few sharp edges: the np.nan sentinel is wrong for string data; an unknown species label fails silently (the comparison matches no atoms, so nothing is selected and no error is raised); and unicode arrays coerce None to the literal string "None".

Add a small helper method on StructureScene that handles these correctly and gives a one-line call site.
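
Two of the sharp edges described above can be reproduced with plain NumPy. This is a minimal sketch: the species and sites arrays are made-up stand-ins for scene.species and a per-atom categorical array, not data from the library.

```python
import numpy as np

# Hypothetical stand-ins for scene.species and two per-atom arrays.
species = np.array(["Li", "O", "O", "N"])
charges = np.array([0.9, -1.1, -1.0, -0.4])
sites = np.array(["tetra", "octa", "octa", "tetra"])

# Sharp edge 1: a mistyped label matches no atoms, so every entry is
# replaced by the sentinel and no error is raised.
all_masked = np.where(species == "Ox", charges, np.nan)

# Sharp edge 2: assigning None into a unicode array stores the string
# "None" rather than a real missing value.
coerced = sites.copy()
coerced[species != "O"] = None

# Casting to object dtype first keeps None as a genuine Python None.
as_object = sites.astype(object)
as_object[species != "O"] = None
```

Here all_masked is entirely NaN, coerced[0] is the string "None", and as_object[0] is the None object.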

Proposed API

scene.select_by_species(arr, 'O')                  # keep only O atoms
scene.select_by_species(arr, ['O', 'N'])           # keep multiple species
scene.select_by_species(arr, species='O')          # kwarg form also allowed
  • Second parameter (positional or species=) accepts a str or an iterable of str.
  • Returns a copy of arr with entries for non-selected atoms replaced by the appropriate sentinel:
    • Numeric: np.nan (with dtype promotion to float if the input was integer).
    • Categorical: None, stored as an object-dtype array. Unicode input is cast to object first.
  • Validates species labels against scene.species; unknown labels raise ValueError.
  • Works for 1-D (n_atoms,) and 2-D (n_frames, n_atoms) inputs. The 1-D species mask broadcasts across the frame axis automatically, so the 2-D case needs no special handling.
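
The behaviour listed above could be sketched roughly as follows. This is an illustration only, written as a free function: the all_species argument is a hypothetical stand-in for scene.species, and treating every non-integer, non-float dtype as categorical is an assumption rather than something the proposal specifies.

```python
import numpy as np


def select_by_species(arr, species, all_species):
    """Return a copy of arr with non-selected atoms set to a sentinel.

    all_species is a hypothetical stand-in for scene.species.
    """
    wanted = [species] if isinstance(species, str) else list(species)

    # Validate labels up front so typos raise instead of failing silently.
    unknown = sorted(set(wanted) - set(all_species))
    if unknown:
        raise ValueError(f"Unknown species: {unknown}")

    values = np.asarray(arr)
    if values.ndim not in (1, 2) or values.shape[-1] != len(all_species):
        raise ValueError("arr must have shape (n_atoms,) or (n_frames, n_atoms)")

    # 1-D mask over atoms; indexing the last axis covers the 2-D case too.
    keep = np.isin(np.asarray(all_species), wanted)

    if values.dtype.kind in "iu":
        out, sentinel = values.astype(float), np.nan  # promote ints to float
    elif values.dtype.kind == "f":
        out, sentinel = values.copy(), np.nan
    else:
        # Assumption: anything else is categorical and stored as object.
        out, sentinel = values.astype(object), None

    out[..., ~keep] = sentinel
    return out
```

With all_species = ['Li', 'O', 'O', 'N'], calling select_by_species([1, 2, 3, 4], 'O', all_species) gives a float array with NaN in the first and last positions, and a string input comes back as an object array holding real None values.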

Naming

select_by_species was chosen to match the by_species= kwarg on set_atom_data() (#65), giving a consistent vocabulary across the API. mask_species was considered and rejected: "mask" conventionally implies "hide" in numpy and pandas (np.ma, pandas.DataFrame.mask), so it would read ambiguously (drop vs keep) at call sites.

Call-site example

scene.set_atom_data(
    'charge',
    scene.select_by_species(full_array, 'O'),
)

Documentation

The raw pattern

np.where(np.array(scene.species) == 'O', values, np.nan)

is still worth documenting in the narrative docs as a "how do I do X" recipe, so users understand the underlying mechanism. The helper is the recommended one-liner; the bare np.where form is the explanation of what it does underneath and remains the right answer when someone is working with per-atom arrays outside the atom_data pipeline.
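
As a possible shape for that recipe, the sketch below applies the raw pattern to numeric data in both 1-D and 2-D form. The species list and values are invented for illustration, with species standing in for scene.species.

```python
import numpy as np

# Hypothetical stand-in for np.array(scene.species).
species = np.array(["Li", "O", "O", "N"])

# 1-D case: one value per atom.
values_1d = np.array([10, 20, 30, 40])
masked_1d = np.where(species == "O", values_1d, np.nan)

# 2-D case: (n_frames, n_atoms). The (n_atoms,) condition broadcasts
# across the frame axis, so the same expression works unchanged.
values_2d = np.array([[10, 20, 30, 40],
                      [11, 21, 31, 41]])
masked_2d = np.where(species == "O", values_2d, np.nan)
```

Because np.nan is a float, both results are float arrays even though the inputs are integers.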

Metadata

Labels: enhancement (New feature or request)
