Add a CSV/XLSX file reader to core #656

@davidorme

Both the plants and animals models require users to provide cohort data. For plants, this means providing tuples of data:

(cell id, plant functional type, number of individuals, individual size)

There can be multiple entries per cell id and different numbers of cohorts per cell. The easiest and sanest format for this data is a simple data frame of those tuples, and the natural format for creating and maintaining that data is a CSV or XLSX file. Forcing users to convert this into NetCDF for input is not sensible.
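As an illustration of that layout, here is a minimal sketch of a cohort data frame. The column names (`cell_id`, `pft`, `n_individuals`, `individual_size`) and all values are made up for the example and are not a proposed schema.

```python
import io

import pandas as pd

# Hypothetical cohort tuples:
# (cell id, plant functional type, number of individuals, individual size)
cohort_tuples = [
    (0, "broadleaf", 10, 0.25),
    (0, "shrub", 50, 0.05),
    (1, "broadleaf", 3, 0.40),
]

cohorts = pd.DataFrame(
    cohort_tuples,
    columns=["cell_id", "pft", "n_individuals", "individual_size"],
)

# The same table as CSV text, which is what a user would maintain by hand
csv_text = cohorts.to_csv(index=False)

# Reading it back gives one row per cohort, with a varying number of
# cohorts per cell
round_trip = pd.read_csv(io.StringIO(csv_text))
cohorts_per_cell = round_trip.groupby("cell_id").size()
```

Here cell 0 has two cohorts and cell 1 has one, which is awkward to express as a regular NetCDF array but trivial as rows in a table.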

So, we need to:

  • Add a CSV/XLSX loader.
  • This should use pandas as that is already a requirement of xarray and is designed explicitly to handle data frames, rather than using the standard library csv or any of the numpy structures.
  • I think we will need to explicitly add openpyxl to [tool.poetry.dependencies] to support reading XLSX format, since pandas uses it as the engine for .xlsx files.
  • Test that it works!

It should go in virtual_ecosystem.core.readers and I think the signature will look like:

@register_file_format_loader(file_types=(".csv", ".xlsx"))
def load_from_dataframe(file: Path, var_name: str) -> DataArray:
    """Loads a DataArray from a data frame format."""

The format registry should then automatically switch to using this loader for CSV and XLSX files.
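A rough sketch of how the loader body and the registry dispatch might fit together. The `FILE_FORMAT_REGISTRY` dictionary and the decorator below are simplified stand-ins written for this example, not the actual code in `virtual_ecosystem.core.readers`, and the `row` dimension name is an assumption.

```python
from pathlib import Path
from typing import Callable

import pandas as pd
from xarray import DataArray

# Hypothetical stand-in for the real format registry in core.readers
FILE_FORMAT_REGISTRY: dict[str, Callable[[Path, str], DataArray]] = {}


def register_file_format_loader(file_types: tuple[str, ...]) -> Callable:
    """Register a loader function against one or more file suffixes."""

    def decorator(func: Callable[[Path, str], DataArray]) -> Callable:
        for file_type in file_types:
            FILE_FORMAT_REGISTRY[file_type] = func
        return func

    return decorator


@register_file_format_loader(file_types=(".csv", ".xlsx"))
def load_from_dataframe(file: Path, var_name: str) -> DataArray:
    """Load one column of a CSV or XLSX file as a one-dimensional DataArray."""
    if file.suffix == ".csv":
        data = pd.read_csv(file)
    else:
        # pandas reads .xlsx files through the optional openpyxl package
        data = pd.read_excel(file)

    if var_name not in data.columns:
        raise KeyError(f"Variable {var_name} not found in {file}")

    # Assumption: each row is one cohort, indexed along a 'row' dimension
    return DataArray(data[var_name].to_numpy(), dims=["row"], name=var_name)


def load_variable(file: Path, var_name: str) -> DataArray:
    """Dispatch to whichever loader is registered for the file suffix."""
    return FILE_FORMAT_REGISTRY[file.suffix](file, var_name)
```

With this in place, `load_variable(Path("cohorts.csv"), "n_individuals")` would pick `load_from_dataframe` purely from the `.csv` suffix, which is the switching behaviour described above.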

There is some ugliness here in that the file is going to be opened multiple times to load each variable as we don't have persistent file handles, but the same is currently true for NetCDF. A better way to do this in future would be to open each file within the data configuration once to access a tuple of variables that are claimed to live in that file, rather than independently opening the file specified for each variable.
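One possible shape for that future approach, sketched for CSV only: invert the variable-to-file mapping so that each file is read once and all of its claimed variables are extracted from the same data frame. The function name, the `variable_files` argument and the `row` dimension are hypothetical and do not reflect the current data configuration.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd
from xarray import DataArray


def load_variables_by_file(variable_files: dict[str, Path]) -> dict[str, DataArray]:
    """Read each CSV file once and extract every variable claimed to be in it."""
    # Invert the variable -> file mapping into file -> [variables]
    variables_in_file: dict[Path, list[str]] = defaultdict(list)
    for var_name, file in variable_files.items():
        variables_in_file[file].append(var_name)

    loaded: dict[str, DataArray] = {}
    for file, var_names in variables_in_file.items():
        data = pd.read_csv(file)  # a single read per file

        # Fail early if the file does not contain everything claimed for it
        missing = set(var_names) - set(data.columns)
        if missing:
            raise KeyError(f"Variables {sorted(missing)} not found in {file}")

        for var_name in var_names:
            loaded[var_name] = DataArray(
                data[var_name].to_numpy(), dims=["row"], name=var_name
            )

    return loaded
```

The trade-off is that the loader signature changes from one variable per call to a group of variables per file, so the registry would need to change with it.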
