Skip to content

Use xarray dataset as internal data container#262

Merged
ecomodeller merged 75 commits intomainfrom
use-xarray-dataset
Sep 29, 2023
Merged

Use xarray dataset as internal data container#262
ecomodeller merged 75 commits intomainfrom
use-xarray-dataset

Conversation

@jsmariegaard
Copy link
Copy Markdown
Member

@jsmariegaard jsmariegaard commented Sep 20, 2023

Changing the internal data structure of observations from pd.DataFrame to xr.Dataset.

Some considerations:

  • When should na values be removed? (and should we keep track na-values? before this PR it was dependent on the input type i.e. different functionality whether you constructed from df or dfs0)
  • Can all "matching" be extracted from _comparison and moved to matching.py? Such that Comparer can only be initialized with an already-matched dataset
  • TrackModelResult is now almost equal to TrackObservation and PointModelResult is almost equal to PointObservation - should we rather make a TrackTimeseries and a PointTimeseries and have most functionality in there?

Objectives of this PR:

  • Consistent .data throughout modelskill
  • Make it easy to match modelresults and observations (as everything is now xr.Dataset)
  • Make it easy to add aux items to observation and modelresults
  • Make it easy to add attrs container to observation and modelresults

@jsmariegaard jsmariegaard marked this pull request as ready for review September 27, 2023 18:38
@jsmariegaard
Copy link
Copy Markdown
Member Author

Much more to be done, but maybe good to merge soon to ease collaboration and minimize merge conflicts

return data


def _model2obs_interp(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function doesn't reveal that it also remove gaps, also not the docstring. Should gap removal be part of this function.

@ecomodeller
Copy link
Copy Markdown
Member

I tried to modify a few lines like this

ds[name].attrs["quantity"] = quantity.to_dict()

to

ds[name].attrs["long_name"] = quantity.name
ds[name].attrs["units"] = quantity.unit

To be able to persist the dataset as NetCDF and be compatible with CF convention making easier to plot the data with xarray, but somehow managed to change more than necessary, making a lot of tests to fail. ☹️

So I deleted those changes and only committed the tests that saves the data as NetCDF.

@ecomodeller
Copy link
Copy Markdown
Member

Plotting the xarray dataarray uses long_name and units
image

@ecomodeller
Copy link
Copy Markdown
Member

x, y in track data are now coordinates and not data variables.
image

@jsmariegaard
Copy link
Copy Markdown
Member Author

x, y in track data are now coordinates and not data variables. image

I was wondering about this. Thanks

@ecomodeller ecomodeller merged commit 1196c1a into main Sep 29, 2023
@jsmariegaard jsmariegaard deleted the use-xarray-dataset branch November 20, 2023 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants