Conversation
modelskill/comparison/_comparison.py
Outdated
| max_model_gap: Optional[TimeDeltaTypes] = None, | ||
| matched_data: Optional[xr.Dataset] = None, | ||
| raw_mod_data: Optional[Dict[str, pd.DataFrame]] = None, | ||
| raw_mod_data: Optional[Dict[str, pd.Series]] = None, |
There was a problem hiding this comment.
Do we need to update the docstring to match? Or is that automated somehow?
There was a problem hiding this comment.
Not automated, but Copilot is very helpful in creating new ones.
| def save(self, fn: Union[str, Path]) -> None: | ||
| # save to file in netcdf format using xarray | ||
| # save each comparer to a netcdf and pack them into a zip file | ||
| def save(self, filename: Union[str, Path]) -> None: |
There was a problem hiding this comment.
Would it be useful to add an optional argument to avoid accidentally overwriting existing files? Like wrapping the 'mode' parameter from ZipFIle? Default could be kept as 'w' or 'x' depending on which is more convenient
There was a problem hiding this comment.
Sure, I haven't considered that possibility. If users expect a mode parameter, we should probably add it to all methods that writes files, e.g. mikeio.Dataset.to_dfs()
| ItemSelection(obs="foo", model=["foo", "bar"], aux=["baz"]) | ||
|
|
||
|
|
||
| def test_save_comparercollection(o1, o3, tmp_path): |
There was a problem hiding this comment.
Should this actually be in test_comparercollection.py?
There was a problem hiding this comment.
Maybe, I suppose the entire test suite needs to be (re-)organized. I think it would make sense to split tests into unit tests and integration tests and end-to-end tests, or something along those lines.
ryan-kipawa
left a comment
There was a problem hiding this comment.
Looks pretty good, nice going.
jsmariegaard
left a comment
There was a problem hiding this comment.
Great work. I like the content related to the name of the PR.
I am not so sure about the change of behavior for compare() which now always returns a cc. I somehow appreciate the consistency, but to me, it was quite meaningful before: if compare one obs then I get one comparer, if I compare multiple obss then I get a collection. Makes sense. Let's discuss tomorrow :-)
But related to save/load functionality, I guess my only comment is: is "raw_" a special enough name that we will not get in trouble if people accidentally named their station "raw_stn1" or similar? I would suggest "raw" or something else that starts with an underscore to indicate that is an internal structure.
I guess I also agree with @rywm-dhi on the overwrite functionality...
The main reason I changed The |
I'll add an underscore at the beginning, but we could also use NetCDF groups and split the raw data into separate datasets. |
It seems like it could be useful, but after inspecting some other methods writing files in Python, checking if the file exists behavior seem to be very unusual, so I guess it is better to leave this up to the end user, they can add a path.exists(), since the default would be to overwrite anyway. |
jsmariegaard
left a comment
There was a problem hiding this comment.
Let's go with this for now - and discuss later about return type of compare()
No description provided.