Better matching and time interpolation#281
Conversation
values
modelskill/timeseries/_timeseries.py
Outdated
| self.data = data | ||
|
|
||
| def interp_time(self, new_time: pd.DatetimeIndex) -> TimeSeries: | ||
| def interp_time( |
There was a problem hiding this comment.
There seems to be a flaw in our design. 😳
- It makes sense for a
TimeSeriesto be able to be interpolated. - It doesn't make sense to interpolate in track data, e.g.
TrackModelResultandTrackObservation(these points are scattered in a 3d space (time, y, x)) - Thus, track data are not
TimeSeries(at least in this sense) 🤔
There was a problem hiding this comment.
Hmm, maybe we should have dedicated TrackTimeseries and PointTimeseries after all?
…ide_observation_track
…nd y matches! Satellite tracks can be outside model domain and therefore contain more data.
modelskill/matching.py
Outdated
| assert len(observation.time) > 0 | ||
| mri = mr.interp_time(new_time=observation.time) | ||
| else: | ||
| # It doesn't make sense to interpolate track data in time |
There was a problem hiding this comment.
@ecomodeller could you please help with this part? We still need matching for track data, just a different kind than for point data. For track data we should make sure t, x and y match (within a tolerance) - the model and observation series will not necessarily have same length or exact same data even though the observation data was used to extract data from the dfsu model result. The obs data will typically be covering more than the model domain. Also, we have in the past had problems around rounding of position and time data.
There was a problem hiding this comment.
I think we should re-evaluate how much of this track matching logic that belongs in the core of modelskill.
If the observation data was used to extract model data, they should already match, otherwise the problem is in the extraction process. I would assume that it would be possible to get a matched dataset in connection with the extraction and then supply this pre-matched data to modelskill with the modelskill.from_matched(...) function.
There was a problem hiding this comment.
I made an attempt of creating some matching logic, but it is difficult since I don't fully understand the requirements.
I can understand two ends of the spectrum.
- Extracted data that already matches, i.e. equal number of rows
- Intersection of observation and model data in 3d (t,y,x) with a buffer applied to each point
In order to something in between, I need more information on what is expected from the data.
$ mypy modelskill/timeseries/_timeseries.py --strict
Success: no issues found in 1 source file
$ mypy modelskill/matching.py --strict
Success: no issues found in 1 source file |
modelskill/matching.py
Outdated
| if isinstance(obs, get_args(ObsInputType)): | ||
| cmp = _single_obs_compare( | ||
| obs, | ||
| obs = [obs] if isinstance(obs, get_args(ObsInputType)) else obs |
modelskill/matching.py
Outdated
| keep_x = np.abs((df.x_mod - df.x_obs)) < spatial_tolerance | ||
| keep_y = np.abs((df.y_mod - df.y_obs)) < spatial_tolerance | ||
| df = df[keep_x & keep_y] | ||
| mri.data[name] = df[name] |
There was a problem hiding this comment.
Does this keep the attrs?
This PR removes the use of dataframes as a temporary solution when interpolating and other improvements in the matching between observation and model results.