Skill groupby attrs #351

Merged
jsmariegaard merged 34 commits into main from
skill-grouby-attrs
Jan 4, 2024

Conversation

@jsmariegaard
Member

@jsmariegaard jsmariegaard commented Dec 19, 2023

Extracted the by="attrs:gtype" part of #331

e.g. "attrs:gtype" or "attrs:DA" (to distinguish between assimilation and validation stations)

@jsmariegaard
Member Author

The to_dataframe() method returned object dtypes, which I have now changed to category. This, however, leads to problems with the groupby, specifically for cc.mean_skill() in cases with multiple variables. I tried different things, including setting observed=True, but that causes problems in gridded_skill() (where empty bins make sense). I think the actual problem is that the "default" by that mean_skill() passes on to skill() is ["model", "observation", "variable"], even though an observation can only have one variable. So I guess the by should instead be ["model", "observation"], with variable added afterwards. Maybe we should even check that observation and variable do not both occur in the by?
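
To make the groupby issue concrete, here is a minimal pandas-only sketch. It does not use modelskill; the frame, the column values and the variable names are made-up stand-ins. It assumes the behaviour comes from pandas' handling of categorical groupers: with observed=False every combination of categories is kept, so an observation gets a row for a variable it never measures. The last part illustrates the "group by observation, add variable afterwards" idea, not the actual implementation.

```python
import pandas as pd

# Toy matched data (hypothetical): each observation measures exactly one variable.
df = pd.DataFrame(
    {
        "observation": pd.Categorical(["obs_a", "obs_a", "obs_b", "obs_b"]),
        "variable": pd.Categorical(["Hm0", "Hm0", "wind", "wind"]),
        "error": [0.1, -0.2, 0.3, 0.5],
    }
)

by = ["observation", "variable"]

# observed=False keeps every category combination, including impossible ones
# such as (obs_a, wind); those groups are empty.
all_combos = df.groupby(by, observed=False)["error"].count()

# observed=True keeps only the combinations that occur in the data.
seen_only = df.groupby(by, observed=True)["error"].count()

# Alternative: group by observation alone, then attach the variable afterwards.
obs_to_var = dict(zip(df["observation"], df["variable"]))
per_obs = df.groupby("observation", observed=True)["error"].count().to_frame("n")
per_obs["variable"] = [obs_to_var[obs] for obs in per_obs.index]

print(len(all_combos), len(seen_only))
print(per_obs)
```

In this toy case the observed=False result has four rows, two of them empty, while observed=True and the per-observation variant each have two.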

@ecomodeller
Member

Very useful functionality! 👍

Here is a snippet of a slightly incomplete example, where I have added attrs to 2 (of 3) observations.

When the attrs are absent, the default is now to exclude those observations from the skill table, but setting observed=False includes them (grouped as False):

...
>>> o1 = ms.PointObservation('HKNA_Hm0.dfs0', attrs={"use": "calibration"})
>>> o2 = ms.PointObservation("eur_Hm0.dfs0",   attrs={"use": "validation"})
>>> o3 = ms.TrackObservation("Alti_c2_Dutch.dfs0")
>>> cc = ms.match(obs=[o1, o2, o3], mod=[mr1, mr2])
>>> cc.skill(by=("model","attrs:use"), observed=False).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  False        113 -0.00  0.35   0.35  0.29  0.97  0.13  0.90
      calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  False        113  0.08  0.43   0.42  0.36  0.97  0.15  0.85
      calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93
>>> cc.skill(by=("model","attrs:use")).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93

@jsmariegaard
Member Author

Finally managed to do the merge 😬 That was not easy.

@jsmariegaard jsmariegaard marked this pull request as ready for review January 3, 2024 16:47
def test_skill_by_attrs_gtype(cc):
    sk = cc.skill(by="attrs:gtype")
    assert len(sk) == 2
    assert sk.data.index[0] == "point"
Member

It seems overly specific to assert that the index is sorted in this order.

Isn't it enough to verify that:

assert "point" in sk.index
assert "track" in sk.index
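
A possible variant, sketched with a plain pandas frame rather than the real skill table: comparing the index as a set checks both groups and the group count in one assertion, without depending on row order. The frame sk_df and its n values are placeholders, not results from this PR.

```python
import pandas as pd

# Stand-in for the skill table's underlying frame (hypothetical values);
# the row order is deliberately reversed to show the check is order-independent.
sk_df = pd.DataFrame(
    {"n": [10, 20]},
    index=pd.Index(["track", "point"], name="gtype"),
)

# Passes regardless of row order, and fails if a group is missing or extra.
assert set(sk_df.index) == {"point", "track"}
```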

Member Author

True, will fix in the next PR, which will be on sorting.

@jsmariegaard jsmariegaard merged commit 2e495c2 into main Jan 4, 2024
@jsmariegaard jsmariegaard deleted the skill-grouby-attrs branch January 4, 2024 12:27