Feature: add toy data #493

Merged
jsmariegaard merged 25 commits into main from Feature-489-add-toy-data
Jan 9, 2025

@jsmariegaard
Member

@jsmariegaard jsmariegaard commented Dec 26, 2024

>>> import modelskill as ms
>>> cc = ms.data.vistula()
>>> cc

<ComparerCollection>
Comparers:
0: Tczew - Discharge [m3/s]
1: Krasnystaw - Discharge [m3/s]
2: Sandomierz - Discharge [m3/s]
3: Szczucin - Discharge [m3/s]
4: Nowy Sacz - Discharge [m3/s]
5: Tryncza - Discharge [m3/s]
6: Ptaki - Discharge [m3/s]
7: Suraz - Discharge [m3/s]

>>> cc = ms.data.oresund()
>>> cc

<ComparerCollection>
Comparers:
0: Drogden - Surface Elevation [meter]
1: Barseback - Surface Elevation [meter]
2: Helsingborg - Surface Elevation [meter]
3: Kobenhavn - Surface Elevation [meter]
4: Koege - Surface Elevation [meter]
5: MalmoHamn - Surface Elevation [meter]
6: Vedbaek - Surface Elevation [meter]

Both datasets are now around 1 MB each. Both contain aux data and attrs that could be used for examples and testing in the future (not yet used). The data module has been added to the API docs.

At a later point it would be great to add more datasets: nortseawaves, ...

It would also be great to add the new notebook to the examples in docs.

@jsmariegaard jsmariegaard linked an issue Dec 26, 2024 that may be closed by this pull request
@jsmariegaard
Member Author

What is an acceptable data file size to include in the package? 1 MB per case? Should we maybe remove some stations from the examples above? Could we reduce to float16 or use other tricks to save some file size?

@ecomodeller
Member

I think we can remove some stations and change to float32.
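
For context on the float16 versus float32 question, here is a minimal sketch of the trade-off using plain NumPy. The sample values and the helper `downcast_report` are hypothetical and are not taken from the datasets in this PR; the point is only that float16 halves the storage again but keeps roughly three significant digits and overflows above 65504.

```python
import numpy as np

# Hypothetical values in m3/s, chosen only to span a few orders of magnitude
values = np.array([0.1234567, 12.34567, 1234.567, 4321.0], dtype=np.float64)


def downcast_report(values: np.ndarray, dtype) -> tuple[int, float]:
    """Return (size in bytes, max relative rounding error) after casting."""
    with np.errstate(over="ignore"):  # silence overflow warnings, if any
        cast = values.astype(dtype)
    rel_err = np.abs(cast.astype(np.float64) - values) / np.abs(values)
    return cast.nbytes, float(rel_err.max())


for dtype in (np.float32, np.float16):
    nbytes, err = downcast_report(values, dtype)
    print(f"{np.dtype(dtype).name}: {nbytes} bytes, max relative error {err:.1e}")

# Largest finite value each type can represent
print("float16 max:", float(np.finfo(np.float16).max))
print("float32 max:", float(np.finfo(np.float32).max))
```

Under these assumptions float32 looks like the safer default for discharge and water level data, while float16 would need a check of the value range first.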

@jsmariegaard
Member Author

Ways to make datasets smaller on disk:

  • reduce time period
  • reduce number of observations
  • float32 instead of float64
  • crop modelresult to period covered by obs
  • reduce time resolution of modelresults (e.g. 3 hourly instead of 30 min)
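
Three of the steps above (cropping to the observation period, coarsening the time resolution, and casting to float32) can be sketched with pandas as follows. This is an illustration on synthetic data, not the code used in this PR; the `shrink` helper, the column names and the time ranges are all assumptions. In-memory size is used as a rough proxy, since the size on disk also depends on file format and compression.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: a 30-minute model result and a shorter hourly observation record
mod_time = pd.date_range("2020-01-01", "2020-01-31", freq="30min")
model = pd.DataFrame(
    {"model": np.sin(np.arange(len(mod_time)) / 50.0)}, index=mod_time
)
obs_time = pd.date_range("2020-01-10", "2020-01-20", freq="60min")
obs = pd.DataFrame({"obs": np.cos(np.arange(len(obs_time)) / 25.0)}, index=obs_time)


def shrink(model: pd.DataFrame, obs: pd.DataFrame, freq: str = "180min") -> pd.DataFrame:
    """Hypothetical helper: crop, coarsen and downcast a model result."""
    # Crop the model result to the period covered by the observations
    cropped = model.loc[obs.index.min() : obs.index.max()]
    # Reduce the time resolution (here: 3-hourly means instead of 30-minute values)
    coarse = cropped.resample(freq).mean()
    # Store as float32 instead of float64
    return coarse.astype(np.float32)


small = shrink(model, obs)
before = int(model.memory_usage(deep=True).sum())
after = int(small.memory_usage(deep=True).sum())
print(f"{len(model)} rows, {before} bytes -> {len(small)} rows, {after} bytes")
```

Averaging within each 3-hour bin is only one choice; taking every sixth value would preserve instantaneous peaks better, which may matter for skill metrics.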

Further ideas to reduce disk size (not tried):

@jsmariegaard jsmariegaard marked this pull request as ready for review January 8, 2025 15:20
@jsmariegaard jsmariegaard merged commit 5e11449 into main Jan 9, 2025
@jsmariegaard jsmariegaard deleted the Feature-489-add-toy-data branch January 9, 2025 15:43
Development

Successfully merging this pull request may close these issues.

Add toy data

2 participants