Description
In #153, we are discussing how to prepare the pyfesom2 analysis tools for large data. One particular need arises here: how to handle the mesh representation on the Python side in the future.
Several options exist, as far as I can see (please feel free to edit the main issue and add more info if you have something to add):
A. We continue with plain text files.
Pro: It's already there, many users are used to it, and we get to feel retro and old-school, as if we were still in the 80s when computing was the wild west.
Con: ...it's plain text, and it feels rather old-school, as if we were still in the wild west. There are smarter ways of doing it by now.
B. Switch to a NetCDF Format
Pro: We can self-document the whole mesh in the file itself. We would know which part of the file (formerly plain text, now NetCDF) holds the nodes, lon/lat, ocean/coastline, etc.
Con: We need user migration. This would also imply a deeper switch inside of FESOM itself.
C. Hybrid Model
(Paul's preference right now)
We allow the user to read in a "plain text" mesh, and check whether a NetCDF version of it already exists. If not, we make one "on-the-fly" for the next time around. This would speed up user migration to the new format.
Pro: We could, for a time, support both plain text and new NetCDF meshes.
Con: It might take a bit of programming flexibility, but if we decide on a strategy, I am happy to volunteer the dirty part of that work.
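To make option C a bit more concrete, here is a minimal sketch of the "check for a NetCDF version, otherwise build one on the fly" logic. Everything named here is an assumption for illustration: `load_mesh_hybrid`, the cache file name `mesh_cache.nc`, and the three callables (`read_text`, `read_cache`, `write_cache`) are hypothetical placeholders, not existing pyfesom2 functions. The actual readers and writers would be whatever we settle on.

```python
from pathlib import Path


def load_mesh_hybrid(mesh_dir, read_text, read_cache, write_cache,
                     cache_name="mesh_cache.nc"):
    """Load a mesh, preferring a NetCDF copy if one already exists.

    All arguments are hypothetical placeholders:
      read_text(mesh_dir)      -> mesh object built from the plain text files
      read_cache(cache_path)   -> mesh object read from the NetCDF file
      write_cache(mesh, path)  -> writes the mesh to a NetCDF file
    Returns the mesh and a string saying which source was used.
    """
    mesh_dir = Path(mesh_dir)
    cache_path = mesh_dir / cache_name

    # Fast path: a NetCDF version is already there, so use it.
    if cache_path.exists():
        return read_cache(cache_path), "netcdf"

    # Slow path: fall back to the plain text mesh.
    mesh = read_text(mesh_dir)

    # Try to leave a NetCDF copy behind for the next time around.
    # Shared mesh directories are often read-only, so failing to
    # write the copy must not break the load itself.
    try:
        write_cache(mesh, cache_path)
    except OSError:
        pass

    return mesh, "text"
```

The first call on a text-only mesh directory returns `"text"` as the source and writes the cache; later calls return `"netcdf"`. One open design question this sketch does not answer is where the cache should live if the mesh directory is read-only (for example, a per-user cache directory).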
One more thought:
I would, however, argue against including all the mesh information directly in the model output (this was something I had wished for earlier, but it would explode the file size). Rather, I would suggest that we find some way to publish all of our meshes publicly (if that is feasible), and provide metadata in the output files stating where the mesh can be accessed (via FTP, Git LFS, DKRZ Swift, whatever). We already have the mesh path in one of the namelists, so it should not be too hard to dump an extra line of metadata into the files when writing output. Then, when a user loads a particular dataset, whatever we have as a load function can check for this metadata and also load the accompanying mesh information, which may be needed for other operations.
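As a rough sketch of the metadata idea on the Python side: the load function could look for a mesh location among the global attributes of the output file and decide how to fetch the mesh from there. The attribute name `mesh_location` and the helpers `tag_with_mesh` and `find_mesh` are assumptions made up for this example; nothing like this exists in the output files yet. The functions work on a plain dict of attributes, which is how global attributes are commonly exposed by NetCDF readers in Python.

```python
from urllib.parse import urlparse

# Hypothetical name for the global attribute holding the mesh location.
MESH_ATTR = "mesh_location"

# URL schemes we would treat as "fetch this from somewhere else".
REMOTE_SCHEMES = {"ftp", "http", "https"}


def tag_with_mesh(attrs, mesh_location):
    """Return a copy of the global attributes with the mesh location added."""
    tagged = dict(attrs)
    tagged[MESH_ATTR] = mesh_location
    return tagged


def find_mesh(attrs, fallback=None):
    """Look up where the mesh for a dataset lives.

    Returns (location, kind), where kind is "remote" for a URL with a
    known scheme and "local" for anything else (e.g. a file system path).
    Raises KeyError if the dataset carries no mesh metadata and no
    fallback location is given.
    """
    location = attrs.get(MESH_ATTR, fallback)
    if location is None:
        raise KeyError(f"no '{MESH_ATTR}' attribute and no fallback given")
    scheme = urlparse(location).scheme
    kind = "remote" if scheme in REMOTE_SCHEMES else "local"
    return location, kind
```

With this, older output files that lack the attribute can still be loaded by passing the mesh path explicitly as `fallback`, so the metadata becomes a convenience rather than a hard requirement.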