Conversation
|
This looks like a good design @blimlim. As far as the need to do this goes, you could effectively deprecate the |
Co-authored-by: Aidan Heerdegen <aidan.heerdegen@anu.edu.au>
|
Thank you @blimlim for starting to work on this! I like the overall class-based design. I quickly had a look at the code and added a few comments. Some are questions to try and better understand your design idea. On a more general level, I also have the following questions:
These comments are mostly to get a better understanding and have a clearer view of the design model. Thank you! :) |
Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>
|
Thanks @atteggiani for your detailed feedback, those are all very useful points and questions. I'll try to respond to them here!
That's correct that there's no drivers for CM2/CM3 in um2nc. Instead there a separate python scripts within the cylc suites which perform essentially the same functions as a driver, i.e. finding inputs, setting output paths, and running the conversion:
I used the common steps between these scripts and the ESM1.5 driver to help think of a general structure.
Good point, I agree "inputs" and "outputs" aren't being used consistently right now which is a bit confusing. Your suggestion to consistently use "inputs" for inputs to um2nc, and outputs to refer to output netCDF files makes sense. I've tried to make this consistent in 6e7d5a6, and have also used "model history directory" in the docstrings to refer to the directories created by the model – model output files are often called "history files", and so this might be clearer than the previous use of "model output directories".
This makes sense to me, I've updated the proposed implementation in 6e7d5a6 so that finding the input directory is included within step 1.
That makes sense to me, I agree it doesn't need to be considered as a core step. In the implementation I think it should be set up as a one of the general driver methods which can be redefined for specific drivers, which will allow each model to have a different naming scheme. |
I see. I suppose that the next step will be to convert these scripts into actual |
Yeah definitely! For CM3, I believe we'd like to stop using the old um2netcdf script and add a driver here. For CM2 it should be viable to add a driver, but I suspect it might not be a priority |
atteggiani
left a comment
There was a problem hiding this comment.
I left a few more comments.
|
Thanks for the ideas and discussion @atteggiani! I've marked this one ready for review, let me know if you have any other ideas or suggestions! |
atteggiani
left a comment
There was a problem hiding this comment.
Hi @blimlim,
Thank you for the updates.
I went through the whole architectural implementation and I only made some comments on the ModelDriver class (because those would influence the rest of the code).
Please have a look at my suggestion, and feel free to ask for any clarification on it.
We might want to have a chat as I feel it would be easier to discuss these design choices.
Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>
|
Thanks @atteggiani for the suggestions! I've had a go at implementing the suggested structure, and have added some comments around some of the choices made. Let me know if you have any further thoughts or suggestions! |
atteggiani
left a comment
There was a problem hiding this comment.
Thank you @spencer for the updates.
I added a few comments mainly on the common parts (because they affect the rest).
Feel free to start a discussion below any comments :)
|
Thanks again @atteggiani for the helpful suggestions and feedback :) I've had a go adding them in, and if you get a chance to look at the latest changes that would be great! |
atteggiani
left a comment
There was a problem hiding this comment.
Thank you @blimlim for addressing my review!
The updates overall look pretty good! I like the tests you implemented.
I left a few more suggestions and I noticed the specific model drivers are missing tests.
Please refer to the specific comments for details
There was a problem hiding this comment.
There are no tests for the Esm1p5Driver
There was a problem hiding this comment.
The differences between the Esm1p5Driver and the base ModelDriver are essentially the setting of the atmosphere_dir, output_dir, and unit_suffixes properties, and the specific implementations of the get_input_paths and get_output_path methods.
The get_input_paths method only calls the find_esm1px_fields_files function, which is could probably do with an additional test added.
The get_output_path method almost directly calls the get_nc_filename function which has its own tests
I'm thinking one option would be to remove these functions and implement them directly in get_input_paths and get_output_path methods, and then test them through tests of the drivers. I don't expect these functions to be required outside of the drivers and so it could be a bit cleaner. Would this be preferable?
There was a problem hiding this comment.
We should still test each of those properties.
For the methods, if they call a function that needs is reused by multiple drivers (like in this case), I would still keep the function in the common.py and test it in test_common.py, but still test each specific method implemented. Even if the test would only be testing that the function gets actually called (depending on the specific method).
The same exact concept applies for Esm1p6Driver.
test/drivers/test_drivers_esm1p6.py
Outdated
| Check that netCDF file naming produces expected file paths for various | ||
| expected unit keys. | ||
| """ | ||
| nc_name = esm1p5_convert.get_nc_filename( |
There was a problem hiding this comment.
| nc_name = esm1p5_convert.get_nc_filename( | |
| nc_name = esm1p6.get_nc_filename( |
There was a problem hiding this comment.
There are no tests for the Esm1p6Driver
Closes #216. I've added this PR to start some discussion around the proposed implementation, and so will start it off as a draft.
Background
So far, ESM1.6 simulations have been using the ESM1.5 conversion driver. However, we now wish to change the output file names and locations when running the conversion for ESM1.6, and hence need to add a separate driver. Previous work on the code organisation (#214) and command line interface (#221) laid the groundwork for this change.
However, the way that the ESM1.5 driver is constructed as a large function containing many steps, does not lend itself to adding a similar ESM1.6 driver with a few small differences:
um2nc-standalone/src/um2nc/drivers/esm1p5.py
Lines 100 to 161 in 11306b4
The only approach I could think of which keeps the current structure would be to copy and paste the esm1.5 driver function, into a new function for esm1.6. This felt like it would be a bad idea, and following discussion with @atteggiani, we decided to investigate implementing a class based approach for the drivers.
Proposed implementation
Based on the ESM1.5 and (embedded in rose/cylc suites) CM2, and CM3 conversion drivers, model conversion drivers seem to have the following general steps in common:
Based on these common steps, I've added a generic
ModelDriverclass, with empty methods for steps 1-4. It then combines them into a conversion method for step 5.um2nc-standalone/src/um2nc/drivers/common.py
Lines 18 to 22 in c3c6804
Specific model drivers are then created as subclasses of the
ModelDriverclass, and they can fill in the model-specific details for the methods in steps 1-4 (e.g. models may have different output directory structures, filenames etc).This allows us to define a ESM1.5 driver, and an ESM1.6 driver as a subclass, with only the relevant details changed:
um2nc-standalone/src/um2nc/drivers/esm1p5.py
Lines 27 to 39 in c3c6804
https://g.ithub.com/ACCESS-NRI/um2nc-standalone/blob/c3c68044ba78c2d4e3f33d26dce6d7a790800b90/src/um2nc/drivers/esm1p6.py#L11-L22
This structure is largely based on payu's Model class.
A potential limitation, is that I've assumed conversion drivers will loop through a sequence of input files, and will convert each into a separate netCDF. This wouldn't hold if in the future we want a driver to have more complicated behaviour such as combining monthly data into year length files, or splitting multi-variable files into single variable netCDFs.
@atteggiani any comments or ideas you have about the proposed changes are very welcome! @aidanheerdegen, I've pinged you just to check that the ESM1.6 frequency names which are added to the filenames are as required, however any other comments are welcome too.