Prototype ModelDriver class and esm1.6 driver by blimlim · Pull Request #229 · ACCESS-NRI/um2nc-standalone

blimlim · 2026-03-04T00:59:30Z

Closes #216. I've added this PR to start some discussion around the proposed implementation, and so will start it off as a draft.

Background

So far, ESM1.6 simulations have been using the ESM1.5 conversion driver. However, we now wish to change the output file names and locations when running the conversion for ESM1.6, and hence need to add a separate driver. Previous work on the code organisation (#214) and command line interface (#221) laid the groundwork for this change.

However, the way that the ESM1.5 driver is constructed as a large function containing many steps, does not lend itself to adding a similar ESM1.6 driver with a few small differences:

um2nc-standalone/src/um2nc/drivers/esm1p5.py

Lines 100 to 161 in 11306b4

    
           def convert_esm1p5_output_dir(esm1p5_output_dir, process_args): 
        
               """ 
        
               Driver function for converting ESM1.5 atmospheric outputs during a simulation. 
        
               Parameters 
        
               ---------- 
        
               esm1p5_output_dir: an "outputXYZ" directory produced by an ESM1.5 simulation. 
        
                       Fields files in the "atmosphere" subdirectory will be 
        
                       converted to netCDF. 
        
               process_args: argparse Namespace object carrying processing arguments. 
        
               Returns 
        
               ------- 
        
               succeeded: list of tuples of (input, output) filepaths for successful 
        
                          conversions. 
        
               failed: list of tuples of form (filepath, exception) for files which failed 
        
                       to convert due to an allowed exception. 
        
               """ 
        
               esm1p5_output_dir = Path(esm1p5_output_dir) 
        
               current_atm_output_dir = esm1p5_output_dir / "atmosphere" 
        
               if not current_atm_output_dir.exists(): 
        
                   raise FileNotFoundError( 
        
                       errno.ENOENT, os.strerror(errno.ENOENT), current_atm_output_dir 
        
                   ) 
        
               # Create a directory for writing netCDF files 
        
               nc_write_dir = current_atm_output_dir / "netCDF" 
        
               nc_write_dir.mkdir(exist_ok=True) 
        
               # Find fields file outputs to be converted 
        
               xhist_nml = f90nml.read(current_atm_output_dir / "xhist") 
        
               run_id = xhist_nml["nlchisto"]["run_id"] 
        
               fields_file_name_pattern = get_fields_file_pattern(run_id) 
        
               atm_dir_contents = current_atm_output_dir.glob("*") 
        
               atm_dir_fields_files = find_matching_files( 
        
                   atm_dir_contents, fields_file_name_pattern 
        
               ) 
        
               if len(atm_dir_fields_files) == 0: 
        
                   warnings.warn( 
        
                       f"No files matching pattern '{fields_file_name_pattern}' " 
        
                       f"found in {current_atm_output_dir.resolve()}. No files will be " 
        
                       "converted to netCDF." 
        
                   ) 
        
                   return [], []  # Don't try to run the conversion 
        
               output_paths = [get_nc_write_path(path, nc_write_dir, get_ff_date(path)) for path in atm_dir_fields_files] 
        
               input_output_pairs = zip(atm_dir_fields_files, output_paths) 
        
               input_output_pairs = filter_name_collisions(input_output_pairs) 
        
               succeeded, failed = convert_fields_file_list(input_output_pairs, process_args) 
        
               if process_args.delete_ff: 
        
                   # Remove files that appear only as successful conversions 
        
                   for path in safe_removal(succeeded, failed): 
        
                       os.remove(path)

The only approach I could think of which keeps the current structure would be to copy and paste the esm1.5 driver function, into a new function for esm1.6. This felt like it would be a bad idea, and following discussion with @atteggiani, we decided to investigate implementing a class based approach for the drivers.

Proposed implementation

Based on the ESM1.5 and (embedded in rose/cylc suites) CM2, and CM3 conversion drivers, model conversion drivers seem to have the following general steps in common:

Find the directory containing model outputs to convert
Find/setup the directory to write netCDF files to
Find the fields files in the input directory to be converted
Set output netCDF filenames/paths for each input file
Run the conversion

Based on these common steps, I've added a generic ModelDriver class, with empty methods for steps 1-4. It then combines them into a conversion method for step 5.

um2nc-standalone/src/um2nc/drivers/common.py

Lines 18 to 22 in c3c6804

    
           class ModelDriver: 
        
               """ 
        
               Generic model conversion driver class. Defines a general sequence of steps 
        
               which are followed by the drivers. 
        
               """

Specific model drivers are then created as subclasses of the ModelDriver class, and they can fill in the model-specific details for the methods in steps 1-4 (e.g. models may have different output directory structures, filenames etc).

This allows us to define a ESM1.5 driver, and an ESM1.6 driver as a subclass, with only the relevant details changed:

um2nc-standalone/src/um2nc/drivers/esm1p5.py

Lines 27 to 39 in c3c6804

    
           class Esm1p5Driver(ModelDriver): 
        
               # Output file suffix for each type of unit. 
        
               UNIT_SUFFIXES = { 
        
                   "a": "mon", 
        
                   "e": "dai", 
        
                   "i": "3hr", 
        
                   "j": "6hr", 
        
               } 
        
               def get_input_dir(self, parent_dir): 
        
                   """ 
        
                   Given a path to an experiment parent directory, return the atmosphere output directory

https://g.ithub.com/ACCESS-NRI/um2nc-standalone/blob/c3c68044ba78c2d4e3f33d26dce6d7a790800b90/src/um2nc/drivers/esm1p6.py#L11-L22

This structure is largely based on payu's Model class.

A potential limitation, is that I've assumed conversion drivers will loop through a sequence of input files, and will convert each into a separate netCDF. This wouldn't hold if in the future we want a driver to have more complicated behaviour such as combining monthly data into year length files, or splitting multi-variable files into single variable netCDFs.

@atteggiani any comments or ideas you have about the proposed changes are very welcome! @aidanheerdegen, I've pinged you just to check that the ESM1.6 frequency names which are added to the filenames are as required, however any other comments are welcome too.

aidanheerdegen · 2026-03-04T03:27:36Z

This looks like a good design @blimlim.

As far as the need to do this goes, you could effectively deprecate the 1p5 driver, just rename it to 1p6 and put in the changes. Anyone who needs old behaviour has to use old versions. This only works if um2nc is separately deployed from the payu environment, which it isn't ... so is maybe not that useful a suggestion.

src/um2nc/drivers/esm1p5.py

src/um2nc/drivers/esm1p6.py

Co-authored-by: Aidan Heerdegen <aidan.heerdegen@anu.edu.au>

atteggiani · 2026-03-05T23:45:35Z

Thank you @blimlim for starting to work on this!

I like the overall class-based design.

I quickly had a look at the code and added a few comments. Some are questions to try and better understand your design idea.

On a more general level, I also have the following questions:

... (embedded in rose/cylc suites) CM2, and CM3 conversion drivers, ...

What do you mean by "embedded in rose/cylc suites"? There is no currently no um2nc driver cm2 command, so how is CM2 driver currently embedded in the rose/cylc suite?
When you mention "inputs" and "outputs" files, directories, etc., are they referring to the inputs and outputs of um2nc, or the models?
It was not super clear at times and I got a bit confused.
I know that for model drivers um2nc conversion inputs are model outputs, but ultimately, we care about um2nc inputs and outputs, so for the implementation I don't think we care too much if they are model outputs, random files, or anything else. Therefore, I would suggest referring mostly to "inputs" and "outputs" files relative to um2nc, and using "model outputs" very rarely (Maybe only in a model driver general docstring) to avoid confusion.
For example, instead of using output_files (referring to the model output files) we could use input_files or conversion_target_files to refer to the um2nc conversion target (input) files.
1. Find the directory containing model outputs to convert
2. Find/setup the directory to write netCDF files to
3. Find the fields files in the input directory to be converted
4. Set output netCDF filenames/paths for each input file
5. Run the conversion
A model driver command, ultimately, is a set of multiple um2nc conversions for multiple files, to which there might also be pre/post-processing associated. For this reason, I would consider each um2nc file conversion independent, and for this reason they could also be easily parallelised in the future.
Therefore, I would prioritise the following steps:
1. Get the list of targets (files) to be converted by um2nc coversion.
2. Get the output directory where to write the converted files (this is basically your point 2)
3. Perform the conversion (with any pre/post-processing steps)
I think finding the directory containing the model outputs (which contain some of the um2nc conversion targets above) is a step included within my number 1 above. There might be drivers in which these targets are found in multiple directories, therefore I wouldn't list that as a "core" step for a driver conversion. Also related to this comment.
Also, your number 4 is part of the post-processing steps, which can be included within the conversion step (it is also partly included in getting the output dir).

These comments are mostly to get a better understanding and have a clearer view of the design model.
And if you think it might be useful, we could also have a chat about this. It would also help me to better align with your vision.

Thank you! :)

.github/workflows/CI.yml

src/um2nc/drivers/common.py

test/drivers/test_drivers_esm1p5.py

Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>

blimlim · 2026-03-11T01:34:12Z

Thanks @atteggiani for your detailed feedback, those are all very useful points and questions. I'll try to respond to them here!

What do you mean by "embedded in rose/cylc suites"? There is no currently no um2nc driver cm2 command, so how is CM2 driver currently embedded in the rose/cylc suite?

That's correct that there's no drivers for CM2/CM3 in um2nc. Instead there a separate python scripts within the cylc suites which perform essentially the same functions as a driver, i.e. finding inputs, setting output paths, and running the conversion:

I used the common steps between these scripts and the ESM1.5 driver to help think of a general structure.

When you mention "inputs" and "outputs" files, directories, etc., are they referring to the inputs and outputs of um2nc, or the models?

Good point, I agree "inputs" and "outputs" aren't being used consistently right now which is a bit confusing. Your suggestion to consistently use "inputs" for inputs to um2nc, and outputs to refer to output netCDF files makes sense. I've tried to make this consistent in 6e7d5a6, and have also used "model history directory" in the docstrings to refer to the directories created by the model – model output files are often called "history files", and so this might be clearer than the previous use of "model output directories".

I think finding the directory containing the model outputs (which contain some of the um2nc conversion targets above) is a step included within my number 1 above

This makes sense to me, I've updated the proposed implementation in 6e7d5a6 so that finding the input directory is included within step 1.

number 4 is part of the post-processing steps, which can be included within the conversion step (it is also partly included in getting the output dir).

That makes sense to me, I agree it doesn't need to be considered as a core step. In the implementation I think it should be set up as a one of the general driver methods which can be redefined for specific drivers, which will allow each model to have a different naming scheme.

atteggiani · 2026-03-11T02:02:47Z

Instead there a separate python scripts within the cylc suites which perform essentially the same functions as a driver

I see. I suppose that the next step will be to convert these scripts into actual um2nc driver ... comands then, right?

blimlim · 2026-03-11T04:03:35Z

I suppose that the next step will be to convert these scripts into actual um2nc driver ... comands then, right?

Yeah definitely! For CM3, I believe we'd like to stop using the old um2netcdf script and add a driver here. For CM2 it should be viable to add a driver, but I suspect it might not be a priority

atteggiani

I left a few more comments.

src/um2nc/drivers/common.py

blimlim · 2026-03-18T23:29:20Z

Thanks for the ideas and discussion @atteggiani! I've marked this one ready for review, let me know if you have any other ideas or suggestions!

atteggiani

Hi @blimlim,

Thank you for the updates.

I went through the whole architectural implementation and I only made some comments on the ModelDriver class (because those would influence the rest of the code).

Please have a look at my suggestion, and feel free to ask for any clarification on it.
We might want to have a chat as I feel it would be easier to discuss these design choices.

.github/workflows/CI.yml

src/um2nc/drivers/common.py

Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>

src/um2nc/drivers/common.py

test/drivers/test_drivers_common.py

test/drivers/test_drivers_esm1p5.py

blimlim · 2026-03-23T03:52:17Z

Thanks @atteggiani for the suggestions! I've had a go at implementing the suggested structure, and have added some comments around some of the choices made. Let me know if you have any further thoughts or suggestions!

atteggiani

Thank you @spencer for the updates.

I added a few comments mainly on the common parts (because they affect the rest).

Feel free to start a discussion below any comments :)

src/um2nc/drivers/common.py

src/um2nc/drivers/esm1p5.py

blimlim · 2026-03-29T22:57:38Z

Thanks again @atteggiani for the helpful suggestions and feedback :) I've had a go adding them in, and if you get a chance to look at the latest changes that would be great!

atteggiani

Thank you @blimlim for addressing my review!

The updates overall look pretty good! I like the tests you implemented.

I left a few more suggestions and I noticed the specific model drivers are missing tests.

Please refer to the specific comments for details

src/um2nc/drivers/esm1p5.py

src/um2nc/drivers/esm1p6.py

test/drivers/test_drivers_common.py

test/drivers/test_drivers_esm1p5.py

test/drivers/test_drivers_esm1p6.py

atteggiani · 2026-03-30T04:34:15Z

test/drivers/test_drivers_esm1p5.py

There are no tests for the Esm1p5Driver

The differences between the Esm1p5Driver and the base ModelDriver are essentially the setting of the atmosphere_dir, output_dir, and unit_suffixes properties, and the specific implementations of the get_input_paths and get_output_path methods.

The get_input_paths method only calls the find_esm1px_fields_files function, which is could probably do with an additional test added.

The get_output_path method almost directly calls the get_nc_filename function which has its own tests

I'm thinking one option would be to remove these functions and implement them directly in get_input_paths and get_output_path methods, and then test them through tests of the drivers. I don't expect these functions to be required outside of the drivers and so it could be a bit cleaner. Would this be preferable?

We should still test each of those properties.

For the methods, if they call a function that needs is reused by multiple drivers (like in this case), I would still keep the function in the common.py and test it in test_common.py, but still test each specific method implemented. Even if the test would only be testing that the function gets actually called (depending on the specific method).
The same exact concept applies for Esm1p6Driver.

atteggiani · 2026-03-30T04:34:33Z

test/drivers/test_drivers_esm1p6.py

+    Check that netCDF file naming produces expected file paths for various
+    expected unit keys.
+    """
+    nc_name = esm1p5_convert.get_nc_filename(


Suggested change

nc_name = esm1p5_convert.get_nc_filename(

nc_name = esm1p6.get_nc_filename(

test/drivers/test_drivers_esm1p6.py

atteggiani · 2026-03-30T04:35:14Z

test/drivers/test_drivers_esm1p6.py

There are no tests for the Esm1p6Driver

Prototype ModelDriver class and esm1.6 driver

c3c6804

blimlim requested review from aidanheerdegen and atteggiani March 4, 2026 01:02

blimlim marked this pull request as draft March 4, 2026 01:02

aidanheerdegen reviewed Mar 4, 2026

View reviewed changes

src/um2nc/drivers/esm1p5.py Show resolved Hide resolved

aidanheerdegen reviewed Mar 4, 2026

View reviewed changes

src/um2nc/drivers/esm1p5.py Outdated Show resolved Hide resolved

aidanheerdegen reviewed Mar 4, 2026

View reviewed changes

src/um2nc/drivers/esm1p6.py Outdated Show resolved Hide resolved

blimlim and others added 2 commits March 4, 2026 15:31

Update frequency suffixes to match the schema

b999a24

Co-authored-by: Aidan Heerdegen <aidan.heerdegen@anu.edu.au>

Use correct frequencies in tests

92fcdff

blimlim mentioned this pull request Mar 5, 2026

Test addmeta ACCESS-NRI/access-esm1.6-configs#523

Draft

20 tasks

atteggiani reviewed Mar 5, 2026

View reviewed changes

blimlim and others added 2 commits March 6, 2026 13:30

Add entrypoint test of driver command

4f4d527

Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>

Changes from review suggestions

6e7d5a6

atteggiani reviewed Mar 12, 2026

View reviewed changes

src/um2nc/drivers/common.py Outdated Show resolved Hide resolved

src/um2nc/drivers/common.py Outdated Show resolved Hide resolved

blimlim added 3 commits March 13, 2026 12:29

Use abstract base class for model drivers

8117524

Reorganise setting of input and output paths

0db71b0

Syntax error in CI

465ca29

blimlim marked this pull request as ready for review March 17, 2026 22:05

atteggiani requested changes Mar 19, 2026

View reviewed changes

blimlim and others added 2 commits March 23, 2026 14:33

Changes based on feedback

2f71486

Update .github/workflows/CI.yml

be1b470

Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>

blimlim commented Mar 23, 2026

View reviewed changes

src/um2nc/drivers/common.py Outdated Show resolved Hide resolved

blimlim commented Mar 23, 2026

View reviewed changes

src/um2nc/drivers/common.py Outdated Show resolved Hide resolved

blimlim commented Mar 23, 2026

View reviewed changes

test/drivers/test_drivers_common.py Show resolved Hide resolved

blimlim commented Mar 23, 2026

View reviewed changes

test/drivers/test_drivers_esm1p5.py Outdated Show resolved Hide resolved

blimlim requested a review from atteggiani March 23, 2026 03:52

Syntax error for CI

f369d78

atteggiani requested changes Mar 26, 2026

View reviewed changes

Changes based on review suggestions

6ec9e12

blimlim requested a review from atteggiani March 29, 2026 22:57

atteggiani requested changes Mar 30, 2026

View reviewed changes

blimlim added 2 commits March 31, 2026 12:11

Suggestions from review

9d45124

Additional review changes

7b7e69c

	def convert_esm1p5_output_dir(esm1p5_output_dir, process_args):
	"""
	Driver function for converting ESM1.5 atmospheric outputs during a simulation.

	Parameters
	----------
	esm1p5_output_dir: an "outputXYZ" directory produced by an ESM1.5 simulation.
	Fields files in the "atmosphere" subdirectory will be
	converted to netCDF.
	process_args: argparse Namespace object carrying processing arguments.

	Returns
	-------
	succeeded: list of tuples of (input, output) filepaths for successful
	conversions.
	failed: list of tuples of form (filepath, exception) for files which failed
	to convert due to an allowed exception.
	"""

	esm1p5_output_dir = Path(esm1p5_output_dir)

	current_atm_output_dir = esm1p5_output_dir / "atmosphere"

	if not current_atm_output_dir.exists():
	raise FileNotFoundError(
	errno.ENOENT, os.strerror(errno.ENOENT), current_atm_output_dir
	)

	# Create a directory for writing netCDF files
	nc_write_dir = current_atm_output_dir / "netCDF"
	nc_write_dir.mkdir(exist_ok=True)

	# Find fields file outputs to be converted
	xhist_nml = f90nml.read(current_atm_output_dir / "xhist")
	run_id = xhist_nml["nlchisto"]["run_id"]
	fields_file_name_pattern = get_fields_file_pattern(run_id)

	atm_dir_contents = current_atm_output_dir.glob("*")

	atm_dir_fields_files = find_matching_files(
	atm_dir_contents, fields_file_name_pattern
	)

	if len(atm_dir_fields_files) == 0:
	warnings.warn(
	f"No files matching pattern '{fields_file_name_pattern}' "
	f"found in {current_atm_output_dir.resolve()}. No files will be "
	"converted to netCDF."
	)

	return [], [] # Don't try to run the conversion

	output_paths = [get_nc_write_path(path, nc_write_dir, get_ff_date(path)) for path in atm_dir_fields_files]
	input_output_pairs = zip(atm_dir_fields_files, output_paths)
	input_output_pairs = filter_name_collisions(input_output_pairs)

	succeeded, failed = convert_fields_file_list(input_output_pairs, process_args)

	if process_args.delete_ff:
	# Remove files that appear only as successful conversions
	for path in safe_removal(succeeded, failed):
	os.remove(path)

	class ModelDriver:
	"""
	Generic model conversion driver class. Defines a general sequence of steps
	which are followed by the drivers.
	"""

	class Esm1p5Driver(ModelDriver):

	# Output file suffix for each type of unit.
	UNIT_SUFFIXES = {
	"a": "mon",
	"e": "dai",
	"i": "3hr",
	"j": "6hr",
	}

	def get_input_dir(self, parent_dir):
	"""
	Given a path to an experiment parent directory, return the atmosphere output directory

	nc_name = esm1p5_convert.get_nc_filename(
	nc_name = esm1p6.get_nc_filename(

Conversation

blimlim commented Mar 4, 2026

Background

Proposed implementation

Uh oh!

aidanheerdegen commented Mar 4, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atteggiani commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blimlim commented Mar 11, 2026

Uh oh!

atteggiani commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

blimlim commented Mar 11, 2026

Uh oh!

atteggiani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

blimlim commented Mar 18, 2026

Uh oh!

atteggiani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blimlim commented Mar 23, 2026

Uh oh!

atteggiani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blimlim commented Mar 29, 2026

Uh oh!

atteggiani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atteggiani Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

blimlim Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

atteggiani Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

atteggiani commented Mar 5, 2026 •

edited

Loading

atteggiani commented Mar 11, 2026 •

edited

Loading

atteggiani Mar 31, 2026 •

edited

Loading