From f3487425196c3e03f5d60586c25a3703c2c618ad Mon Sep 17 00:00:00 2001
From: minghangli-uni <24727729+minghangli-uni@users.noreply.github.com>
Date: Tue, 26 Aug 2025 08:16:49 +1000
Subject: [PATCH 1/2] Update README.md

---
 README.md | 119 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index bfc8e73..f69b060 100644
--- a/README.md
+++ b/README.md
@@ -1,57 +1,98 @@
-# General Repository Template
-
-A general template repository for default settings when creating new repositories.
+# access-experiment-runner
+
+[![CI](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/ci.yml)
+[![CD](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/cd.yml/badge.svg)](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/cd.yml)
+[![Coverage Status](https://codecov.io/gh/ACCESS-NRI/access-experiment-runner/branch/main/graph/badge.svg)](https://codecov.io/gh/ACCESS-NRI/access-experiment-runner)
+[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](https://opensource.org/license/apache-2-0)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+## About
+The main role of the ACCESS experiment runner is to manage and monitor experiment job runs on the supercomputing environment (e.g., `Gadi`), offering automatic handling of success, failure, and duplicate run scenarios.
+
+## Key features
+Submits and tracks PBS jobs on Gadi; oversees job lifecycle from submission through completion.
+- When a job completes within expected run times, the tool prints a confirmation and stops further submissions.
+- If a job fails, users may choose to inspect the working directory to diagnose the root cause. The tool will detect the failure and pause further actions, giving the user control over whether to resubmit.
+- Detects already running or queued jobs and avoids redundant submissions—quickly skips duplicates with a user notification.
+
+## Installation
+### User setup
+The `experiment-runner` is installed in the `payu-dev` conda environment, hence loading `payu/dev` would directly make experiment-runner available for use.
+```
+module use /g/data/vk83/prerelease/modules && module load payu/dev
+```
 
-This repository uses the Apache-2.0 license. `COPYRIGHT.txt` contains a current copyright statement which should be included at the top of all files.
+Alternatively, create and activate a python virtual environment, then install via pip,
+```
+python3 -m venv --system-site-packages
+source /bin/activate
 
-When creating a new repository you [can use this repository as a template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template), to automate the creation of the correct license and COPYRIGHT statement.
+pip install experiment-runner
+```
 
-## COPYRIGHT Header
+### Development setup
+For contributors and developers, setup a development environment,
+```
+git clone https://github.com/ACCESS-NRI/access-experiment-runner.git
+cd access-experiment-runner
 
-Best practice suggests adding a copyright statement at the top of every source code file, or text file where it is possible to add a copyright statement without interfering with the purpose of the file. The reasoning is if a file is separated from the repository in which it resides then it may not be possible to ascertain it's licensing, which may hamper re-use.
+# under a virtual environment
+pip install -e .
+```
 
-Making this as short and concise as possible reduces the overhead in including such a copyright statement. To that end using [SPDX identifiers](https://spdx.dev/ids/) is simple, efficient, portable and machine-readable.
+## Usage
+```
+experiment-runner -i --help
 
-### Examples
+usage: experiment-runner [-h] [-i INPUT_YAML_FILE]
 
-An example, short, copyright statement is reproduced below, as it might appear in different coding languages. Copy and add to files as appropriate:
+Manage ACCESS experiments using configurable YAML input.
+If no YAML file is specified, the tool will look for 'Experiment_runner.yaml' in the current directory.
+If that file is missing, you must specify one with -i / --input-yaml-file.
 
-#### plaintext
-It is common to include copyright statements at the bottom of a text document or website page
-```text
-© 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details.
-SPDX-License-Identifier: Apache-2.0
+options:
+  -h, --help            show this help message and exit
+  -i INPUT_YAML_FILE, --input-yaml-file INPUT_YAML_FILE
+                        Path to the YAML file specifying parameter values for experiment runs.
+                        Defaults to 'Experiment_runner.yaml' if present in the current directory.
 ```
 
-#### python
-For code it is more common to include the copyright in a comment at the top
-```python
-# Copyright 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details.
-# SPDX-License-Identifier: Apache-2.0
-```
+One YAML example is provided in `example/Experiment_runner_example.yaml`
 
-#### shell
-```bash
-# Copyright 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details.
-# SPDX-License-Identifier: Apache-2.0
+```yaml
+test_path: /g/data/{PROJECT}/{USER}/prototype-0.1.0
+repository_directory: 1deg_jra55_ryf
+running_branches: [ctrl, perturb_1, perturb_2]
+keep_uuid: True
+nruns: [1,1,1]
 ```
+where,
 
-##### FORTRAN
-```fortran
-! Copyright 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details.
-! SPDX-License-Identifier: Apache-2.0
-```
+`test_path`: The base path to the experiment repository on the filesystem. In this case, it points to a prototype experiment runner checkout.
 
-#### C/C++
-```c
-// Copyright 2022 ACCESS-NRI and contributors. See the top-level COPYRIGHT file for details.
-// SPDX-License-Identifier: Apache-2.0
-```
+`repository_directory`: The specific experiment configuration directory inside test_path. Here it is the `1deg_jra55_ryf` setup.
 
-### Notes
+`running_branches`: A list of git branches representing experiments to run.
 
-Note that the date is the first time the project is created.
+`keep_uuid`: Preserve unique identifiers (UUIDs) across runs.
 
-The date signifies the year from which the copyright notice applies. **NEVER** replace with a later year, only ever add later years or a year range.
+`nruns`: A list indicating how many runs to perform for each branch listed in running_branches.
 
-It is not necessary to include subsequent years in the copyright statement at all unless updates have been made at a later time, and even then it is largely discretionary: they are not necessary as copyright is contingent on the lifespan of copyright holder +50 years as per the [Berne Convention](https://en.wikipedia.org/wiki/Berne_Convention).
+## Workflow example
+1. Trigger the experiment
+```
+experiment-runner -i example/Experiment_runner_example.yaml
+```
+2. The tool then checks status:
+- Completed:
+```
+... already completed " {doneruns}, hence no new runs.
+```
+- Failed:
+```
+Clean up a failed job {work_dir} and prepare it for resubmission.
+```
+- Running/Queued:
+```
+You have duplicated runs for in the same folder hence not submitting this job!
+```
\ No newline at end of file

From 4dd7ed6a73249849c8fb75ab7afe720439c7d2a6 Mon Sep 17 00:00:00 2001
From: minghangli-uni <24727729+minghangli-uni@users.noreply.github.com>
Date: Tue, 26 Aug 2025 08:53:08 +1000
Subject: [PATCH 2/2] Apply code review - adding payu

---
 README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index f69b060..ceff1c0 100644
--- a/README.md
+++ b/README.md
@@ -7,13 +7,15 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
 ## About
-The main role of the ACCESS experiment runner is to manage and monitor experiment job runs on the supercomputing environment (e.g., `Gadi`), offering automatic handling of success, failure, and duplicate run scenarios.
+The main role of the ACCESS experiment runner is to manage and monitor experiment job runs on the supercomputing environment (e.g., `Gadi`). It builds on `Payu`, handling the orchestration of multiple configuration branches, experiment setup, and job lifecycle.
 
 ## Key features
-Submits and tracks PBS jobs on Gadi; oversees job lifecycle from submission through completion.
-- When a job completes within expected run times, the tool prints a confirmation and stops further submissions.
-- If a job fails, users may choose to inspect the working directory to diagnose the root cause. The tool will detect the failure and pause further actions, giving the user control over whether to resubmit.
-- Detects already running or queued jobs and avoids redundant submissions—quickly skips duplicates with a user notification.
+- Leverages `Payu` and run multiple experiments from different configuration branches.
+
+- Submits and tracks PBS jobs on `Gadi`; oversees job lifecycle from submission through completion.
+  - When a job completes within expected run times, the tool prints a confirmation and stops further submissions.
+  - If a job fails, users may choose to inspect the working directory to diagnose the root cause. The tool will detect the failure and pause further actions, giving the user control over whether to resubmit.
+  - Detects already running or queued jobs and avoids redundant submissions—quickly skips duplicates with a user notification.
 
 ## Installation
 ### User setup