[Feature]: Print absolute test_data_path and ref_data_path in console for reproducibility #1030

@tomvothecoder

Is your feature request related to a problem?

zppy generates e3sm_diags scripts that rely on symlinks and local paths for input data. In particular, param.test_data_path is often set to a local directory (e.g., climo) without indicating how it maps to the actual data location.

This makes reproducing e3sm_diags issues difficult outside the original environment. To run the generated scripts elsewhere, we must first reverse-engineer the symlinks and then either:

  1. Update test_data_path to the correct absolute paths, or
  2. Copy the files located at the symlinks to the local directory referenced in the script (e.g., climo)

We also have to consider that e3sm_diags won't know which set of climo files to use, because zppy symlinks to exactly the files needed, based on the start_yr and end_yr configs.

Example

In this zppy-generated script:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/zppy-diags-1019-xc-break-test4/v2.LR.historical_0201/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/prov/e3sm.py

# Model
param.test_data_path = 'climo'
param.test_name = 'v2.LR.historical_0201'
param.short_test_name = short_name

The script cannot be run as-is without reconstructing the symlinks or copying the data locally.
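To illustrate the reverse-engineering step, here is a minimal sketch of how the symlinks in a local directory such as climo could be mapped back to their real locations. The helper name map_symlinks is hypothetical and is not part of zppy or e3sm_diags; it only uses the standard library.

```python
from pathlib import Path


def map_symlinks(directory):
    """Return {link name: absolute target} for each symlink directly inside `directory`.

    Hypothetical helper for illustration only; not part of zppy or e3sm_diags.
    """
    mapping = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_symlink():
            # resolve() follows the link (and any parent links) to an absolute path.
            mapping[entry.name] = str(entry.resolve())
    return mapping
```

Calling map_symlinks('climo') from the directory where the generated script runs would list where each linked file actually lives, which is the information currently missing from the script.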

Describe the solution you'd like

A simple way to improve reproducibility is to print the resolved test_data_path and reference_data_path configs used by all parameter objects in the console output (and therefore the log file). This would make the exact data configuration visible without needing to inspect generated scripts or reverse-engineer symlinks.

For example, extending the run header to include these paths:

E3SM Diagnostics Run
--------------------
Timestamp: 2025-12-19 11:50:03
Version Info: version v3.1.0
Results Path: model_vs_obs_1982-1983
Log Path: model_vs_obs_1982-1983/prov/e3sm_diags_run.log
Parameter Files Path: model_vs_obs_1982-1983/prov/cmd_used.txt
Python Script Path: model_vs_obs_1982-1983/prov/e3sm.py
Environment YML Path: model_vs_obs_1982-1983/prov/environment.yml
Provenance Index HTML Path: model_vs_obs_1982-1983/prov/index.html
Test Data Path: /absolute/path/to/test/data
Reference Data Path: /absolute/path/to/reference/data

This information would be preserved in the log file, making it much easier to reproduce and debug runs.
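A rough sketch of how the two extra header lines could be built, assuming each parameter object exposes test_data_path and reference_data_path attributes. The function names below are hypothetical and do not reflect the actual e3sm_diags logging code; symlinks are resolved with os.path.realpath, and duplicates across parameter objects are collapsed.

```python
import os


def resolved_data_paths(parameters):
    """Collect unique, symlink-resolved data paths across parameter objects.

    Hypothetical helper; assumes each object may define `test_data_path`
    and `reference_data_path` as strings.
    """
    test_paths, ref_paths = [], []
    for param in parameters:
        for attr, bucket in (
            ("test_data_path", test_paths),
            ("reference_data_path", ref_paths),
        ):
            value = getattr(param, attr, None)
            if not value:
                continue  # Skip unset or empty paths.
            resolved = os.path.realpath(value)
            if resolved not in bucket:
                bucket.append(resolved)
    return test_paths, ref_paths


def format_header_lines(parameters):
    """Build the two header lines proposed in this issue."""
    test_paths, ref_paths = resolved_data_paths(parameters)
    return [
        "Test Data Path: " + ", ".join(test_paths),
        "Reference Data Path: " + ", ".join(ref_paths),
    ]
```

Joining multiple paths with commas is just one possible choice for the case where different parameter objects point at different data locations.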

Describe alternatives you've considered

No response

Additional context

Related to:
