5 changes: 5 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,5 @@
<!-- markdownlint-disable -->
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
@../AGENTS.md
110 changes: 110 additions & 0 deletions AGENTS.md
@@ -0,0 +1,110 @@
<!-- markdownlint-disable -->
# AGENTS.md

This file provides guidance for working with code in this repository.

## Project Overview

zppy_interfaces is a Python package providing extra functionality for E3SM climate model analysis. It processes log files, generates time series plots, and runs PCMDI diagnostics. The package is designed to be called by zppy or used standalone for climate model analysis.

## Development Environment Setup

Set up the development environment using conda:
```bash
conda clean --all --y
conda env create -f conda/dev.yml -n zppy-interfaces-dev
conda activate zppy-interfaces-dev
pip install .
pre-commit install
```

## Common Development Commands

### Installation and Setup
- `pip install .` - Install the package (add `-e` for an editable development install)
- `pip install -e .[testing]` - Install with testing dependencies
- `pip install -e .[qa]` - Install with quality assurance tools

### Code Quality and Testing
- `pytest` - Run all tests
- `pytest tests/unit/budget_analysis/` - Run specific module tests
- `pytest tests/unit/budget_analysis/test_atm_parser.py` - Run single test file
- `pre-commit run --all-files` - Run all pre-commit hooks
- `black .` - Format code with Black
- `isort .` - Sort imports
- `flake8` - Check code style
- `mypy zppy_interfaces/` - Type checking

### CLI Tools Testing
Test the main CLI applications:
- `zi-budget-analysis --help` - Budget analysis tool
- `zi-global-time-series --help` - Global time series plots
- `zi-pcmdi-link-observation --help` - PCMDI observation linking
- `zi-pcmdi-mean-climate --help` - PCMDI mean climate diagnostics
- `zi-pcmdi-variability-modes --help` - PCMDI variability modes
- `zi-pcmdi-enso --help` - PCMDI ENSO diagnostics
- `zi-pcmdi-synthetic-plots --help` - PCMDI synthetic plots

## Architecture Overview

### Main Components

**budget_analysis/** - E3SM water and energy budget analysis
- `__main__.py` - CLI entry point with legacy and whole-model modes
- `parser.py` - Core budget parsing logic for coupler logs
- `ingestion/` - Component-specific log parsers (atm, ocn, ice, lnd, cpl)
- `plotting.py` - HTML plot generation for legacy mode
- `viz.py` - Visualization for whole-model mode
- `checks.py` - Budget conservation checks
- `normalization.py` - Data normalization utilities

**global_time_series/** - Global time series plot generation
- `__main__.py` - CLI with viewer vs PDF output modes
- `coupled_global/` - Core time series generation logic
- `create_ocean_ts.py` - Ocean-specific time series processing
- `utils.py` - Parameter handling utilities

**pcmdi_diags/** - PCMDI diagnostics suite
- Multiple CLI tools for different diagnostic types
- `viewer.py` - HTML viewer generation
- `synthetic_plots/` - Synthetic plot utilities

**multi_utils/** - Shared utilities
- `logger.py` - Logging setup for child processes
- `viewer.py` - Common viewer functionality
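The actual API of `multi_utils/logger.py` is not shown in this diff; as an illustrative stdlib-only sketch of logging for child processes (the helper name `setup_child_logger` and the format string are hypothetical, not the package's implementation):

```python
import logging
import os


def setup_child_logger(name: str) -> logging.Logger:
    """Hypothetical sketch: configure a logger whose records carry the
    child's PID, so interleaved output from parallel tasks stays readable.
    The real setup lives in zppy_interfaces.multi_utils.logger."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on re-entry
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                f"%(asctime)s [{os.getpid()}] %(name)s %(levelname)s: %(message)s"
            )
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


log = setup_child_logger("budget_analysis.worker")
log.info("parsing ocn logs")
```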

### Data Flow Patterns

**Budget Analysis (whole-model mode):**
1. Ingest: Parse multiple log file types (cpl, atm, ocn, ice, lnd)
2. Normalize: Standardize data formats and units
3. Check: Run conservation checks and compute residuals
4. Visualize: Generate HTML reports with interactive plots
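The four steps above can be sketched with pandas on synthetic rows. This is an illustration of the tidy-table shape, not the package's code: the column names (`source`, `quantity`, `term`, `value`, `normalized_value`) mirror those used by the test scripts in this PR, while the `residual` helper and the example values are hypothetical stand-ins for `normalize` and `OcnClosure`:

```python
import pandas as pd

# 1. Ingest: each parser emits tidy rows per log type (synthetic data here).
rows = pd.DataFrame({
    "source": ["ocn"] * 4,
    "quantity": ["water", "water", "heat", "heat"],
    "term": ["Mass change", "SUM VOLUME FLUXES",
             "Energy change", "SUM IMP+EXP HEAT FLUXES"],
    "value": [2.0e-6, 2.1e-6, 0.5, 0.45],
})

# 2. Normalize: stand-in for normalization.normalize (units, sign conventions).
rows["normalized_value"] = rows["value"]


# 3. Check: a conservation residual = stored change minus summed fluxes.
def residual(df: pd.DataFrame, quantity: str, change_term: str, flux_term: str) -> float:
    q = df[df["quantity"] == quantity]
    change = q.loc[q["term"] == change_term, "normalized_value"].sum()
    fluxes = q.loc[q["term"] == flux_term, "normalized_value"].sum()
    return change - fluxes


water_res = residual(rows, "water", "Mass change", "SUM VOLUME FLUXES")
heat_res = residual(rows, "heat", "Energy change", "SUM IMP+EXP HEAT FLUXES")
print(f"water residual: {water_res:.2e}, heat residual: {heat_res:.2e}")
# 4. Visualize: the real pipeline renders such residuals into HTML reports.
```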

**Global Time Series:**
1. Ocean processing: Extract time series from MPAS-Analysis results (optional)
2. Coupled analysis: Generate regional and global plots
3. Output: HTML viewer (interactive) or PDF (static) based on make_viewer setting

### Key Configuration Files

- `pyproject.toml` - Package configuration, dependencies, CLI entry points
- `conda/dev.yml` - Development environment specification
- `.pre-commit-config.yaml` - Code quality hooks (black, isort, flake8, mypy)
- `.flake8` - Flake8 configuration (line length 119, specific ignores)

### Testing Strategy

- Unit tests in `tests/unit/` organized by module
- Example scripts in `examples/` showing realistic usage
- Integration with pre-commit hooks for quality assurance
- pytest with coverage reporting capabilities

## Important Notes

- The package handles both compressed (.gz) and uncompressed log files
- Budget analysis supports both "legacy" (coupler-only) and "whole-model" modes
- Time series generation can produce either interactive HTML viewers or static PDFs
- All CLI tools use argparse with comprehensive help documentation
- The codebase follows strict code quality standards with Black formatting and comprehensive linting
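The `.gz`-or-plain handling noted above can be done with a small stdlib helper. A minimal sketch for illustration only (the helper name `open_log` is hypothetical, not the package's actual implementation):

```python
import gzip
import pathlib
import tempfile

def open_log(path: str):
    """Open a log file in text mode, transparently handling gzip.

    Detection uses the gzip magic bytes rather than the filename, so a
    renamed compressed file still opens correctly.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")

# Example: one plain and one gzipped log, read through the same code path.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "cpl.log").write_text("NET WATER BUDGET\n")
with gzip.open(tmp / "cpl.log.gz", "wt") as f:
    f.write("NET WATER BUDGET\n")

for name in ("cpl.log", "cpl.log.gz"):
    with open_log(str(tmp / name)) as f:
        print(name, "->", f.read().strip())
```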
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -19,10 +19,12 @@ classifiers = [

dependencies = [
"beautifulsoup4",
"bokeh",
"lxml",
"matplotlib",
"netcdf4",
"numpy >=2.0,<3.0",
"pandas",
"pcmdi_metrics>=3.9.3",
"xarray >=2023.02.0",
"xcdat >=0.7.3,<1.0",
@@ -117,6 +119,7 @@ version = { attr = "zppy_interfaces.version.__version__" }

# evolution of options.entry-points
[project.scripts]
zi-budget-analysis = "zppy_interfaces.budget_analysis.__main__:main"
zi-global-time-series = "zppy_interfaces.global_time_series.__main__:main"
zi-pcmdi-link-observation = "zppy_interfaces.pcmdi_diags.link_observation:main"
zi-pcmdi-mean-climate = "zppy_interfaces.pcmdi_diags.pcmdi_mean_cimate:main"
96 changes: 96 additions & 0 deletions tests/unit/budget_analysis/diagnose_ocn_closure.py
@@ -0,0 +1,96 @@
"""Diagnose ocean closure: verify log-native term names are used correctly."""

import glob
import sys

sys.path.insert(0, ".")

from zppy_interfaces.budget_analysis.checks import OcnClosure # noqa: E402
from zppy_interfaces.budget_analysis.ingestion.ocn_parser import OcnParser # noqa: E402
from zppy_interfaces.budget_analysis.normalization import normalize # noqa: E402

LOG_PATH = "/pscratch/sd/c/chengzhu/zstash/archive/logs"
START_YEAR = 1
END_YEAR = 50

ocn = OcnParser().parse_files(
sorted(glob.glob(f"{LOG_PATH}/ocn.log.*.gz")), START_YEAR, END_YEAR
)

print(f"Total ocean rows: {len(ocn)}")
print()

# --- Water ---
water = ocn[ocn["quantity"] == "water"]
print("=== Water term counts ===")
print(water["term"].value_counts().to_string())
print()

mc = water[water["term"] == "Mass change"]
print(f"=== 'Mass change' rows: {len(mc)} ===")
if not mc.empty:
print(
mc[["time", "value", "units", "table_type", "period"]]
.head(12)
.to_string(index=False)
)
else:
print(" *** NOT FOUND ***")
print()

svf = water[water["term"] == "SUM VOLUME FLUXES"]
print(f"=== 'SUM VOLUME FLUXES' rows: {len(svf)} ===")
if not svf.empty:
print(
svf[["time", "value", "units", "table_type", "period"]]
.head(12)
.to_string(index=False)
)
else:
print(" *** NOT FOUND ***")
print()

# --- Heat ---
heat = ocn[ocn["quantity"] == "heat"]
print("=== Heat term counts ===")
print(heat["term"].value_counts().to_string())
print()

ec = heat[heat["term"] == "Energy change"]
print(f"=== 'Energy change' rows: {len(ec)} ===")
if not ec.empty:
print(
ec[["time", "value", "units", "table_type", "period"]]
.head(12)
.to_string(index=False)
)
else:
print(" *** NOT FOUND ***")
print()

shf = heat[heat["term"] == "SUM IMP+EXP HEAT FLUXES"]
print(f"=== 'SUM IMP+EXP HEAT FLUXES' rows: {len(shf)} ===")
if not shf.empty:
print(
shf[["time", "value", "units", "table_type", "period"]]
.head(12)
.to_string(index=False)
)
else:
print(" *** NOT FOUND ***")
print()

# --- Test closure checks ---
print("=== Testing OcnClosure checks ===")
df = normalize(ocn)

for q in ["water", "heat"]:
check = OcnClosure(quantity=q)
result = check.evaluate(df)
if result is not None:
print(
f" {q} closure: {len(result.years)} years,"
f" max |residual| = {abs(result.residual).max():.2e}"
)
else:
print(f" {q} closure: SKIPPED (missing data)")
43 changes: 43 additions & 0 deletions tests/unit/budget_analysis/print_monthly_interface.py
@@ -0,0 +1,43 @@
"""Print monthly normalized values for cpl and ocn interface comparison."""

import glob
import sys

import pandas as pd

sys.path.insert(0, ".")

from zppy_interfaces.budget_analysis.ingestion.cpl_parser import CplParser # noqa: E402
from zppy_interfaces.budget_analysis.ingestion.ocn_parser import OcnParser # noqa: E402
from zppy_interfaces.budget_analysis.normalization import normalize # noqa: E402

LOG_PATH = "/pscratch/sd/c/chengzhu/zstash/archive/logs"
START_YEAR = 1
END_YEAR = 50

cpl = CplParser().parse_files(
sorted(glob.glob(f"{LOG_PATH}/cpl.log.*.gz")), START_YEAR, END_YEAR
)
ocn = OcnParser().parse_files(
sorted(glob.glob(f"{LOG_PATH}/ocn.log.*.gz")), START_YEAR, END_YEAR
)
df = normalize(pd.concat([cpl, ocn], ignore_index=True))

cpl_m = df[
(df.source == "cpl")
& (df.term == "*SUM*")
& (df.component == "ocn")
& (df.period == "monthly")
].sort_values("time")

ocn_m = df[
(df.source == "ocn")
& (df.term == "SUM VOLUME FLUXES")
& (df.table_type == "flux")
& (df.period == "monthly")
].sort_values("time")

print("=== CPL monthly ocn *SUM* ===")
print(cpl_m[["time", "normalized_value"]].to_string(index=False))
print(f"\n=== OCN monthly SUM VOLUME FLUXES ({len(ocn_m)} rows) ===")
print(ocn_m[["time", "normalized_value"]].to_string(index=False))
84 changes: 84 additions & 0 deletions tests/unit/budget_analysis/test_atm_parser.py
@@ -0,0 +1,84 @@
"""Quick test script for AtmParser on a sample atm.log file."""

import sys

from zppy_interfaces.budget_analysis.ingestion.atm_parser import AtmParser

LOG_FILE = (
"/pscratch/sd/e/e3smtest/e3sm_scratch/pm-cpu/"
"SMS.ne4pg2_oQU480.F2010.pm-cpu_intel.eam-thetahy_ftype2_energy"
".C.JNextIntegration20260210_205258/run/"
"atm.log.48753126.260210-224733.gz"
)


def main():
parser = AtmParser()
log_files = [LOG_FILE]

# --- Raw per-step data ---
nstep_te, flux_diag = parser.parse_raw(log_files)

print("=== nstep_te (energy fixer) ===")
print(f"Shape: {nstep_te.shape}")
print(f"Columns: {list(nstep_te.columns)}")
print(nstep_te.head(3))
print("...")
print(nstep_te.tail(3))
print()

print("=== flux_diag (water/energy diagnostics) ===")
print(f"Shape: {flux_diag.shape}")
print(f"Columns: {list(flux_diag.columns)}")
print(flux_diag.head(3))
print("...")
print(flux_diag.tail(3))
print()

# --- Validation ---
print("=== Validation ===")
assert (
nstep_te.shape[0] == 122
), f"Expected 122 nstep_te rows, got {nstep_te.shape[0]}"
print(f" nstep_te rows: {nstep_te.shape[0]} (expected 122)")

assert (
flux_diag.shape[0] == 120
), f"Expected 120 flux_diag rows, got {flux_diag.shape[0]}"
print(f" flux_diag rows: {flux_diag.shape[0]} (expected 120)")

tw_first = flux_diag["tw"].iloc[0]
tw_last = flux_diag["tw"].iloc[-1]
print(f" W(n=1) = {tw_first:.6f} kg/m2 (expect ~25.303)")
print(f" W(n=120)= {tw_last:.6f} kg/m2 (expect ~24.949)")

e_diff_max = flux_diag["e_diff"].abs().max()
print(f" max |E difference| = {e_diff_max:.3e} W/m2")

# Check date tracking
print(
f" Date range: year {nstep_te['year'].min()}-{nstep_te['year'].max()}, "
f"month {nstep_te['month'].min()}-{nstep_te['month'].max()}, "
f"day {nstep_te['day'].min()}-{nstep_te['day'].max()}"
)
print()

# --- Tidy event table (annual) ---
events = parser.parse_files(log_files, 1, 1)
print("=== Tidy event table (annual, default) ===")
print(f"Shape: {events.shape}")
print(events.to_string())
print()

# --- Tidy event table (monthly) ---
monthly_parser = AtmParser(frequency="monthly")
events_m = monthly_parser.parse_files(log_files, 1, 1)
print("=== Tidy event table (monthly) ===")
print(f"Shape: {events_m.shape}")
print(events_m.to_string())

print("\nAll checks passed.")


if __name__ == "__main__":
sys.exit(main() or 0)