Conversation
```diff
 export CDMS_NO_MPI=true
 cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
-cdscan -x glb.xml *.nc
+python cdscan_replacement.py glb.xml *.nc
```
zppy isn't finding the cdscan_replacement.py file.
global_time_series adds Python scripts to an accessible directory via the block at https://github.com/E3SM-Project/zppy/blob/main/zppy/global_time_series.py#L55:
```python
c["global_time_series_dir"] = os.path.join(
    scriptDir, "{}_dir".format(prefix)
)
if not os.path.exists(c["global_time_series_dir"]):
    os.mkdir(c["global_time_series_dir"])
scripts = ["coupled_global.py", "readTS.py", "ocean_month.py"]
for script in scripts:
    script_template = templateEnv.get_template(script)
    script_file = os.path.join(c["global_time_series_dir"], script)
    with open(script_file, "w") as f:
        f.write(script_template.render(**c))
    makeExecutable(script_file)
```
Would it make sense to do something similar for cdscan_replacement? It seems like there should be a simpler way since there are no template parameters in that file.
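For a script with no template parameters, one simpler alternative is a plain copy plus a chmod rather than a Jinja render. A minimal sketch (the `install_static_script` helper and directory layout here are hypothetical, not zppy's actual API):

```python
import os
import shutil
import stat


def install_static_script(template_dir: str, script_name: str, dest_dir: str) -> str:
    """Copy a script that needs no template rendering and mark it executable."""
    os.makedirs(dest_dir, exist_ok=True)
    src = os.path.join(template_dir, script_name)
    dst = os.path.join(dest_dir, script_name)
    shutil.copyfile(src, dst)  # no templateEnv.get_template() / render() step needed
    # Equivalent in spirit to makeExecutable(script_file)
    os.chmod(dst, os.stat(dst).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dst
```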
@forsyth2, why not make this a proper entry point in the zppy package? That's the "right" way to do this:
https://packaging.python.org/en/latest/specifications/entry-points/
To do that, you would add your functions along with a main() function equivalent to your current if __name__ == "__main__": block to zppy somewhere and then you would edit your setup.py to add another entry point like this one:
https://github.com/E3SM-Project/zppy/blob/main/setup.py#L33
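For reference, the `console_scripts` mechanism maps a shell command name onto a package function. A sketch of what the `entry_points` argument to `setuptools.setup()` might look like (the `zppy.cdscan_replacement:main` module path, and the existing `zppy` entry shown for comparison, are assumptions for illustration):

```python
# "zppy_cdscan_replacement = zppy.cdscan_replacement:main" tells setuptools to
# generate a console script named zppy_cdscan_replacement that imports the
# zppy.cdscan_replacement module and calls its main() function.
entry_points = {
    "console_scripts": [
        "zppy = zppy.__main__:main",
        "zppy_cdscan_replacement = zppy.cdscan_replacement:main",
    ]
}
```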
@xylar Thanks, I tried implementing it that way, but I'm getting `zppy_cdscan_replacement: command not found` errors.
```
$ grep -n "cdscan_replacement" *.o*
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1851.o405614:1:/var/spool/slurmd/job405614/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1853.o405616:1:/var/spool/slurmd/job405616/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1852-1853.o405615:1:/var/spool/slurmd/job405615/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1851.o405611:1:/var/spool/slurmd/job405611/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1853.o405613:1:/var/spool/slurmd/job405613/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o405612:1:/var/spool/slurmd/job405612/slurm_script: line 120: zppy_cdscan_replacement: command not found
global_time_series_1850-1860.o405621:2:/var/spool/slurmd/job405621/slurm_script: line 58: zppy_cdscan_replacement: command not found
```
Yet the command does show up when I run `which` in the dev environment.
```
$ which zppy
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy
$ which zppy_cdscan_replacement
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy_cdscan_replacement
```
Are you using a different environment (e.g. E3SM-Unified) when you try to call zppy_cdscan_replacement? If that command is part of the environment you use to launch jobs but then it's not available in the environment that's actually used when you run the jobs, it would make sense that you're getting these errors.
Hmm I don't think that worked but I can't seem to find any documentation as to why.
I do see on Oct. 3, Tom noted "I suggest exploring xarray.open_mfdataset()" in #346 (comment), but then in this thread from Oct. 5, I'm trying to use cdscan_replacement.py. That seems to imply something about open_mfdataset wasn't working properly.
Oh, interesting! Still, I think it's worth trying again with open_mfdataset and reminding yourself what the error was. Maybe we can solve it.
I'm looking into it; I want to say it was something about it needing to convert to xml.
It also occurs to me that even if we use xarray, we'd need to call it from a bash script, so the same general problem (of needing to get a bash wrapper defined in the environment) would remain.
Can you explain the step-by-step requirements/description for the functionality involving cdscan? This would make it easier to identify the equivalent APIs needed from other packages to replace the cdscan code.
From my guess, in your comment here, it just looks like the code takes a glob of .nc filepaths and converts it to XML using cdscan. I assume this XML is used downstream with cdms2 (or another CDAT package) to open up the dataset.
```
$ git grep -n cdscan
docs/source/dev_guide/new_diags_set.rst:100: cdscan -x ${xml_name} ${rofDir}/${v}_*.nc
zppy/templates/e3sm_diags.bash:104: # Add this time series file to the list of files for cdscan to use
zppy/templates/e3sm_diags.bash:111: cdscan -x ${xml_name} -f ${v}_files.txt
zppy/templates/e3sm_diags.bash:132: cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
zppy/templates/global_time_series.bash:46:cdscan -x glb.xml *.nc
zppy/templates/global_time_series.bash:70: cdscan -x glb.xml mpaso.glb*.nc
$ git grep -n cdms
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10: self.f = cdms2.open(filename)
```
If that's the case, you can probably bypass the XML step entirely and just pass the glob paths (e.g., `${rofDir}/${v}_*.nc`) directly to `xarray.open_mfdataset()` to open up the dataset. However, you'll need to do some prototyping to validate a solution (I'm just throwing things out).
```python
path = "${rofDir}/${v}_*.nc"
dataset = xarray.open_mfdataset(path)
```
> Can you explain the step-by-step requirements/description for the functionality involving `cdscan`?
For e3sm_diags: `cdscan` generates an xml file, placing it in the directory given by the relevant path parameter, for e3sm_diags to use.
The relevant code steps:
- `cdscan` is used to generate an xml file based on `nc` files corresponding to a particular variable between January of the start year and December of the end year:
```
$ git grep -n -B 2 cdscan e3sm_diags.bash
e3sm_diags.bash-102- for file in ${ts_dir_source}/${v}_${YYYY}*.nc
e3sm_diags.bash-103- do
e3sm_diags.bash:104: # Add this time series file to the list of files for cdscan to use
--
e3sm_diags.bash-109- xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash-110- export CDMS_NO_MPI=true
e3sm_diags.bash:111: zppy_cdscan_replacement ${xml_name} ${v}_files.txt
--
e3sm_diags.bash-130- v="RIVER_DISCHARGE_OVER_LAND_LIQ"
e3sm_diags.bash-131- xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash:132: zppy_cdscan_replacement ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
```
- This is done inside the `create_links_ts` function (and also the `create_links_ts_rof` function):
```
$ git grep -n -A 3 "create_links_ts(" e3sm_diags.bash
e3sm_diags.bash:84:create_links_ts()
e3sm_diags.bash-85-{
e3sm_diags.bash-86- ts_dir_source=$1
e3sm_diags.bash-87- ts_dir_destination=$2
```
- These 2 functions have created the `xml` files in the passed-in destination directories:
```
$ git grep -n -B 1 "create_links_ts" e3sm_diags.bash
e3sm_diags.bash-83-
e3sm_diags.bash:84:create_links_ts()
--
e3sm_diags.bash-120-
e3sm_diags.bash:121:create_links_ts_rof()
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-205-ts_dir_ref=ts_ref
e3sm_diags.bash:206:create_links_ts ${ts_dir_source} ${ts_dir_ref} ${ref_Y1} ${ref_Y2} 6
--
e3sm_diags.bash-221-ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
e3sm_diags.bash:222:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_primary} ${Y1} ${Y2} 7
--
e3sm_diags.bash-225-ts_rof_dir_ref=ts_rof_ref
e3sm_diags.bash:226:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_ref} ${ref_Y1} ${ref_Y2} 8
```
- Following one of the destination directories, we can see that `ts_dir_primary` is used to define `test_ts` in the generated `e3sm_diags` python file:
```
$ git grep -n -B 1 ts_dir_primary e3sm_diags.bash
e3sm_diags.bash-195-{% if run_type == "model_vs_obs" %}
e3sm_diags.bash:196:ts_dir_primary=ts
e3sm_diags.bash-197-{% elif run_type == "model_vs_model" %}
e3sm_diags.bash:198:ts_dir_primary=ts_test
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-278-short_name = '${short}'
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
```
- That `test_ts` value is used to define the `test_data_path` for several parameters:
```
$ git grep -n test_ts e3sm_diags.bash
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
e3sm_diags.bash:350:enso_param.test_data_path = test_ts
e3sm_diags.bash:406:qbo_param.test_data_path = test_ts
e3sm_diags.bash:436:ts_param.test_data_path = test_ts
```
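The first step above (gathering each variable's monthly files between January of the start year and December of the end year) amounts to filename filtering. A pure-Python sketch of that selection, as a rough stand-in for what the bash loop feeding `${v}_files.txt` does (the filenames here are illustrative, not zppy's actual output):

```python
import fnmatch


def select_ts_files(filenames, var, begin_year, end_year):
    """Collect <var>_YYYY*.nc files for every YYYY in [begin_year, end_year]."""
    selected = []
    for yyyy in range(begin_year, end_year + 1):
        pattern = f"{var}_{yyyy:04d}*.nc"
        selected.extend(f for f in filenames if fnmatch.fnmatch(f, pattern))
    return sorted(selected)


files = ["TS_185001_185012.nc", "TS_185101_185112.nc", "TS_186001_186012.nc"]
print(select_ts_files(files, "TS", 1850, 1851))
# → ['TS_185001_185012.nc', 'TS_185101_185112.nc']
```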
For global_time_series: `cdscan` generates `glb.xml` files which are ultimately opened and processed to generate a global annual average for each variable they contain.
The relevant code steps:
- `cdscan` is used to generate a `glb.xml` file based on all the `nc` files in a particular `glb` subdirectory, for multiple subdirectories:
```
$ git grep -n -B 1 cdscan global_time_series.bash
global_time_series.bash-44- cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:45: zppy_cdscan_replacement glb.xml *.nc
--
global_time_series.bash-56- cd ${case_dir}/post/lnd/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:57: zppy_cdscan_replacement -x glb.xml *.nc
--
global_time_series.bash-80- cd ${case_dir}/post/ocn/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:81: zppy_cdscan_replacement glb.xml mpaso.glb*.nc
```
- Which are then used to define values for keys in the `exp` dictionary:
```
$ git grep -n -B 2 "glb.xml" coupled_global.py
coupled_global.py-741- "atmos": None
coupled_global.py-742- if not use_atmos
coupled_global.py:743: else "{}/post/atm/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-746- "ice": None
coupled_global.py-747- if not plots_ice
coupled_global.py:748: else "{}/post/ice/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-751- "land": None
coupled_global.py-752- if not plots_lnd
coupled_global.py:753: else "{}/post/lnd/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-756- "ocean": None
coupled_global.py-757- if not use_ocn
coupled_global.py:758: else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-764- "vol": None
coupled_global.py-765- if not use_ocn
coupled_global.py:766: else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
```
- These `glb.xml` files are then passed into `set_var`, implicitly as `exp[exp_key]`:
```
$ git grep -n "set_var" coupled_global.py
coupled_global.py:552:def set_var(exp, exp_key, var_list, valid_vars, invalid_vars, rgn):
coupled_global.py:786: set_var(exp, "atmos", vars_original, valid_vars, invalid_vars, rgn)
coupled_global.py:787: set_var(exp, "atmos", plots_atm, valid_vars, invalid_vars, rgn)
coupled_global.py:788: set_var(exp, "ice", plots_ice, valid_vars, invalid_vars, rgn)
coupled_global.py:789: set_var(exp, "land", plots_lnd, valid_vars, invalid_vars, rgn)
coupled_global.py:790: set_var(exp, "ocean", plots_ocn, valid_vars, invalid_vars, rgn)
```
- They're then passed to the `TS` class constructor:
```
$ git grep -n "exp\[exp_key\]" coupled_global.py
coupled_global.py:553: if exp[exp_key] is not None:
coupled_global.py:554: ts = TS(exp[exp_key])
```
- The `glb.xml` files are ultimately opened, formerly by `cdms2.open` and with this PR by `xcdat.open_dataset`:
```
$ git grep -n -A 5 "TS" readTS.py
readTS.py:4:class TS(object):
readTS.py-5- def __init__(self, filename):
readTS.py-6-
readTS.py-7- self.filename = filename
readTS.py-8-
readTS.py-9- self.f = xcdat.open_dataset(filename)
```
- For every variable in `glb.xml`, the `globalAnnual` is computed (the last 2 blocks have essentially just bypassed the function call of step 3):
```
$ git grep -n -B 3 "ts.globalAnnual" coupled_global.py
coupled_global.py-554- ts = TS(exp[exp_key])
coupled_global.py-555- for var in var_list:
coupled_global.py-556- try:
coupled_global.py:557: v, units = ts.globalAnnual(var)
--
coupled_global.py-792- # Optionally read ohc
coupled_global.py-793- if exp["ocean"] is not None:
coupled_global.py-794- ts = TS(exp["ocean"])
coupled_global.py:795: exp["annual"]["ohc"], _ = ts.globalAnnual("ohc")
--
coupled_global.py-798-
coupled_global.py-799- if exp["vol"] is not None:
coupled_global.py-800- ts = TS(exp["vol"])
coupled_global.py:801: exp["annual"]["volume"], _ = ts.globalAnnual("volume")
```
- `globalAnnual` is computed, formerly by `cdutil.YEAR` and with this PR by `self.f.temporal.group_average`:
```
$ git grep -n group_average readTS.py
readTS.py:68: v = self.f.temporal.group_average(v, "year")
```
> I assume this XML is used downstream with cdms2 (or another CDAT package) to open up the dataset.
Per the above, this is correct for global_time_series, but not for e3sm_diags (at least not directly; that is indeed probably the case within the e3sm_diags package, but that is outside the scope of the zppy package).
> you can probably bypass the XML step entirely and just pass the glob paths
Ok thanks, I will try experimenting with this.
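For intuition, the annual averaging that `cdutil.YEAR` performed and that `temporal.group_average(v, "year")` now performs boils down to grouping monthly values by year and averaging. A deliberately simplified pure-Python sketch (the real implementations weight by month length and respect the dataset's calendar, which this ignores):

```python
from collections import defaultdict


def annual_means(months, values):
    """Group monthly values (months as "YYYY-MM" strings) by year and average.

    Unweighted toy version of an annual group-average; real tools apply
    month-length weights.
    """
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[month[:4]].append(value)
    return {year: sum(vs) / len(vs) for year, vs in sorted(groups.items())}


print(annual_means(["1850-01", "1850-02", "1851-01"], [1.0, 3.0, 5.0]))
# → {'1850': 2.0, '1851': 5.0}
```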
Following the example of E3SM-Project/e3sm_diags#803: Running: Gives: And copied from #346 (comment):
@tomvothecoder On the wider topic of any CDAT usage in
zppy/templates/readTS.py
```python
# Refactor note: `AttributeError: 'Dataset' object has no attribute 'temporal'` seems to always occur
# Regardless if using CDAT or not, if using as object or class method.
# v = self.f.temporal.group_average(v, "year")
v = xarray.Dataset.temporal.group_average(v, "year")
```
@tomvothecoder It does indeed seem like I'm able to eliminate the `cdscan` call that generates an xml, plus the `self.f = cdms2.open(filename)` line, by using `xarray.open_mfdataset(f"{directory}/*.nc")`.
However, no matter what I do, Python doesn't seem to think `temporal` exists. I'm looking at https://xcdat.readthedocs.io/en/latest/generated/xarray.Dataset.temporal.group_average.html, and this is running in the latest Unified environment. Am I using the wrong averaging function here? (This line replaces `v = cdutil.YEAR(v)`.)
Make sure to `import xcdat` to gain access to the `.temporal` accessor class from xCDAT.
The xCDAT accessor classes are built on top of `xarray.Dataset` objects.
Oh I see, I just assumed it would be an xarray sub-module. Thanks for the info!
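The reason the import matters is xarray's accessor registration pattern: the `.temporal` attribute only comes into existence once the module that registers it has been imported. A toy stand-in for the mechanism (class and function names here are illustrative, not xarray's actual internals):

```python
class Dataset:
    """Toy stand-in for xarray.Dataset; initially has no `temporal` attribute."""


def register_accessor(name):
    """Mimic xarray.register_dataset_accessor: attach a lazily built accessor."""
    def decorator(accessor_cls):
        # Until this decorator runs (i.e. until the registering module is
        # imported), Dataset.<name> does not exist -- hence the AttributeError.
        setattr(Dataset, name, property(lambda self: accessor_cls(self)))
        return accessor_cls
    return decorator


@register_accessor("temporal")
class TemporalAccessor:
    def __init__(self, ds):
        self._ds = ds

    def group_average(self, var, freq):
        return f"averaged {var} by {freq}"


ds = Dataset()
print(ds.temporal.group_average("TS", "year"))
# → averaged TS by year
```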
@tomvothecoder After much debugging/learning of
As a side note, it's a real shame we weren't using typing on all
zppy/templates/coupled_global.py
```python
import matplotlib.pyplot as plt
import numpy as np
import xarray
import xcdat  # noqa: F401
```
I would hope that a similar approach would work in
What I think If it is correct, then we need to refactor
@tomvothecoder you are correct
@xylar Yes, I believe that was the point of producing the
@tomvothecoder Yes, sorry I should have been clearer. By "
@tomvothecoder Interesting, yes that's what I meant by "(unless the Basically the two options I was getting at were:
So it sounds like we want (2)
Option (1) will work for the existing CDAT-based You're correct, we want option (2). You can create a separate branch specifically to refactor the
@tomvothecoder Sure, that sounds good to me. Would it be better to merge the parts we can now though? I.e.
Yes, I would split out the work to refactor
Replace CDAT. Resolves #346. Resolves #80.