zppy-pcmdi run issues on NERSC Perlmutter #781
Replies: 3 comments 12 replies
@zhangshixuan1987 I looked at the post/scripts/ subfolder, and it looks like you are trying something and creating new atm time-series files. I think we should be able to leverage the new logs to troubleshoot further.
Reviewing messages from Slack
cd /pscratch/sd/z/zhan391/EAMxx/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/post/scripts
ls *.bash
# e3sm_to_cmip_atm_monthly_180x360_aave_1995-1995-0001.bash e3sm_to_cmip_atm_monthly_180x360_aave_2000-2000-0001.bash
# e3sm_to_cmip_atm_monthly_180x360_aave_1996-1996-0001.bash e3sm_to_cmip_atm_monthly_180x360_aave_2001-2001-0001.bash
# e3sm_to_cmip_atm_monthly_180x360_aave_1997-1997-0001.bash e3sm_to_cmip_atm_monthly_180x360_aave_2002-2002-0001.bash
# e3sm_to_cmip_atm_monthly_180x360_aave_1998-1998-0001.bash e3sm_to_cmip_atm_monthly_180x360_aave_2003-2003-0001.bash
# e3sm_to_cmip_atm_monthly_180x360_aave_1999-1999-0001.bash e3sm_to_cmip_atm_monthly_180x360_aave_2004-2004-0001.bash
ls *.status
# e3sm_to_cmip_atm_monthly_180x360_aave_1995-1995-0001.status ts_atm_monthly_180x360_aave_1995-2004.status
# e3sm_to_cmip_atm_monthly_180x360_aave_1996-1996-0001.status ts_atm_monthly_180x360_aave_1996-1996-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_1997-1997-0001.status ts_atm_monthly_180x360_aave_1997-1997-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_1998-1998-0001.status ts_atm_monthly_180x360_aave_1998-1998-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_1999-1999-0001.status ts_atm_monthly_180x360_aave_1999-1999-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_2000-2000-0001.status ts_atm_monthly_180x360_aave_2000-2000-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_2001-2001-0001.status ts_atm_monthly_180x360_aave_2001-2001-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_2002-2002-0001.status ts_atm_monthly_180x360_aave_2002-2002-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_2003-2003-0001.status ts_atm_monthly_180x360_aave_2003-2003-0001.status
# e3sm_to_cmip_atm_monthly_180x360_aave_2004-2004-0001.status ts_atm_monthly_180x360_aave_2004-2004-0001.status
# ts_atm_monthly_180x360_aave_1995-1995-0001.status
# So ts has status files but no bash files...
grep -v "OK" *status
# grep: ts_atm_monthly_180x360_aave_1995-2004.status: Permission denied
# So no errors in status files, and one status file without permissions set up.
grep -A 1 "environment" e3sm_to_cmip_atm_monthly_180x360_aave_2004-2004-0001.settings
# 'environment_commands': 'source '
# '/global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh',
grep "www" e3sm_to_cmip_atm_monthly_180x360_aave_2004-2004-0001.settings
# 'web_portal_base_path': '/global/cfs/cdirs/e3sm/www',
# 'www': '/global/cfs/cdirs/e3sm/www/zhan391/eamxx-pcmdi',
ls /global/cfs/cdirs/e3sm/www/zhan391/eamxx-pcmdi
# Empty
ls provenance.*.cfg
# ls: cannot access 'provenance.*.cfg': No such file or directory
# So, provenance cfg wasn't created
We would expect a provenance cfg to at least exist, since Unified includes #713. In any case, Shixuan provided the cfg: Provided cfg
The Reviewing this discussion
Based on #717, NCO will now automatically exclude 3D variables. So maybe that could be propagating to
Yes, it is strange that the bash files aren't even created, let alone run.
I think this is unlikely since Unified undergoes extensive multi-machine testing. Then again, it's not promising that it works on Chrysalis but not Perlmutter. Maybe a data availability issue? Looking at the Looking at: we find that's the same output dir as mentioned earlier in my comment. Based on Jill's comment
we may need to wait for any additional work in this directory to be done.
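The status-file checks from the Slack review above can be repeated once that work is finished. Below is a minimal sketch, not part of zppy: the function names `report_status` and `status_without_script` are hypothetical, and it assumes (as the `grep -v "OK"` check does) that a finished task's status file contains only the word OK. Unlike the bare grep, it reports unreadable status files explicitly.

```shell
# Hypothetical helpers (not part of zppy) for auditing a zppy scripts directory.

# Print every *.status file whose content is not exactly "OK" (assumed to
# mean "finished successfully"), and flag status files that cannot be read.
report_status() {
    local dir="$1" f
    for f in "$dir"/*.status; do
        [ -e "$f" ] || continue            # glob matched nothing
        if [ ! -r "$f" ]; then
            echo "UNREADABLE $(basename "$f")"
        elif [ "$(cat "$f")" != "OK" ]; then
            echo "NOT_OK $(basename "$f"): $(head -n 1 "$f")"
        fi
    done
}

# Print every *.status file that has no matching *.bash script next to it.
status_without_script() {
    local dir="$1" f base
    for f in "$dir"/*.status; do
        [ -e "$f" ] || continue
        base="${f%.status}"
        [ -e "$base.bash" ] || echo "NO_SCRIPT $(basename "$f")"
    done
}
```

Running `report_status .` and `status_without_script .` from post/scripts would list the same anomalies noted above (the ts status files without bash files, and the status file with a permissions problem) in one pass.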
@chengzhuzhang @forsyth2 : Hi Jill and Ryan, I wanted to describe in detail how I’m running zppy-pcmdi for my EAMxx case, and where I’m currently running into issues. Here is the workflow I’m using:
Note that I repeated the same procedure on Chrysalis, and there it successfully produced the expected PCMDI diagnostics webpage (https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.szhang/e3sm-pcmdi/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/pcmdi_diags/model_vs_obs/viewer/).
Question criteria
What is the deadline?
This is not urgent, but I would greatly appreciate a response when convenient.
Describe your question
I’m seeing two issues when running the zppy workflow on NERSC Perlmutter using E3SM_Unified:
(1) e3sm_to_cmip does not perform vertical interpolation for 3D variables (e.g., U, V, T, Q). It appears that the vertical interpolation step/job is never activated or submitted. As a result, only the 2D variables are converted successfully, while the 3D fields are not produced on the target pressure levels.
(2) The PCMDI step is not triggered. Even with the partially completed e3sm_to_cmip results, PCMDI diagnostics (e.g., modes of variability) would normally be expected to run, but no job is created for the PCMDI diagnostics step.
Are there any possible answers you came across?
One possibility is that the E3SM_Unified environment on Perlmutter may not include a complete installation of the components required for e3sm_to_cmip vertical interpolation and/or zppy’s PCMDI modules.
I mention this because I transferred the same data and configuration to Chrysalis and reran the workflow there. On Chrysalis, all jobs related to e3sm_to_cmip vertical interpolation and zppy-pcmdi were at least triggered as expected, whereas on Perlmutter these jobs are not submitted.
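One rough way to test the incomplete-installation hypothesis is to run the same check on both machines after activating the environment and compare the output. This is only a sketch: `missing_tools` is a hypothetical helper, and the command names in the example are assumptions about what the workflow calls, not a confirmed list of what zppy requires.

```shell
# Hypothetical helper: report which of the named commands are not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "MISSING $tool"
    done
}

# Example (tool names are assumptions, adjust to the tasks in your cfg):
#   missing_tools zppy e3sm_to_cmip ncclimo ncremap
```

An empty result on both machines would suggest the difference lies elsewhere, for example in the cfg or in data availability, rather than in the installed environment.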
What machine were you running on?
NERSC Perlmutter
Environment
On both Perlmutter and Chrysalis, I activated the same conda environment using:
conda activate e3sm_unified_latest
What command did you run?
Copy your cfg file
What jobs are failing?
No response
What stack trace are you encountering?
No response