Environment Information
- What platform are you using? (please provide specific distribution/version in summary)
- 32 and/or 64 bit?
- What build system are you using?
- Can you provide a sample netCDF file or C code to recreate the issue?
Summary of Issue
NOTE: my dvarput.c is modified from 4.5.1-devel as described in #447 -- the early return if nels==0 has been removed.
If nc_put_vars_double is called in parallel with stride != 1, netcdf-4 (HDF5-based) output is in collective mode, and some processors have data to output while others do not, then the code hangs: only the processors with data call down into the H5Dwrite function. That function assumes every processor will call it, whether or not it has data, and it reaches a PMPI_Allreduce further down the call stack.
The issue arises in NCDEFAULT_put_vars. If stride is 1, then everything works, since all processors call NC_put_vars at line 246 of dvarput.c (4.5.1-devel).
However, if the stride is not 1, the code falls through to the odometer code below that. All processors call odom_init, but the body of the while loop is executed only by the processors that have data (some lines deleted below):
    odom_init(&odom,rank,mystart,myedges,mystride);
    while(odom_more(&odom)) {
        int localstatus = NC_NOERR;
        localstatus = NC_put_vara(ncid,varid,odom.index,nc_sizevector1,memptr,memtype);
        memptr += memtypelen;
        odom_next(&odom);
    }
With netcdf-4 (HDF5-based) collective output, the code then hangs below H5Dwrite, because the HDF5 library calls PMPI_Allreduce and the processors without data never reach that call.
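To make the mismatch concrete, here is a minimal sketch that counts how many single-element writes a given processor would issue from a loop shaped like the one above. It needs neither MPI nor netCDF. The `sketch_odom_*` functions, `SKETCH_MAX_DIMS`, and `count_element_writes` are hypothetical names written for this illustration; they are a simplified stand-in for the odometer, not the code in dvarput.c. The sketch assumes a processor with no data passes a zero count in dimension 0.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DIMS 8 /* hypothetical limit for this sketch */

/* Simplified stand-in for the odometer; NOT the netCDF implementation. */
struct sketch_odom {
    int rank;
    size_t index[SKETCH_MAX_DIMS];
    size_t start[SKETCH_MAX_DIMS];
    size_t stop[SKETCH_MAX_DIMS];
    ptrdiff_t stride[SKETCH_MAX_DIMS];
};

static void sketch_odom_init(struct sketch_odom *o, int rank,
                             const size_t *start, const size_t *edges,
                             const ptrdiff_t *stride)
{
    assert(rank >= 1 && rank <= SKETCH_MAX_DIMS);
    o->rank = rank;
    for (int i = 0; i < rank; i++) {
        o->start[i] = start[i];
        o->stride[i] = stride[i];
        /* One past the last index visited in dimension i. */
        o->stop[i] = start[i] + edges[i] * (size_t)stride[i];
        o->index[i] = start[i];
    }
}

/* True while the slowest-varying index is still in range. */
static int sketch_odom_more(const struct sketch_odom *o)
{
    return o->index[0] < o->stop[0];
}

/* Advance the fastest-varying index, carrying into slower dimensions. */
static void sketch_odom_next(struct sketch_odom *o)
{
    for (int i = o->rank - 1; i >= 0; i--) {
        o->index[i] += (size_t)o->stride[i];
        if (o->index[i] < o->stop[i]) return;
        if (i == 0) return; /* leave index[0] out of range so the loop ends */
        o->index[i] = o->start[i];
    }
}

/* Number of times the loop body (one write per element) would run.
 * Assumes a processor with no data has edges[0] == 0. */
static size_t count_element_writes(int rank, const size_t *start,
                                   const size_t *edges,
                                   const ptrdiff_t *stride)
{
    struct sketch_odom o;
    size_t calls = 0;

    sketch_odom_init(&o, rank, start, edges, stride);
    while (sketch_odom_more(&o)) {
        calls++; /* the real loop would issue one write call here */
        sketch_odom_next(&o);
    }
    return calls;
}
```

Under these assumptions, a processor with edges {3, 4} and stride {2, 2} enters the loop body 12 times, while a processor with edges {0, 0} never enters it. If each write is collective, the second processor issues no matching call at all, which is the situation described above.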
I don't have a suggested fix for this issue. I tried rewriting my code to use nc_put_vara_double instead, but that is not easily done for this particular call.
This does work if I use pnetcdf non-collective output, and probably also with netcdf-4 non-collective output.