
nc_put_vars_double fails in parallel when using NetCDF-4 collective I/O #447

@gsjaardema


netCDF version 4.5.1-devel.

If I call nc_put_vars_double with stride = 1 in parallel, and some processors have no data to write, the H5Dwrite call fails.

The problem is caused by this early return at line 244 of libdispatch/dvarput.c:

   if(nels == 0)
      return NC_NOERR; /* cannot write anything */

If I remove that early return and stride == 1, the code completes correctly. If the check is left in place, the processors with no data return early while the others continue into H5Dwrite. With collective I/O, HDF5 calls PMPI_Allreduce below H5Dwrite, and because not every processor reaches that collective call, the code hangs.
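To make the failure mode concrete, here is a toy, single-process model of the rank-participation problem. It is a sketch under stated assumptions, not the actual netCDF source: count_elements, reaches_write and collective_completes are hypothetical names, nels is assumed to be the product of the per-dimension counts, and the "write" stands in for H5Dwrite with its internal PMPI_Allreduce. The model only answers one question: does every rank reach the collective call?

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of nels: the product of the
 * per-dimension counts. Any zero count gives nels == 0. */
size_t count_elements(const size_t *count, int ndims)
{
    size_t nels = 1;
    for (int i = 0; i < ndims; i++)
        nels *= count[i];
    return nels;
}

/* Returns true if a rank would reach the (modeled) collective write.
 * early_return models the `if(nels == 0) return NC_NOERR;` check. */
bool reaches_write(const size_t *count, int ndims, bool early_return)
{
    if (early_return && count_elements(count, ndims) == 0)
        return false; /* this rank leaves before the collective call */
    return true;
}

/* A collective call only completes if every rank makes it; otherwise
 * the ranks already inside the (modeled) Allreduce wait forever.
 * Each row of counts is one rank's 2-D count array. */
bool collective_completes(const size_t counts[][2], int nranks,
                          bool early_return)
{
    for (int r = 0; r < nranks; r++)
        if (!reaches_write(counts[r], 2, early_return))
            return false;
    return true;
}
```

With four ranks where two have a zero count, e.g. `{{10, 3}, {0, 3}, {5, 3}, {0, 3}}`, collective_completes returns false when the early return is modeled and true when it is removed, which mirrors the behavior reported above for stride == 1.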

There is another issue if stride != 1, but I will report that in a separate issue.
