Environment Information
- What platform are you using? (please provide specific distribution/version in summary)
- 32 and/or 64 bit?
- What build system are you using?
- Can you provide a sample netCDF file or C code to recreate the issue?
Summary of Issue
With parallel netCDF I get good runtimes on 1, 3, 4, 5, 6, 7, and 8 nodes (16 cores each), but on 2 nodes the file writes take about 10 times longer. This is reproducible.
Write wall time on one node:
for the nc_put_vara_double in main.c:229: about 4.5 seconds
Write wall time on two nodes:
for the nc_put_vara_double in main.c:229: about 45 seconds
Write wall time on three nodes:
for the nc_put_vara_double in main.c:229: about 5 seconds
Steps to reproduce the behavior
Very short version:
nc_create_par(file_name, NC_NETCDF4 | NC_MPIIO, MPI_COMM_WORLD, MPI_INFO_NULL, &current_file_id);
nc_var_par_access(current_file_id, time_var_id, NC_COLLECTIVE);
nc_put_vara_double(current_file_id, time_var_id, start, count, &(m->time));
I also tried several data sizes (from about 4 MB up to 40 MB) with the same results.
The source code can be found here: https://github.com/xy124/parflow/blob/parFlowVR/flowvr/netcdf-writer/main.c . One instance of this writer is executed per compute node, and it receives the data to write from the other cores.
A very short explanation of the linked source code:
We wait for messages (fca_wait()) that contain the data to write into the netCDF files. These messages are then read and written (nc_put_vara_double(current_file_id, variable_var_id, start, count, data);). That works fine, except that on 2 nodes, as mentioned, the write takes very long.
I would be very thankful for any ideas on what I could try. Is there a good real-time profiler that works with netCDF? I tried Google perf tools with no success.
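Without a profiler, one way to narrow this down might be to time the write call by hand on every writer process. The following is only a minimal sketch and not part of the linked source: write_timer, now_seconds, timer_add and timer_report are hypothetical helper names, and the rank passed to timer_report is assumed to come from MPI_Comm_rank.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: accumulated wall time of repeated write calls. */
typedef struct {
    double total;  /* sum of all recorded durations, in seconds */
    double max;    /* longest single recorded duration, in seconds */
    long   calls;  /* number of recorded durations */
} write_timer;

/* Monotonic wall clock in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Record the duration of one call. */
static void timer_add(write_timer *t, double elapsed)
{
    assert(t != NULL);
    t->total += elapsed;
    if (elapsed > t->max) {
        t->max = elapsed;
    }
    t->calls += 1;
}

/* Print one summary line per process to stderr. */
static void timer_report(const write_timer *t, const char *label, int rank)
{
    assert(t != NULL);
    double mean = (t->calls > 0) ? t->total / (double)t->calls : 0.0;
    fprintf(stderr, "[rank %d] %s: calls=%ld total=%.3fs mean=%.3fs max=%.3fs\n",
            rank, label, t->calls, t->total, mean, t->max);
}

/* Intended use around the write in main.c (assumption, not compiled here):
 *
 *   double t0 = now_seconds();
 *   nc_put_vara_double(current_file_id, variable_var_id, start, count, data);
 *   timer_add(&put_timer, now_seconds() - t0);
 *   ...
 *   timer_report(&put_timer, "nc_put_vara_double", rank);
 */
```

Comparing the per-process lines of the 2-node run with those of the 1-node and 3-node runs could show whether both writers are equally slow or whether one of them spends its time waiting for the other in the collective write.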