Versions of
❯ nccopy
..
netCDF library version 4.9.2 of Aug 31 2024 11:17:57 $
uname -a
Linux phronesis 6.6.72-1-lts #1 SMP PREEMPT_DYNAMIC Fri, 17 Jan 2025 14:04:26 +0000 x86_64 GNU/Linux
Issue
I think that using nccopy to rechunk data via the -c option together with -u prevents the rechunking from actually happening. A simple test case:
A first hint is already visible in the time it takes to complete the "copy":
❯ time nccopy -c time/48,lat/4,lon/4 -u -d 1 -w SISin200001010000004231000101MA_European_framgment.nc test2.nc
real 0m0.758s
user 0m0.712s
sys 0m0.041s
❯ time nccopy -c time/48,lat/4,lon/4 -d 1 -w SISin200001010000004231000101MA_European_framgment.nc test3.nc
real 1m6.757s
user 1m6.407s
sys 0m0.240s
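One plausible reading of the timings, which is an assumption and has not been checked against the nccopy source, is that the fast run never wrote the requested layout. The sketch below counts how many chunks each layout implies. It uses the `chunksizes` that xarray reports for the three files further down, and it assumes the array shape (48, 609, 1109) given by `original_shape`. The helper name `n_chunks` is hypothetical.

```python
import math

# Shape of SIS as reported by xarray ('original_shape'): (time, lat, lon)
SHAPE = (48, 609, 1109)

def n_chunks(shape, chunks):
    """Number of chunks needed to tile an array of `shape` with `chunks` (hypothetical helper)."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Chunk layouts as reported by xarray for the three files
layouts = {
    "original": (1, 609, 1109),
    "test2.nc (-c with -u)": (24, 305, 555),
    "test3.nc (-c without -u)": (48, 4, 4),
}
for name, chunks in layouts.items():
    print(f"{name}: {n_chunks(SHAPE, chunks)} chunks")
```

Under these numbers, test3.nc holds tens of thousands of small compressed chunks, while test2.nc holds only a handful of large ones. That would fit the large difference in run time, though it does not prove the cause.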
Then, looking at the output files via Xarray, the "original" file is:
xr.open_dataset('SISin200001010000004231000101MA_European_framgment.nc').SIS.encoding
{'dtype': dtype('int16'),
'zlib': True,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 4,
'fletcher32': False,
'contiguous': False,
'chunksizes': (1, 609, 1109),
'preferred_chunks': {'time': 1, 'lat': 609, 'lon': 1109},
'source': '/mnt/sandbox/european_fragment/netcdf/test/SISin200001010000004231000101MA_European_framgment.nc',
'original_shape': (48, 609, 1109),
'missing_value': np.int16(-999),
'_FillValue': np.int16(-999)}
and the two rechunked test files are:
xr.open_dataset('test2.nc').SIS.encoding
{'dtype': dtype('int16'),
'zlib': True,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 1,
'fletcher32': False,
'contiguous': False,
'chunksizes': (24, 305, 555),
'preferred_chunks': {'time': 24, 'lat': 305, 'lon': 555},
'source': '/mnt/sandbox/european_fragment/netcdf/test/test2.nc',
'original_shape': (48, 609, 1109),
'missing_value': np.int16(-999),
'_FillValue': np.int16(-999)}
xr.open_dataset('test3.nc').SIS.encoding
{'dtype': dtype('int16'),
'zlib': True,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 1,
'fletcher32': False,
'contiguous': False,
'chunksizes': (48, 4, 4),
'preferred_chunks': {'time': 48, 'lat': 4, 'lon': 4},
'source': '/mnt/sandbox/european_fragment/netcdf/test/test3.nc',
'original_shape': (48, 609, 1109),
'missing_value': np.int16(-999),
'_FillValue': np.int16(-999)}
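To check a whole batch of output files rather than eyeballing each encoding, one option is to compare the requested `-c` spec with xarray's `preferred_chunks`. Below is a minimal sketch of that comparison. It assumes only the simple `dim/n,dim/n` form of the spec, and the helper names `parse_chunkspec` and `rechunk_applied` are hypothetical. The dicts are copied from the encodings above.

```python
def parse_chunkspec(spec):
    """Parse a simple nccopy-style -c spec like 'time/48,lat/4,lon/4' into {dim: size}.

    Assumption: only the plain 'dim/n' form is handled, not any other syntax.
    """
    requested = {}
    for item in spec.split(","):
        dim, size = item.split("/")
        requested[dim.strip()] = int(size)
    return requested

def rechunk_applied(spec, preferred_chunks):
    """True if every dimension in `spec` got exactly the requested chunk size."""
    requested = parse_chunkspec(spec)
    return all(preferred_chunks.get(dim) == size for dim, size in requested.items())

spec = "time/48,lat/4,lon/4"
test2 = {"time": 24, "lat": 305, "lon": 555}  # preferred_chunks of test2.nc (run with -u)
test3 = {"time": 48, "lat": 4, "lon": 4}      # preferred_chunks of test3.nc (run without -u)
print(rechunk_applied(spec, test2))
print(rechunk_applied(spec, test3))
```

In practice, the dicts could come straight from `xr.open_dataset('test2.nc').SIS.encoding['preferred_chunks']`, as in the output above.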
Excuse my ignorance if this is known behavior.