
nccopy -c together with -u prevents the data from being rechunked #3134

@NikosAlexandris

Description


Versions

❯ nccopy
..
netCDF library version 4.9.2 of Aug 31 2024 11:17:57 $
❯ uname -a
Linux phronesis 6.6.72-1-lts #1 SMP PREEMPT_DYNAMIC Fri, 17 Jan 2025 14:04:26 +0000 x86_64 GNU/Linux

Issue

I think that using nccopy to rechunk data via the -c option together with -u prevents the rechunking from actually happening. A simple test case:

A first hint is the time it takes to complete the "copy":

❯ time nccopy -c time/48,lat/4,lon/4 -u -d 1 -w SISin200001010000004231000101MA_European_framgment.nc test2.nc

real	0m0.758s
user	0m0.712s
sys	0m0.041s

❯ time nccopy -c time/48,lat/4,lon/4 -d 1 -w SISin200001010000004231000101MA_European_framgment.nc test3.nc

real	1m6.757s
user	1m6.407s
sys	0m0.240s

Then, inspecting the files with Xarray, for example, the "original" file reports:

xr.open_dataset('SISin200001010000004231000101MA_European_framgment.nc').SIS.encoding
{'dtype': dtype('int16'),
 'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 4,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 609, 1109),
 'preferred_chunks': {'time': 1, 'lat': 609, 'lon': 1109},
 'source': '/mnt/sandbox/european_fragment/netcdf/test/SISin200001010000004231000101MA_European_framgment.nc',
 'original_shape': (48, 609, 1109),
 'missing_value': np.int16(-999),
 '_FillValue': np.int16(-999)}
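For a rough sense of how much more work the requested layout implies, here is a back-of-the-envelope sketch in plain Python. The helper `n_chunks` is hypothetical and only illustrates the arithmetic; it is not how nccopy decides anything. It counts the chunks needed to tile the (48, 609, 1109) SIS variable, assuming partial edge chunks count as whole chunks.

```python
import math


def n_chunks(shape, chunks):
    """Number of chunks needed to tile an array of `shape` with chunk shape `chunks`.

    Edge chunks that only partially overlap the array still count as whole chunks.
    """
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


shape = (48, 609, 1109)     # 'original_shape' of SIS as (time, lat, lon)
original = (1, 609, 1109)   # 'chunksizes' of the original file
requested = (48, 4, 4)      # nccopy -c time/48,lat/4,lon/4

print(n_chunks(shape, original))   # 48
print(n_chunks(shape, requested))  # 1 * 153 * 278 = 42534
```

So the requested layout turns 48 large chunks into 42534 small ones, which plausibly accounts for much of the minute-long run without -u.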

and the two rechunked test files report:

xr.open_dataset('test2.nc').SIS.encoding
{'dtype': dtype('int16'),
 'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 1,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (24, 305, 555),
 'preferred_chunks': {'time': 24, 'lat': 305, 'lon': 555},
 'source': '/mnt/sandbox/european_fragment/netcdf/test/test2.nc',
 'original_shape': (48, 609, 1109),
 'missing_value': np.int16(-999),
 '_FillValue': np.int16(-999)}
xr.open_dataset('test3.nc').SIS.encoding
{'dtype': dtype('int16'),
 'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 1,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (48, 4, 4),
 'preferred_chunks': {'time': 48, 'lat': 4, 'lon': 4},
 'source': '/mnt/sandbox/european_fragment/netcdf/test/test3.nc',
 'original_shape': (48, 609, 1109),
 'missing_value': np.int16(-999),
 '_FillValue': np.int16(-999)}
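To check mechanically whether the requested chunking landed in each output, one can compare the -c spec with the `preferred_chunks` that Xarray reports. A minimal sketch follows; `parse_chunkspec` and `chunking_applied` are hypothetical helpers that handle only the simple `dim/size,...` form used in this report, and the dictionaries are copied from the encodings above.

```python
def parse_chunkspec(spec: str) -> dict[str, int]:
    """Parse a simple nccopy-style spec like "time/48,lat/4,lon/4" into {dim: size}."""
    chunks = {}
    for item in spec.split(","):
        dim, size = item.split("/")
        chunks[dim.strip()] = int(size)
    return chunks


def chunking_applied(spec: str, preferred_chunks: dict[str, int]) -> bool:
    """True if every dimension named in the spec has the requested chunk length."""
    requested = parse_chunkspec(spec)
    return all(preferred_chunks.get(dim) == size for dim, size in requested.items())


spec = "time/48,lat/4,lon/4"

# 'preferred_chunks' as reported by Xarray for each output file
test2 = {"time": 24, "lat": 305, "lon": 555}  # created with -u
test3 = {"time": 48, "lat": 4, "lon": 4}      # created without -u

print("test2.nc:", chunking_applied(spec, test2))  # False
print("test3.nc:", chunking_applied(spec, test3))  # True
```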

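In short, with -u the requested chunking does not appear to be applied: test2.nc ends up with chunks of (24, 305, 555) instead of the requested (48, 4, 4), whereas test3.nc, created without -u, gets (48, 4, 4) as requested.

Apologies if this is known behavior.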

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
