Conversation
|
The CI is failing due to DAP4 timeouts. I can make that testing more robust in a future PR, but I want to stay focused on memory issues in this PR. Some of the leaks were in HDF5, and only resolved in the 2.0.0 release, so in this PR I add HDF5-2.0.0 testing to the CI. I also have some leak tests that only run with HDF5 >= 2.0.0. If there is another way you want to do HDF5-2.0.0 testing, let me know. If it is using docker, that I can take a look at that. HDF5-2.0.0 has been out a very long time, so it's not clear why there is no testing for it in the CI. |
|
I'll merge this with the 2.0.0, and bumping to the 2.1.x bugfix can wait for a separate issue. |
|
@WardF sounds good, let's get this merged and I can then add 2.1.0 testing as well. |
|
There's a lot going on here, and on a whim, I asked Claude to review this PR and it generated a nice report that pointed out some issues with this PR. It illustrates the issue/challenge with using these tools, since their output needs to be verified. I have no idea which version of Claude (the one who generated these fixes? The one who generated the report I've attached here?) is wrong, but the amount of time it would take to validate this is more than I have at the moment. I think having PR's fix one issue at a time (unless the issues are clearly linked and are solved by a single fix) will help keep this manageable from a time-investment standpoint. As it stands, I don't know if this PR actually fixes the issues it claims to fix (Claude says 2664, 2666, 2667 are not substantively fixed), and neither 2668 or 2436 have any code fix. Given the size of what needs to be double checked manually, this is merely shifting my time commitment from 'code review' to 'AI fact checking'. Robbing Peter to pay OpenAI, one supposes XD.
|
|
Update: It's Markdown so I just went ahead and pasted here. PR #3330 Review: Memory Fixes for netCDF-4 CodePrompt summary: Perform a high-quality review of upstream PR #3330 ("memory fixes for netCDF-4 code") and generate a report. The review covers code correctness, completeness of fixes, test quality, CI changes, and platform compatibility. Intended audience: netCDF-C maintainers and contributors with familiarity with C memory management, HDF5 internals, and the netCDF-4 dispatch architecture. Table of Contents
PR Overview
This PR targets a cluster of memory safety bugs in the netCDF-4 / HDF5 backend. Two source files receive actual defect fixes; the remaining ten files are new tests, CI workflow updates, and build system changes. Issues Addressed
Files Changed
File-by-File Analysis
|
| # | Finding |
|---|---|
| 1 | Issues #2664, #2666, #2667 not substantively fixed. The crashes occur inside H5DSget_num_scales() — before any code changed by this PR. The guard and loop-bound fix are unreachable in the current call graph and do not prevent the actual crash site. These are corrupt-file crashes inside HDF5 itself. |
| 2 | Issue #2668 has no code fix. hdf5var.c is not modified. The test covers valid-file edge cases only, not the corrupt POV file scenario. The PR claims to fix #2668 but the fix is absent. |
| 3 | Issue #2436 has no code fix. No change addresses the corrupt-file SegV reported in #2436. |
🟡 Moderate
| # | Finding |
|---|---|
| 4 | if (ndims != var->ndims) guard is unreachable in the current call graph. It is good defensive code, but the PR description overstates its effectiveness as a fix for the reported crashes. |
| 5 | Test cases for #2664/#2666/#2667/#2668 use valid files and do not reproduce the original corrupt-file scenarios. These are regression baselines, not regression tests for the actual bugs. |
🟢 Minor
| # | Finding |
|---|---|
| 6 | ru_maxrss comment in both new test files incorrectly states "current RSS." It is the peak (high-water mark) RSS. The test logic remains valid for leak detection. |
| 7 | tst_mem_safety.c has copyright year "2024" — should be "2026". |
| 8 | Pre-existing leak: second malloc failure in get_attached_info leaks dimscale_attached. Not introduced by this PR, but not fixed either. |
Platform Compatibility
32-bit
No issues. All arithmetic uses size_t for allocation sizing. The ndims != var->ndims comparison is safe on 32-bit. getrusage and ru_maxrss are long, which is 32-bit on 32-bit platforms; RSS values will be correct for any realistic file.
64-bit
No issues. Standard behavior.
macOS
CI correctly expands to HDF5 2.0.0. The RSS tests are built but not registered for CI execution on macOS (HDF5 file-backed mmap pages cause ru_maxrss to fluctuate). Developers can still run them manually. No platform-specific concerns with the core fixes.
Linux
Primary target for memory testing. The RSS approach is sound for detecting heap leaks on Linux. No concerns.
Windows (MSVC / MinGW)
No HAVE_SYS_RESOURCE_H on Windows, so all getrusage-based checks are compiled out. The retry logic for pacman is a clear improvement for CI reliability. No concerns with the core fixes on Windows.
Big-Endian
No endianness-sensitive changes. The RSS measurement and the error-path cleanup are byte-order neutral. No concerns.
AI-Generation Assessment
Estimated likelihood of AI assistance: High for test code and CI changes; Moderate for the core fixes.
Supporting signals:
- Highly uniform, comprehensive block comments in both new test files with detailed platform rationale paragraphs.
- All CI changes follow a perfectly consistent structural pattern across three workflow files.
- Test functions follow a formulaic structure (create → populate → close → reopen → verify → cleanup).
- The PR fixes multiple loosely-related issues in a single branch, a common pattern in AI-assisted batch-fixing sessions.
- The retry logic in
run_tests_win_mingw.ymlis a textbook CI-hardening pattern.
Mitigating signals:
- Deep issue cross-referencing (matching POV crash reports to specific line numbers) suggests genuine human analysis.
- The author's acknowledgment of HDF5 2.0.0 leak origins (requiring domain expertise) indicates human review.
- The discussion thread shows active engagement and technical reasoning by the author.
Human-in-the-loop assessment: The domain knowledge, issue triage, and PR discussion strongly indicate an experienced human author directing the work. AI assistance appears likely for boilerplate test scaffolding and CI YAML, with human authorship for the core analysis and fixes.
Summary and Recommendation
What This PR Does Well
- ✅ The
nc4internal.cfix (issue detected memory leaks in NC_hashmapnew netcdf/libdispatch/nchashmap.c #2665) is correct, clean, and complete. It properly cleans up all resources on the error path. - ✅ The HDF5 2.0.0 CI infrastructure is well-engineered with proper cache invalidation, matrix exclusions, and consistent changes across both build systems.
- ✅ The
tst_h5_open_close.cisolation test is a valuable diagnostic tool for identifying where open/close growth originates. - ✅ The retry logic for Windows CI resolves a real intermittent failure mode.
- ✅ Both CMake and Autotools build systems are updated consistently.
What Needs Attention Before Merge
-
Issues SEGV in get_attached_info netcdf/libhdf5/hdf5open.c #2664, heap-buffer-overflow in get_attached_info netcdf/libhdf5/hdf5open.c #2666, memcpy-param-overlap in get_attached_info netcdf/libhdf5/hdf5open.c #2667: The PR should either (a) document that these crashes are fundamentally inside HDF5 and cannot be fully fixed at the netCDF-C level, or (b) add a caller-side check before invoking
H5DSget_num_scalesto validate the dataset's dimension count. Simply claiming these are fixed overstates what the code change achieves. -
Issue heap-buffer-overflow in NC4_get_vars netcdf/libhdf5/hdf5var.c #2668: Either add the missing fix to
hdf5var.c(the heap-buffer-overflow root cause) or remove the issue from the "Fixes" list. The test currently exercises only valid-file scenarios. -
Issue SegV with corrupt netCDF files #2436: Similarly, remove from "Fixes" if no code change addresses it, or add the corresponding fix.
-
Comment correction: Change
"current RSS"→"peak (high-water mark) RSS since process start"in both new test files. -
Copyright year: Update
tst_mem_safety.cfrom"Copyright 2024"to"Copyright 2026".
Generated by GitHub Copilot (claude-sonnet-4.6) on 2026-04-01. Review performed against PR #3330 opened 2026-03-17, last updated 2026-04-01.
|
I have made the changes called for in the code review. I removed the issues about which there is some controvery. I will circle around to them again after this is merged. |
|
WRT review comments and their length: this was an ambitious PR which attempted to close many issues. Too many, in the opinion of claude. Basically, all objections were to "Fixes #NNN" in the PR desc, not objections to code changes in the PR. Let us see how claude does reviewing more focused PRs... This PR is ready to merge. |
|
Thanks Ed, taking a look now! |
|
There are a couple things highlighted that I'll take a look at but nothing particularly alarming or worrying. More a question of using it as a starting point for things I might not have thought of. The report for reference: Looks good overall, thanks! |
|
Thanks! |

Fixes #2626
In this PR I fix some reported memory leaks. As with other recent fixes, only a small amount of code is involved.
This is another of the areas where Dennis and I spent plenty of time, but we still missed a few frees.
This PR also includes PR #3329 so that these tests are run with HDF5-2.0.0. There are some memory leaks in HDF5 itself in earlier versions, so memory testing for these bugs is restricted to HDF5-2.0.0 and greater versions.
I also checked memory leaks under macos and windows and these fixes also work there, but I do not test those platforms with automatic testing; ubuntu is easiest, and since these are classic memory leaks (i.e. malloc without calling free) they work the same on every platform. However, measuring memory use is different on each platform and a test that works on all platforms would contain a large amount of extra code, and provide no additional benefit.