2.4.2: Fix #218 and #259 (#288)

Merged
vincentlaucsb merged 2 commits into master from feb-16-2026
Feb 17, 2026
@vincentlaucsb commented Feb 17, 2026

Fix Edge Cases and Third-Party Stream Compatibility

Summary

This PR fixes two edge cases reported by community members:

Issue #218: Infinite Loop on Rows Exceeding Chunk Size

Thanks to @sjoubert for the detailed report and working patch suggestion.

Problem: When a CSV row exceeded ITERATION_CHUNK_SIZE (10MB), the parser entered an infinite loop, spawning threads until the process crashed from resource exhaustion.

Solution:

  • Added set_chunk_size(size_t) API to allow users to customize chunk size
  • Chunk size must be at least 10MB (enforced with clear error messages)
  • Implemented infinite loop detection in read_row()
  • Throws informative exception: "End of file not reached and no more records parsed. This likely indicates a CSV row larger than the chunk size of X bytes. Use set_chunk_size() to increase the chunk size."
  • Added comprehensive test suite in test_edge_cases_large_rows.cpp
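To illustrate the failure mode and the guard, here is a minimal standalone sketch. It is not the code in read_row(): `read_rows_chunked` is a hypothetical toy that splits on newlines only (no quoting rules), and the use of std::runtime_error is an assumption, since the description above does not name the exception type. Only the message text follows the one quoted above.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy reader (not the library's parser): scans `data` one
// fixed-size chunk at a time and restarts each chunk at the first unparsed row.
std::vector<std::string> read_rows_chunked(const std::string& data,
                                           std::size_t chunk_size) {
    std::vector<std::string> rows;
    std::size_t pos = 0;  // start of the first row not yet emitted
    while (pos < data.size()) {
        const std::size_t end = std::min(pos + chunk_size, data.size());
        const std::size_t rows_before = rows.size();
        std::size_t row_start = pos;

        // Emit every complete (newline-terminated) row inside this chunk.
        for (std::size_t i = pos; i < end; ++i) {
            if (data[i] == '\n') {
                rows.emplace_back(data, row_start, i - row_start);
                row_start = i + 1;
            }
        }
        // The last chunk may end with a row that has no trailing newline.
        if (end == data.size() && row_start < end) {
            rows.emplace_back(data, row_start, end - row_start);
            row_start = end;
        }

        // Guard: input remains but this chunk produced no rows, so the row
        // cannot fit. Without this check, pos never advances and the loop spins.
        if (rows.size() == rows_before && end != data.size()) {
            throw std::runtime_error(
                "End of file not reached and no more records parsed. "
                "This likely indicates a CSV row larger than the chunk size of " +
                std::to_string(chunk_size) +
                " bytes. Use set_chunk_size() to increase the chunk size.");
        }
        pos = row_start;
    }
    return rows;
}
```

Calling `read_rows_chunked("a,b\nc,d\n", 4)` yields two rows, while an input whose first row is longer than the chunk throws instead of looping.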

Credit: The core detection logic is based on @sjoubert's patch from the issue.

Issue #259: StreamParser Compilation Error with Non-Copyable Streams

Thanks to @addy90 for the clear problem description and working solution with Godbolt example.

Problem: StreamParser attempted to std::move() a stream reference into a value member, causing compilation errors with third-party stream libraries that have deleted copy constructors (e.g., Boost streams, custom streams).

Solution:

  • Changed TStream _source to TStream& _source (reference member)
  • Removed unnecessary std::move() calls in constructors
  • Now works with any stream type that can be passed by reference, including non-copyable streams (the caller must keep the stream alive while parsing)
  • Added test file test_stream_sources.cpp with mock non-copyable stream to prevent regression
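The difference between a value member and a reference member can be sketched in isolation. Both `NonCopyableStream` and `LineCounter` below are hypothetical stand-ins for illustration, not the mock stream from test_stream_sources.cpp or the actual StreamParser.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical third-party stream. Deleting the copy constructor also
// suppresses the implicit move constructor, so it cannot be moved either.
class NonCopyableStream {
public:
    explicit NonCopyableStream(std::string data) : data_(std::move(data)) {}
    NonCopyableStream(const NonCopyableStream&) = delete;
    NonCopyableStream& operator=(const NonCopyableStream&) = delete;

    // Reads one byte into `out`; returns false at end of input.
    bool get(char& out) {
        if (pos_ >= data_.size()) return false;
        out = data_[pos_++];
        return true;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Hypothetical stand-in for a stream-backed parser.
template <typename TStream>
class LineCounter {
public:
    // Binding a reference needs neither a copy nor a move of the stream.
    explicit LineCounter(TStream& source) : source_(source) {}

    std::size_t count_lines() {
        std::size_t n = 0;
        char c;
        while (source_.get(c)) {
            if (c == '\n') ++n;
        }
        return n;
    }

private:
    TStream& source_;
    // With `TStream source_;` and `source_(std::move(source))`, instantiating
    // LineCounter<NonCopyableStream> would fail to compile.
};
```

The trade-off of a reference member is lifetime: the stream passed to the constructor has to outlive the object that reads from it.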

Credit: Implementation follows exactly the approach suggested by @addy90.

Testing

✅ All existing tests pass
✅ New test: test_edge_cases_large_rows.cpp - validates chunk size customization and infinite loop detection
✅ New test: test_stream_sources.cpp - validates third-party stream compatibility
✅ Full CI/CD pipeline (CMake, Sanitizers, CodeQL) runs on PR

Changes

New Public API

  • void CSVReader::set_chunk_size(size_t size) - Configure chunk size (min 10MB, default 10MB)
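A rough sketch of what a setter with an enforced minimum can look like, under stated assumptions: `ChunkSettings` is a hypothetical class, not CSVReader; the 10MB minimum is taken as 10,000,000 bytes; and std::invalid_argument is an assumed exception type. None of these details are confirmed by the description above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Assumed byte value for "10MB"; the library's actual constant may differ.
constexpr std::size_t kMinChunkSize = 10000000;

// Hypothetical holder for the chunk size (not the real CSVReader).
class ChunkSettings {
public:
    // Rejects sizes below the minimum and leaves the current value untouched.
    void set_chunk_size(std::size_t size) {
        if (size < kMinChunkSize) {
            throw std::invalid_argument(
                "Chunk size must be at least " + std::to_string(kMinChunkSize) +
                " bytes, got " + std::to_string(size));
        }
        chunk_size_ = size;
    }

    std::size_t chunk_size() const { return chunk_size_; }

private:
    std::size_t chunk_size_ = kMinChunkSize;  // default equals the minimum
};
```

In this sketch a caller expecting very long rows would raise the limit once before reading, for example `settings.set_chunk_size(50000000);`.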

Modified Files

  • include/internal/csv_reader.hpp - Added set_chunk_size(), _chunk_size, _read_requested members
  • include/internal/csv_reader.cpp - Implemented infinite loop detection in read_row()
  • include/internal/basic_csv_parser.hpp - Fixed StreamParser to use reference member

New Test Files

  • tests/test_edge_cases_large_rows.cpp - Comprehensive edge case coverage for large rows
  • tests/test_stream_sources.cpp - Third-party stream compatibility tests

Contributors

Special thanks to @sjoubert (issue #218) and @addy90 (issue #259).

Notes

  • This is a defensive fix: both edge cases are rare but real when working with large datasets or third-party libraries
  • All changes are backward compatible, with no breaking changes to the public API
  • The 10MB chunk size minimum is based on empirical testing

@vincentlaucsb merged commit e26bb30 into master Feb 17, 2026
12 checks passed
@vincentlaucsb deleted the feb-16-2026 branch February 18, 2026 01:45