vincentlaucsb · vincentlaucsb · Feb 16, 2026 · Feb 16, 2026 · Feb 16, 2026 · Feb 16, 2026
diff --git a/.claude/rules/csv_reader_rules.md b/.claude/rules/csv_reader_rules.md
@@ -0,0 +1,39 @@
+---
+paths:
+  - "include/internal/csv_reader.hpp"
+  - "include/internal/csv_reader.cpp"
+  - "include/internal/csv_reader_iterator.cpp"
+---
+
+# CSVReader Iterator Rules
+
+## CRITICAL CONSTRAINT: DO NOT Cache All RawCSVDataPtr Chunks
+
+### The Streaming Architecture
+- Library parses 50+ GB CSV files with bounded memory (e.g., 8GB RAM)
+- Previous data chunks are freed as iterator advances
+- `CSVReader::iterator` is `std::input_iterator_tag` by design (single-pass)
+
+### When You See Heap-Use-After-Free with std::max_element or ForwardIterator Algorithms
+
+**❌ WRONG FIX (defeats streaming)**:
+```cpp
+class iterator {
+    std::vector<RawCSVDataPtr> _all_data_chunks;  // NO!
+    // Result: 50GB CSV requires 50GB RAM
+};
+```
+
+**✅ CORRECT FIXES**:
+1. **Document limitation**: ForwardIterator algorithms not supported on CSVReader::iterator
+2. **Fix test**: Copy to vector first: `std::vector<CSVRow> rows(reader.begin(), reader.end());`
+3. **Update README**: Show vector-based workaround for std::max_element
+
+### Valid RawCSVDataPtr Usage
+- ✅ `CSVRow::iterator` CAN have RawCSVDataPtr member (single row scope, negligible memory)
+- ❌ `CSVReader::iterator` CANNOT cache all chunks (file scope, unbounded memory)
+
+### Root Cause Analysis
+- Small test files (< chunk size) fit in one chunk → ForwardIterator algorithms appear to work
+- Large files span multiple chunks → heap-use-after-free when algorithm accesses freed chunk
+- Solution: Document as unsupported OR provide vector workaround, NOT cache all chunks
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
@@ -0,0 +1,117 @@
+# GitHub Actions Memory Checking Setup
+
+This directory contains GitHub Actions workflows for comprehensive memory and thread safety testing.
+
+## Workflows
+
+### 🧵 sanitizers.yml - Memory & Thread Sanitizers (PRIMARY)
+**Runs:** On every push and pull request  
+**Sanitizers Included:**
+- **ThreadSanitizer (TSan)** - Detects data races and thread safety issues (CRITICAL for csv-parser)
+- **AddressSanitizer (ASan)** - Detects memory errors: use-after-free, buffer overflows, memory leaks
+- **UndefinedBehaviorSanitizer (UBSan)** - Catches undefined behavior: signed overflow, type mismatches
+
+**Config:**
+- Runs on Ubuntu with GCC
+- Tests C++17 and C++20 standards
+- Debug builds for better diagnostics
+- Timeout: 20 minutes per configuration
+- Artifacts: Upload logs on failure
+
+**Key Features:**
+- Matrix testing: 3 sanitizers × 2 C++ standards = 6 parallel jobs
+- Fail-fast disabled to see all results
+- Environment variables configured for halt-on-error behavior
+
+### 💾 valgrind.yml - Valgrind Memory Profiler
+**Runs:** On every push and pull request  
+**Uses:** Valgrind with full leak checking
+
+**Config:**
+- Runs on Ubuntu with GCC
+- Debug build with -O1 for balance
+- Excludes multi-threaded tests (Valgrind + threads = slow)
+- Timeout: 60 minutes (Valgrind is slower)
+
+### 🔍 codeql.yml - Static Analysis (GitHub CodeQL)
+**Runs:** On every push and pull request + weekly schedule  
+**Analysis:**
+- Deep static analysis for security and quality issues
+- Integrates with GitHub Security tab
+
+## Testing Recommendations
+
+### Local Testing Before Push
+```bash
+# Test with ThreadSanitizer (most critical)
+cmake -B build/tsan -DCMAKE_BUILD_TYPE=Debug \
+  -DCMAKE_CXX_FLAGS="-fsanitize=thread -g"
+cmake --build build/tsan
+cd build/tsan && ctest --output-on-failure && cd ../..
+
+# Test with AddressSanitizer
+cmake -B build/asan -DCMAKE_BUILD_TYPE=Debug \
+  -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer -g"
+cmake --build build/asan
+cd build/asan && ctest --output-on-failure && cd ../..
+```
+
+### CI/CD Pipeline Order
+1. **Primary:** ThreadSanitizer - Catches data races in threading model
+2. **Primary:** AddressSanitizer - Catches memory corruption
+3. **Secondary:** UBSanitizer - Catches subtle undefined behavior
+4. **Nightly:** Valgrind - Comprehensive memory profiling
+5. **Continuous:** CodeQL - Static analysis
+
+## Known Considerations
+
+### ThreadSanitizer (TSan)
+- **Why CRITICAL:** csv-parser uses worker threads + ThreadSafeDeque + double_quote_fields lazy init
+- May report false positives in STL (benign race detection)
+- Slower execution due to extra instrumentation
+- Catches real issues in PR #282 exception propagation and issue #278 move semantics
+
+### AddressSanitizer (ASan)
+- **Why Important:** Catches CSVFieldList memory issues like issue #278
+- Cannot run simultaneously with TSan (different memory models)
+- Better performance than TSan for memory safety
+
+### Valgrind
+- Slower than sanitizers but more mature tool
+- Useful for final verification before releases
+- Runs on push (not PR) to avoid GitHub Actions timeout
+
+### CodeQL
+- No runtime overhead
+- Integrated into GitHub Security tab
+- Good for catching security issues and code quality
+
+## Interpreting Results
+
+### Sanitizer Failures
+Look for:
+```
+ERROR: ThreadSanitizer: data race detected
+ERROR: AddressSanitizer: heap-buffer-overflow
+ERROR: UndefinedBehaviorSanitizer: signed integer overflow
+```
+
+### Action Items
+1. Check artifacts uploaded on test failure
+2. Look at the specific sanitizer output in GitHub Actions logs
+3. Focus on ThreadSanitizer results first (threading complexity)
+4. Correlate with codebase changes to identify root cause
+
+## History
+
+- **PR #282:** ThreadSanitizer would catch exception propagation issues
+- **Issue #278:** AddressSanitizer catches CSVFieldList move semantics bug
+- **PR #237:** TSan catches double_quote_fields concurrency todo
+- **v2.3.0:** ThreadSafeDeque prevents std::vector reallocation race
+
+## Reference
+
+- [ThreadSanitizer Documentation](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
+- [AddressSanitizer Documentation](https://github.com/google/sanitizers/wiki/AddressSanitizer)
+- [Valgrind User Manual](https://valgrind.org/docs/manual/)
+- [CodeQL Query Help](https://codeql.github.io/codeql-query-help/)
diff --git a/.github/workflows/cmake-multi-platform.yml b/.github/workflows/cmake-multi-platform.yml
@@ -4,7 +4,7 @@ name: CMake on multiple platforms
 
 on:
   push:
-    branches: [ "master", "memory-fix-csvfieldlist" ]
+    branches: [ "master", "feb-15-2026" ]
   pull_request:
     branches: [ "master" ]
 

diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
@@ -0,0 +1,58 @@
+name: CodeQL Analysis
+
+on:
+  push:
+    branches: [ "master", "feb-15-2026" ]
+  pull_request:
+    branches: [ "master" ]
+  schedule:
+    - cron: '0 2 * * 0'  # Weekly on Sunday at 2 AM UTC
+
+jobs:
+  analyze:
+    name: CodeQL C++
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: ['cpp']
+
+    steps:
+    - name: Checkout repository and submodules
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v3
+      with:
+        language: ${{ matrix.language }}
+        queries: security-and-quality
+
+    - name: Install dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y build-essential cmake
+
+    - name: Configure CMake
+      run: |
+        mkdir -p build
+        cd build
+        cmake .. \
+          -DCMAKE_BUILD_TYPE=Release \
+          -DCSV_CXX_STANDARD=17 \
+          -DCMAKE_CXX_COMPILER=g++ \
+          -DCMAKE_C_COMPILER=gcc
+
+    - name: Build for CodeQL analysis
+      run: cmake --build build --config Release
+
+    - name: Run CodeQL analysis
+      uses: github/codeql-action/analyze@v3
+      with:
+        category: "/language:${{ matrix.language }}"
diff --git a/.github/workflows/sanitizers.yml b/.github/workflows/sanitizers.yml
@@ -0,0 +1,111 @@
+name: Memory and Thread Sanitizers
+
+on:
+  push:
+    branches: [ "master", "feb-15-2026" ]
+  pull_request:
+    branches: [ "master" ]
+
+jobs:
+  sanitizers:
+    runs-on: ubuntu-latest
+
+    strategy:
+      fail-fast: false
+      matrix:
+        sanitizer: [address, thread, undefined]
+        cxx_standard: [17, 20]
+        include:
+          - sanitizer: address
+            flag: "-fsanitize=address -fno-omit-frame-pointer"
+            name: "AddressSanitizer"
+          - sanitizer: thread
+            flag: "-fsanitize=thread"
+            name: "ThreadSanitizer"
+          - sanitizer: undefined
+            flag: "-fsanitize=undefined -fno-omit-frame-pointer"
+            name: "UndefinedBehaviorSanitizer"
+
+    env:
+      ASAN_OPTIONS: "detect_leaks=1:halt_on_error=0"
+      TSAN_OPTIONS: "halt_on_error=1"
+      UBSAN_OPTIONS: "halt_on_error=1"
+
+    steps:
+    - name: Checkout repository and submodules
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Install dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y build-essential cmake
+
+    - name: Configure CMake with ${{ matrix.name }}
+      run: |
+        mkdir -p build
+        cd build
+        cmake .. \
+          -DCMAKE_BUILD_TYPE=Debug \
+          -DCSV_CXX_STANDARD=${{ matrix.cxx_standard }} \
+          -DCMAKE_CXX_COMPILER=g++ \
+          -DCMAKE_C_COMPILER=gcc \
+          -DCMAKE_CXX_FLAGS="${{ matrix.flag }} -g" \
+          -DCMAKE_C_FLAGS="${{ matrix.flag }} -g"
+
+    - name: Build with ${{ matrix.name }}
+      run: cmake --build build --config Debug
+
+    - name: Test with ${{ matrix.name }}
+      working-directory: build
+      run: ctest --output-on-failure -V
+      timeout-minutes: 20
+
+    - name: Upload sanitizer logs
+      if: failure()
+      uses: actions/upload-artifact@v4
+      with:
+        name: sanitizer-logs-${{ matrix.sanitizer }}-std${{ matrix.cxx_standard }}
+        path: build/Testing/
+
+  valgrind:
@@ -1,4 +1,6 @@
 name: Memory and Thread Sanitizers
+permissions:
+  contents: read
 on:
  push:
@@ -1,4 +1,6 @@
 name: Memory and Thread Sanitizers
+permissions:
+  contents: read

 on:
  push:
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Checkout repository and submodules
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Install dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y build-essential cmake valgrind
+
+    - name: Configure CMake
+      run: |
+        mkdir -p build
+        cd build
+        cmake .. \
+          -DCMAKE_BUILD_TYPE=Debug \
+          -DCSV_CXX_STANDARD=17 \
+          -DCMAKE_CXX_COMPILER=g++ \
+          -DCMAKE_C_COMPILER=gcc \
+          -DCMAKE_CXX_FLAGS="-g -O1" \
+          -DCMAKE_C_FLAGS="-g -O1"
+
+    - name: Build
+      run: cmake --build build --config Debug
+
+    - name: Test with Valgrind
+      working-directory: build
+      run: ctest --output-on-failure
+      timeout-minutes: 60
+
+    - name: Upload Valgrind results
+      if: failure()
+      uses: actions/upload-artifact@v4
+      with:
+        name: valgrind-results
+        path: build/Testing/
@@ -1,5 +1,8 @@
 name: Memory and Thread Sanitizers
+permissions:
+  contents: read
+
 on:
  push:
    branches: [ "master", "feb-15-2026" ]
@@ -1,5 +1,8 @@
 name: Memory and Thread Sanitizers

+permissions:
+  contents: read
+
 on:
  push:
    branches: [ "master", "feb-15-2026" ]
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -8,6 +8,7 @@ else()
 endif(CSV_CXX_STANDARD)
 
 option(BUILD_PYTHON "Build Python Binding" OFF)
+option(ENABLE_CODE_COVERAGE "Enable code coverage instrumentation" OFF)
 
 message("Building CSV library using C++${CMAKE_CXX_STANDARD}")
 
@@ -29,7 +30,11 @@ if(MSVC)
 else()
 	# Ignore Visual Studio pragma regions
 	set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-pragmas")
-  set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} --coverage -Og")
+
+	if(ENABLE_CODE_COVERAGE)
+		message("Code coverage instrumentation enabled")
+		set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} --coverage -Og")
+	endif()
 endif(MSVC)
 
 set(CSV_ROOT_DIR ${CMAKE_CURRENT_LIST_DIR})