If a line is longer than ITERATION_CHUNK_SIZE (a corner case, but unfortunately large datasets happen in the wild), CSVReader::read_row will loop forever, spawning read_csv threads indefinitely.
For now I've patched this using:
diff --git a/single_include/csv.hpp b/single_include/csv.hpp
index 9cebfc2..e24ddff 100644
--- a/single_include/csv.hpp
+++ b/single_include/csv.hpp
@@ -7517,6 +7517,7 @@ namespace csv {
*
*/
CSV_INLINE bool CSVReader::read_row(CSVRow &row) {
+ bool readRequested = false;
while (true) {
if (this->records->empty()) {
if (this->records->is_waitable())
@@ -7530,7 +7531,15 @@ namespace csv {
if (this->read_csv_worker.joinable())
this->read_csv_worker.join();
+ // More reading was already requested and not successful
+ if (readRequested && this->records->empty())
+ // Other error, e.g. line is too large for ITERATION_CHUNK_SIZE
+ throw std::runtime_error(
+ "End of file not reached and no more records parsed, this may be due to a line longer than "
+ + std::to_string(internals::ITERATION_CHUNK_SIZE) + " bytes.");
+
this->read_csv_worker = std::thread(&CSVReader::read_csv, this, internals::ITERATION_CHUNK_SIZE);
+ readRequested = true;
}
}
else if (this->records->front().size() != this->n_cols &&
This avoids spawning a new thread if the previous one was unsuccessful (here, because the line is too long for the chunk and can't be parsed).
It looks like this should be handled better/elsewhere, but for now this seems to work for me.