If a line is longer than ITERATION_CHUNK_SIZE (a corner case, but unfortunately large datasets happen in the wild), CSVReader::read_row will loop forever, spawning read_csv threads indefinitely.
For now I've patched this using:
diff --git a/single_include/csv.hpp b/single_include/csv.hpp
index 9cebfc2..e24ddff 100644
--- a/single_include/csv.hpp
+++ b/single_include/csv.hpp
@@ -7517,6 +7517,7 @@ namespace csv {
*
*/
CSV_INLINE bool CSVReader::read_row(CSVRow &row) {
+ bool readRequested = false;
while (true) {
if (this->records->empty()) {
if (this->records->is_waitable())
@@ -7530,7 +7531,15 @@ namespace csv {
if (this->read_csv_worker.joinable())
this->read_csv_worker.join();
+ // More reading was already requested and not successful
+ if (readRequested && this->records->empty())
+ // Other error, e.g. line is too large for ITERATION_CHUNK_SIZE
+ throw std::runtime_error(
+ "End of file not reached and no more records parsed, this may be due to a line longer than "
+ + std::to_string(internals::ITERATION_CHUNK_SIZE) + " bytes.");
+
this->read_csv_worker = std::thread(&CSVReader::read_csv, this, internals::ITERATION_CHUNK_SIZE);
+ readRequested = true;
}
}
else if (this->records->front().size() != this->n_cols &&
This avoids spawning a new thread if the previous one was unsuccessful (here, because the line is too long for the chunk and can't be parsed).
It looks like this should be handled better/elsewhere, but for now this seems to work for me.