Skip to content

bug: Fix issue #249 (Infinite waiting during CsvReader.Read)#250

Merged
devinrsmith merged 3 commits intodeephaven:mainfrom
kosak:kosak_bug249
Jul 11, 2025
Merged

bug: Fix issue #249 (Infinite waiting during CsvReader.Read)#250
devinrsmith merged 3 commits intodeephaven:mainfrom
kosak:kosak_bug249

Conversation

@kosak
Copy link
Contributor

@kosak kosak commented Jul 11, 2025

TL;DR classic deadlock between writer and reader because code is trying to be clever.

Deep Background:

an early phase of processing looks like this:

  • read a line from the file
  • split that line into cells
  • push each cell into a separate queue, to be processed by a column reader running on its own thread

That is, there is one queue for each column.

Furthermore each queue is actually composed of three subqueues:

  • a "normal" writer, where "smaller" strings are packed into blocks and then appended to a queue when the block is full
  • a "large" writer, where "larger" strings are gathered; references to these strings are gathered in a different kind of block, and appended to a queue, when the block is full
  • a "control" writer, which contains an integer indicating which writer a string went into: normal or large. This is used so the reader can recover these strings in the proper order by reading from the normal or large queue, respectively. These control values are also packed into blocks and appended to a queue when the block is full.

There is also flow control logic which blocks the writer if it gets too far ahead of the reader. This flow control logic is where the bug happened.

Steps to reproduce:

  • first write a large string to your large string writer, triggering the first clause in the code below, in such a way that neither action inherently causes a flush, so both fdata and fctrl are false. Now you have unflushed data in your largeByteArrayWriter and an unflushed control item in your controlWriter
  • then write numerous small strings to your small string writer, triggering the "else" clause in this code, taking care that you fill the byteWriter but not the controlWriter. This will cause fdata to be true but fctrl to be false.
  • This leads to the situation where byteWriter has been flushed (because it filled), and controlWriter has been flushed (because of the else if (fdata)) but no one has flushed largeByteArrayWriter.
  • If you continue to call this in the same way you can keep flushing byteWriter and controlWriter but no one will flush largeByteArrayWriter
  • On the reader side, the flush of the controlWriter leads to the reader reading the control word telling it to expect a large string. But because the writer never flushes largeByteArrayWriter, the reader hangs forever waiting for a large string that will never come.
    [Edited: added these line to explain the deadlock]
  • Once the writer gets too far ahead of the reader, the flow control logic triggers, and the writer will block, waiting for the reader to catch up.
  • And now they're deadlocked, each waiting for the other.

The fix is when any of the queues flush, flush them all.

        if (size >= DenseStorageConstants.LARGE_THRESHOLD) {
            final byte[] data = new byte[size];
            bs.copyTo(data, 0);
            fdata = largeByteArrayWriter.addByteArray(data);
            fctrl = controlWriter.addInt(DenseStorageConstants.LARGE_BYTE_ARRAY_SENTINEL);
        } else {
            fdata = byteWriter.addBytes(bs);
            fctrl = controlWriter.addInt(size);
        }
        // [lengthy comment elided]
        if (fctrl) {
            byteWriter.flush();
            largeByteArrayWriter.flush();
        } else if (fdata) {
            controlWriter.flush();
        }

@kosak kosak requested a review from devinrsmith July 11, 2025 06:56
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great writeup and fix.

@devinrsmith devinrsmith merged commit e387ce6 into deephaven:main Jul 11, 2025
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 11, 2025
@kosak kosak deleted the kosak_bug249 branch July 11, 2025 16:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants