Skip to content

yield_on_full + small output buffer: false NoProgress loses data #68

@lilith

Description

@lilith

Summary

When yield_on_full_buffer(true) is used with an output buffer smaller than a decoded code value, the Classic decoder returns NoProgress with consumed_in=0, consumed_out=0 while the internal buffer and bit buffer still contain decodable data. The caller interprets this as "done" and stops, silently losing the remaining output.

Reproduction

use weezl::{decode::{Configuration, TableStrategy}, encode, BitOrder, LzwStatus};

let data = vec![42u8; 8];
let encoded = encode::Encoder::new(BitOrder::Lsb, 8).encode(&data).unwrap();

let mut dec = Configuration::new(BitOrder::Lsb, 8)
    .with_yield_on_full_buffer(true)
    .with_table_strategy(TableStrategy::Classic)
    .build();

let mut result = Vec::new();
let mut inp = encoded.as_slice();
loop {
    let mut tmp = [0u8; 1]; // 1-byte output buffer
    let r = dec.decode_bytes(inp, &mut tmp);
    inp = &inp[r.consumed_in..];
    result.extend_from_slice(&tmp[..r.consumed_out]);
    match r.status {
        Ok(LzwStatus::Done) => break,
        Ok(LzwStatus::NoProgress) if r.consumed_in == 0 && r.consumed_out == 0 => break,
        Ok(_) => {}
        Err(_) => break,
    }
}
assert_eq!(result.len(), 8); // FAILS: result.len() == 1

Chunked has the same bug. KwKwK-heavy data (repeated single byte) triggers it most easily because every code is next_code, forcing the burst loop to exit after one code and route through the internal buffer.

Root cause

In DecodeState::advance(), when the internal buffer is empty at the start of a call, line 957 unconditionally sets status = NoProgress:

} else if remain.is_empty() {
    status = Ok(LzwStatus::NoProgress);
    have_yet_to_decode_data = true;
}

The burst loop then extracts a code from the bit buffer (which still has codes from an earlier greedy refill_bits) and writes it to the internal buffer (is_in_buffer = true). This produces consumed_out = 0 because data went to the internal buffer, not to out. At the end of advance(), the guard that corrects NoProgress:

if o_in > inp.len() {
    if let Ok(LzwStatus::NoProgress) = status {
        status = Ok(LzwStatus::Ok);
    }
}

...doesn't fire because o_in == inp.len() (no new input was consumed — the bits were already in the bit buffer from a previous call's greedy refill). So the false NoProgress propagates to the caller.

Suggested fixes

Option A: Fix the guard (correct for all buffer sizes)

At the end of advance(), also check whether data was written to the internal buffer:

let made_progress = o_in > inp.len()
    || !self.buffer.buffer().is_empty();
if made_progress {
    if let Ok(LzwStatus::NoProgress) = status {
        status = Ok(LzwStatus::Ok);
    }
}

This is safe because if the buffer has data, the next advance() call will drain it and produce consumed_out > 0.

Option B: Document minimum buffer size + debug_assert

LZW code values can be up to 3839 bytes long (KwKwK worst case with MAX_ENTRIES=4096). If the output buffer is always ≥ 4096 bytes, every code's output fits directly in out and the is_in_buffer path never triggers with yield_on_full.

This can be documented on with_yield_on_full_buffer:

/// # Buffer size requirement
///
/// When this option is enabled, the output buffer passed to
/// `decode_bytes` should be at least 4096 bytes. LZW codes can
/// decode to up to 3839 bytes; if the output buffer is smaller
/// than a code's decoded value, the Classic decoder may return
/// a spurious `NoProgress`.

Plus a debug assertion in advance():

debug_assert!(
    !CgC::YIELD_ON_FULL || out.len() >= MAX_ENTRIES,
    "yield_on_full requires output buffer >= {} bytes, got {}",
    MAX_ENTRIES, out.len()
);

image-tiff always passes strip-sized buffers (typically 8 KiB–1 MiB), so this constraint is already met in practice.

Note on Streaming strategy

The TableStrategy::Streaming decoder (#69) handles this correctly for all buffer sizes via its pending buffer — no minimum buffer requirement, no data loss. This bug only affects Classic and Chunked.

Impact

  • TIFF is unaffected (strips are always large buffers).
  • GIF is unaffected in practice (decoders use reasonably sized buffers).
  • Any caller using yield_on_full_buffer(true) with small or 1-byte output buffers will silently truncate output. This includes custom Read::read adapters that forward small caller-provided buffers without internal buffering.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions