Describe the bug, including details regarding any error messages, version, and platform.
The data_size - 4 check in this snippet does not have a rational justification and looks overly strict:
|
case Encoding::BIT_PACKED: { |
|
int num_bits = 0; |
|
if (MultiplyWithOverflow(num_buffered_values, bit_width_, &num_bits)) { |
|
throw ParquetException( |
|
"Number of buffered values too large (corrupt data page?)"); |
|
} |
|
num_bytes = static_cast<int32_t>(bit_util::BytesForBits(num_bits)); |
|
if (num_bytes < 0 || num_bytes > data_size - 4) { |
|
throw ParquetException("Received invalid number of bytes (corrupt data page?)"); |
|
} |
|
if (!bit_packed_decoder_) { |
|
bit_packed_decoder_ = |
|
std::make_unique<::arrow::bit_util::BitReader>(data, num_bytes); |
|
} else { |
|
bit_packed_decoder_->Reset(data, num_bytes); |
|
} |
|
return num_bytes; |
|
} |
It probably doesn't matter in most cases, as there are at least 4 bytes of encoded page values after the levels, but in some fringe cases (such as all/most values being null) there might not.
The original chance from data_size to data_size - 4 was made by me in #6848 and it was probably based on a misunderstanding, or just blindly copying the similar check for Encoding::RLE (which, unlike Encoding::BIT_PACKED, does have a 4-byte length header).
Component(s)
C++, Parquet
Describe the bug, including details regarding any error messages, version, and platform.
The
data_size - 4check in this snippet does not have a rational justification and looks overly strict:arrow/cpp/src/parquet/column_reader.cc
Lines 128 to 145 in 55587ef
It probably doesn't matter in most cases, as there are at least 4 bytes of encoded page values after the levels, but in some fringe cases (such as all/most values being null) there might not.
The original chance from
data_sizetodata_size - 4was made by me in #6848 and it was probably based on a misunderstanding, or just blindly copying the similar check forEncoding::RLE(which, unlikeEncoding::BIT_PACKED, does have a 4-byte length header).Component(s)
C++, Parquet