GH-48204: [C++][Parquet] Fix Column Reader & Writer logic to enable Parquet DB support on s390x #48205
Vishwanatha-HD wants to merge 1 commit into apache:main
Conversation
cpp/src/parquet/column_writer.h (outdated diff)
    auto last_day_nanos = last_day_units * NanosecondsPerUnit;
    #if ARROW_LITTLE_ENDIAN
    // impala_timestamp will be unaligned every other entry so do memcpy instead
    // of assign and reinterpret cast to avoid undefined behavior.
    std::memcpy(impala_timestamp, &last_day_nanos, sizeof(int64_t));
    #else
    (*impala_timestamp).value[0] = static_cast<uint32_t>(last_day_nanos);
    (*impala_timestamp).value[1] = static_cast<uint32_t>(last_day_nanos >> 32);
Can we use the following instead of #if?

    auto last_day_nanos = last_day_units * NanosecondsPerUnit;
    auto last_day_nanos_little_endian = ::arrow::bit_util::ToLittleEndian(last_day_nanos);
    std::memcpy(impala_timestamp, &last_day_nanos_little_endian, sizeof(int64_t));
Hi @kou. I tried to run the test case with the changes you are referring to above. Unfortunately, it does not work; I see the following test case failures on a big-endian (s390x) machine:
[ FAILED ] 5 tests, listed below:
[ FAILED ] TestArrowReadWrite.UseDeprecatedInt96
[ FAILED ] TestArrowReadWrite.DownsampleDeprecatedInt96
[ FAILED ] TestArrowReadWrite.ParquetVersionTimestampDifferences
[ FAILED ] ArrowReadWrite.NestedRequiredOuterOptional
[ FAILED ] TestImpalaConversion.ArrowTimestampToImpalaTimestamp
5 FAILED TESTS
These test cases are part of the parquet-arrow-reader-writer-test test suite.
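For context on what these INT96 tests exercise: the deprecated INT96 (Impala) timestamp packs nanoseconds-within-day into the first 8 bytes and the Julian day number into the last 4, both little-endian on disk. A minimal standalone sketch of that layout (illustrative names, not the Arrow code), written byte-by-byte so it is correct on any host endianness:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode (nanos_of_day, julian_day) into the 12-byte INT96 on-disk layout:
// bytes 0..7 hold nanoseconds-of-day, bytes 8..11 hold the Julian day,
// each in little-endian byte order regardless of the host.
std::array<uint8_t, 12> EncodeInt96(int64_t nanos_of_day, int32_t julian_day) {
  std::array<uint8_t, 12> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(static_cast<uint64_t>(nanos_of_day) >> (8 * i));
  }
  for (int i = 0; i < 4; ++i) {
    out[8 + i] = static_cast<uint8_t>(static_cast<uint32_t>(julian_day) >> (8 * i));
  }
  return out;
}
```

Writing the bytes explicitly like this avoids the #if entirely; whether that is acceptable performance-wise for the hot path is a separate question.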
cpp/src/parquet/column_reader.cc (outdated diff)
    #if ARROW_LITTLE_ENDIAN
    if (num_bytes < 0 || num_bytes > data_size - 4) {
    #else
    if (num_bytes < 0 || num_bytes > data_size) {
    #endif
Thanks for the ping @kou. I've re-read through this code and I now think the original change was a mistake. I'll submit a separate issue/PR to fix it.
@Vishwanatha-HD Can you rebase/merge from git main and remove this change?
Working through this one, I'm reminded how many odd little corners show up when Arrow's layout meets Parquet's expectations, especially around levels, decimals, and the legacy INT96 bits. Looking at the pieces that overlap with the work I've been doing, the overall direction makes sense. A few notes from what I've seen on real s390x hardware:

• BIT_PACKED level headers
• Decimal serialization
• Half-floats in FLBA
• Paging / DoInBatches
• INT96 (Impala timestamp)

None of this is blocking; I'm just trying to pass along the details I've seen crop up when running the full parquet-encode → parquet-decode cycle on big-endian hardware.
    if constexpr (std::is_same_v<ArrowType, ::arrow::Decimal64Type>) {
      *p++ = ::arrow::bit_util::ToBigEndian(u64_in[0]);
    } else if constexpr (std::is_same_v<ArrowType, ::arrow::Decimal128Type>) {
    #if ARROW_LITTLE_ENDIAN
Please take a step back and read the comments above:

    // Requires a custom serializer because decimal in parquet are in big-endian
    // format. Thus, a temporary local buffer is required.

If we're on a big-endian system, this entire code is unnecessary and we can just use the FIXED_LEN_BYTE_ARRAY SerializeFunctor.
@pitrou Thanks for your review comments. I will probably work on this change in the next pass. Thanks.
@Vishwanatha-HD Please don't resolve discussions until they are actually resolved. This one hasn't been addressed.
@pitrou Ok, sure, thanks.
@pitrou I have rebased this onto git main and removed the piece of code below:
    -#if ARROW_LITTLE_ENDIAN
    -    if (num_bytes < 0 || num_bytes > data_size - 4) {
    -#else
         if (num_bytes < 0 || num_bytes > data_size) {  // only this line is retained now
    -#endif
@Vishwanatha-HD Sure, but you haven't addressed the initial comment.
@pitrou I can come up with optimized logic as below; please let me know if you are fine with it:
    FixedLenByteArray FixDecimalEndianness(const uint8_t* in, int64_t offset) {
      auto out = reinterpret_cast<const uint8_t*>(scratch) + offset;
      constexpr int32_t kSize =
          std::is_same_v<ArrowType, ::arrow::Decimal32Type>  ? 4 :
          std::is_same_v<ArrowType, ::arrow::Decimal64Type>  ? 8 :
          std::is_same_v<ArrowType, ::arrow::Decimal128Type> ? 16 :
          std::is_same_v<ArrowType, ::arrow::Decimal256Type> ? 32 : 0;
      static_assert(kSize != 0, "Unsupported decimal type");
      uint8_t* out_bytes = reinterpret_cast<uint8_t*>(scratch);
      // Arrow guarantees: ByteSwapBuffer swaps only on little-endian.
      // On big-endian it becomes memcpy.
      ::arrow::bit_util::ByteSwapBuffer(out_bytes, in, kSize);
      scratch += kSize;
      return FixedLenByteArray(out);
    }
I will write again what I said above:
If we're on a big-endian system, this entire code is unnecessary and we can just use the FIXED_LEN_BYTE_ARRAY SerializeFunctor.
Of course, I might be missing something and some dedicated code might still be necessary. But I don't think that's the case.
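The underlying point here can be sketched in isolation (a standalone illustration with made-up names, not Arrow's actual serializer): Parquet stores FLBA decimals as big-endian two's-complement bytes, so the only transformation ever needed is putting the native words into big-endian byte order, which on a big-endian host is a plain copy.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: serialize a 128-bit decimal, given as its high and
// low 64-bit words, into the big-endian byte order that Parquet expects for
// FIXED_LEN_BYTE_ARRAY. Emitting bytes most-significant-first makes the
// result identical on little- and big-endian hosts, with no #if needed.
std::array<uint8_t, 16> ToParquetBigEndian(uint64_t high, uint64_t low) {
  std::array<uint8_t, 16> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(high >> (8 * (7 - i)));      // MSB first
    out[8 + i] = static_cast<uint8_t>(low >> (8 * (7 - i)));
  }
  return out;
}
```

On a big-endian host the in-memory words are already laid out this way, which is why the plain FIXED_LEN_BYTE_ARRAY SerializeFunctor suffices there.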
I'm frankly surprised that so few changes are required, given that Parquet C++ was never successfully tested on BE systems before. @Vishwanatha-HD Did you try to read the files in https://github.com/apache/parquet-testing/tree/master/data and check the contents were properly decoded?

Hi @pitrou..
Thanks @Vishwanatha-HD, and thanks for splitting it up like this.

@pitrou Sure, my pleasure. Thanks a lot for spending so much time reviewing the code change and giving your comments. I appreciate it!
Vishwanatha-HD left a comment:

I have addressed all the review comments. Thanks.
…support on s390x
    return SerializeLittleEndianValues(array, array.raw_values(), out);
    #else
    const uint16_t* values = array.raw_values();
    const int64_t length = array.length();
Can we create a big-endian version of SerializeLittleEndianValues() that calls ::arrow::bit_util::ToLittleEndian() when values[i] is read? Then we could use it instead of storing converted values into a vector. This would reduce memory accesses.
We already have endian-fixing logic for FLBA in the decimal serializer above. It should be factored out and reused instead of reinventing something different here.
    (*impala_timestamp).value[0] = static_cast<uint32_t>(last_day_nanos);
    (*impala_timestamp).value[1] = static_cast<uint32_t>(last_day_nanos >> 32);
Why is this not changing the endianness of values?
@pitrou If I am understanding correctly, this logic puts the lower 4 bytes (32 bits) of last_day_nanos into value[0] and the upper 4 bytes (32 bits) into value[1]. A memcpy of the full 8 bytes (64 bits) would keep the bytes as-is, so we cannot use memcpy on BE machines. I hope that makes things clear.
Sure, but the individual 32-bit values still need to be byte-swapped, right?
@kou I don't think that is necessary; otherwise the test case would have failed. It's just the way the uint32_t words of last_day_nanos are arranged into the (*impala_timestamp).value variable.
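The difference between the two branches can be checked in isolation (a standalone sketch, independent of the actual Parquet structs): splitting the 64-bit value into low/high 32-bit words matches a raw memcpy only on little-endian hosts, which is exactly why the #else branch exists.

```cpp
#include <cstdint>
#include <cstring>

// Compare the two branches for a given 64-bit nanosecond value: returns
// true iff the split-words layout (low word in [0], high word in [1])
// matches what a raw 8-byte memcpy produces on this host.
bool SplitMatchesMemcpy(uint64_t last_day_nanos) {
  uint32_t split[2];
  split[0] = static_cast<uint32_t>(last_day_nanos);        // low 32 bits
  split[1] = static_cast<uint32_t>(last_day_nanos >> 32);  // high 32 bits

  uint32_t copied[2];
  std::memcpy(copied, &last_day_nanos, sizeof(last_day_nanos));
  return copied[0] == split[0] && copied[1] == split[1];
}

// Runtime endianness probe: true if the first byte of a 1 is 1.
bool HostIsLittleEndian() {
  uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

On a big-endian host, memcpy would place the high word first, so the two branches disagree; the remaining question in this thread is whether the bytes inside each 32-bit word also need swapping for on-disk compatibility with little-endian readers.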
@Vishwanatha-HD Could you please fix the number
Hi @kiszk..
Rationale for this change
This PR enables Parquet DB support on big-endian (s390x) systems by fixing the column reader and writer logic. The column reader and writer are central to most of the Parquet and Arrow-Parquet test cases.
What changes are included in this PR?
The fix includes changes to the following files:
cpp/src/parquet/column_reader.cc
cpp/src/parquet/column_writer.cc
cpp/src/parquet/column_writer.h
Are these changes tested?
Yes. The changes are tested on the s390x architecture to make sure things work correctly. The fix is also tested on x86 to make sure no new regression is introduced.
Are there any user-facing changes?
No
GitHub main Issue link: #48151