Describe the usage question you have. Please include as many useful details as possible.
I'm trying to reduce the time it takes to write RecordBatches into a single Parquet file. I currently use parquet::arrow::FileWriter::WriteRecordBatch with set_use_threads(true), but the write is still slower than I'd like.
As I understand it, set_use_threads(true) only parallelizes serialization across the columns within a row group, so the row groups themselves are still encoded one at a time. Is it possible to encode different row groups in parallel?
Component(s)
C++