[C++] Is it possible to serialize row groups in parallel? #43322

@Zechariah2001

Describe the usage question you have. Please include as many useful details as possible.

I'm trying to reduce the time it takes to write RecordBatches into a single Parquet file. I currently use parquet::arrow::FileWriter::WriteRecordBatch with set_use_threads(true), but the result is not very satisfying.

[Screenshots: threads, CPU_usage]

I know that set_use_threads(true) parallelizes serialization across the columns within a row group, so the row groups themselves are still encoded one by one. Is it possible to encode different row groups in parallel?
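
To make the question concrete, here is a minimal sketch of the pattern being asked about, in plain C++ with no Arrow or Parquet calls. `RowGroup`, `EncodeRowGroup`, and `EncodeAllInParallel` are hypothetical names invented for illustration, and the "encoding" is a toy text format. The sketch assumes each row group can be encoded into an independent buffer; whether parquet::arrow exposes a way to do this is exactly the open question.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a row group: just a batch of integer values.
using RowGroup = std::vector<int>;

// Hypothetical encoder: turns one row group into an independent byte buffer.
// A real Parquet encoder would apply encodings and compression here instead.
std::string EncodeRowGroup(const RowGroup& group) {
  std::string out;
  for (int v : group) {
    out += std::to_string(v);
    out += ',';
  }
  out += ';';  // marks the end of this row group
  return out;
}

// Encode every row group on its own task, then append the buffers to the
// output in the original order so the resulting layout is deterministic.
std::string EncodeAllInParallel(const std::vector<RowGroup>& groups) {
  std::vector<std::future<std::string>> pending;
  pending.reserve(groups.size());
  for (const RowGroup& group : groups) {
    // `groups` outlives every task because all futures are consumed below.
    pending.push_back(std::async(std::launch::async,
                                 [&group] { return EncodeRowGroup(group); }));
  }
  std::string file;
  for (auto& f : pending) {
    file += f.get();  // serial append: only the encoding step runs in parallel
  }
  return file;
}
```

The design point is that only the CPU-heavy encoding step runs concurrently, while the append to the single output stays sequential and ordered. Spawning one task per row group is the simplest form; a bounded thread pool would likely be a better fit for many row groups.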

Component(s)

C++

Labels

Component: C++, Status: stale-warning, Type: usage
