[C++] How to correctly accelerate Parquet read using the FileReader::RowGroup(i)->ReadTable interface #42097

@renyd123

Description

I am seeing a performance problem when calling the FileReader::RowGroup(i)->ReadTable interface from multiple threads. For a row group with 1 million rows, a single call takes about 40 ms when only one thread is reading. As the number of threads increases, the time per call grows steadily, reaching about 250 ms with 16 threads.
As a result, parsing the whole file with 16 threads is only about 5 times faster than parsing it with a single thread.
Is this slowdown caused by locking or I/O inside the interface? Are there other ways to parse a Parquet file with multiple threads?

Component(s)

C++

Metadata

Assignees

No one assigned

    Labels

    Component: C++
    Status: stale-warning (Issues and PRs flagged as stale which are due to be closed if no indication otherwise)
    Type: usage (Issue is a user question)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests