Merged
5 changes: 4 additions & 1 deletion content/en/_index.md
@@ -9,7 +9,10 @@ title: Parquet
<a class="btn btn-lg btn-secondary me-3 mb-4" href="/blog/">
Download <i class="fab fa-github ms-2 "></i>
</a>
<p class="lead mt-5">Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.</p>
<p class="lead mt-5">
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
</p>
{{< blocks/link-down color="info" >}}
{{< /blocks/cover >}}

6 changes: 3 additions & 3 deletions content/en/docs/Overview/_index.md
@@ -6,11 +6,11 @@ description: >
All about Parquet.
---

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
Collaborator

@vinooganesh, May 15, 2024

Did we mean for this to say "high performance compression" or "high performance, compression"? I think it may be the latter. Or maybe "It provides performant compression and encoding schemes...". I was thinking the first versions sound too much like they describe the compression tool rather than the format.

Collaborator Author

I didn't mean for the comma, or lack thereof, to carry any additional semantic meaning. I am happy to put a comma there if you like.

Collaborator

No really strong feelings; I was just wondering if there was an intended subtextual focus.


This documentation contains information about both the [parquet-mr](https://github.com/apache/parquet-mr) and [parquet-format](https://github.com/apache/parquet-format) repositories.


### parquet-format

The parquet-format repository hosts the official specification of the Apache Parquet file format, defining how data is structured and stored. This specification, along with Thrift metadata definitions and other crucial components, is essential for developers to effectively read and write Parquet files. The parquet-format project specifically contains the format specifications needed to understand and properly utilize Parquet files.
@@ -43,4 +43,4 @@ Here is a non-exhaustive list of Parquet implementations:
* [cuDF](https://github.com/rapidsai/cudf)
* [Apache Impala](https://github.com/apache/impala)
* [DuckDB](https://github.com/duckdb/duckdb)
* [fastparquet, a Python implementation of the Apache Parquet format](https://github.com/dask/fastparquet)
* [fastparquet, a Python implementation of the Apache Parquet format](https://github.com/dask/fastparquet)
2 changes: 1 addition & 1 deletion static/doap_Parquet.rdf
@@ -28,7 +28,7 @@
<homepage rdf:resource="http://parquet.apache.org" />
<asfext:pmc rdf:resource="http://parquet.apache.org" />
<shortdesc>Apache Parquet is a general-purpose columnar storage format.</shortdesc>
<description>Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages including Java, C++, and Python.</description>
<description>Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.</description>
<bug-database rdf:resource="https://issues.apache.org/jira/browse/PARQUET" />
<mailing-list rdf:resource="https://parquet.apache.org/community/" />
<download-page rdf:resource="https://parquet.apache.org/blog/2023/05/18/1.13.1/" />