djfrancesco/awesome-parquet

 
 


Awesome Parquet

Useful resources for using the Parquet format

Contents

Libraries

C GLib

  • Arrow GLib - A wrapper library for Arrow C++.
  • DuckDB - An in-process database library that supports reading and writing Parquet files.

C++

  • Apache Arrow C++ - A library with support for reading and writing Parquet files.
  • DuckDB C++ API - Internal DuckDB C++ API.
  • libcudf - A GPU-accelerated DataFrame library for tabular data processing.

Dart

Go

  • duckdb-go - DuckDB Go client.
  • parquet - A Go implementation of Parquet, part of the official Go implementation of Apache Arrow.
  • parsyl/parquet - A Go library for reading and writing Parquet files.

Java

  • cudf - Java bindings for cudf, for processing large amounts of data on a GPU.
  • duckdb-java - DuckDB Java/JDBC API.
  • parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records.
  • parquet-java - A Java implementation of the Parquet format, owned by the Apache Software Foundation.

JavaScript

  • duckdb-wasm - WebAssembly version of DuckDB.
  • duckdb-node-neo - DuckDB Node.js client.
  • hyparquet - A lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files.
  • parquet-wasm - WebAssembly bindings to read and write the Apache Parquet format to and from Apache Arrow using the Rust parquet and arrow crates.

Julia

  • DuckDB - Official DuckDB Julia package.
  • Parquet.jl - A Julia implementation of a reader for the Parquet columnar file format.

.NET

PHP

Python

  • duckdb-python - DuckDB Python client.
  • pyarrow - A Python API for functionality provided by the Arrow C++ libraries, along with tools for Arrow integration and interoperability with Pandas, NumPy, and other software in the Python ecosystem.
  • pylibcudf - A lightweight Cython interface to libcudf that provides near-zero overhead for GPU-accelerated data processing in Python.
  • fastparquet - A Python implementation of the Parquet columnar file format.
  • Datanomy - Terminal-based tool for inspecting and understanding data files.

R

  • arrow - The arrow package provides an Arrow C++ backend to dplyr, and access to the Arrow C++ library through familiar base R and tidyverse functions, or R6 classes.
  • duckdb-r - DuckDB R package.
  • nanoparquet - A reader and writer for a common subset of Parquet files.

Ruby

  • Red Parquet - The Ruby bindings of Apache Parquet, based on GObject Introspection.

Rust

  • datafusion - An extensible query engine written in Rust that can read/write Parquet files using SQL or a DataFrame API.
  • duckdb-rs - DuckDB Rust client.
  • parquet - The official native Rust implementation of Apache Parquet, part of the Apache Arrow project.
  • Polars - A DataFrame interface on top of an OLAP Query Engine that supports reading and writing Parquet files, with bindings for Python.

Swift

Tools

Command-line

  • DataFusion CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
  • Datanomy - A terminal-based tool for visualizing a Parquet file's metadata and structure.
  • DuckDB CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
  • parqeye - Peek inside Parquet files right from your terminal.
  • parquet-tools - A Python-based CLI tool for exploring Parquet files (part of Apache Arrow).
  • parquet-cli - A Java-based CLI tool for exploring Parquet files.
  • parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies.
  • parquet-grep - A CLI tool to search for strings in Parquet files.
  • Spark - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
  • Tabiew - A lightweight TUI application to view and query tabular data files, such as CSV, TSV, and Parquet.

Desktop applications

  • Pink Parquet - A free and open-source, user-friendly viewer for Parquet files for Windows.
  • Tad - An application for viewing and analyzing tabular data sets.

Plugins

  • nf-parquet - A Nextflow plugin that can read and write Parquet files.

Web

  • ChatDB - Online tools for viewing and converting from and to Parquet files.
  • DataConverter.io - Online tools for viewing, converting, and transforming Parquet files.
  • Datasette - A tool to explore datasets, with support for reading Parquet files.
  • Onyxia Data Explorer - A web-based tool to explore Parquet files in the browser.
  • Parquet File Visualizer - A Claude Code-generated Parquet metadata visualizer that runs in your browser.
  • Parquet Viewer - View Parquet files online.
  • Quak - A scalable data profiler for quickly scanning large tables.

Resources

Blogs

Documentation

  • Parquet - The specification for Apache Parquet and Apache Thrift definitions to read and write Parquet metadata.
  • Apache Parquet Documentation - The official documentation for Apache Parquet.

Educative resources

  • ssphub - An Insee workshop illustrating the use of census data 🇫🇷 distributed in Parquet format.

Tests

Related formats

  • F3 - A data file format that is designed with efficiency, interoperability, and extensibility in mind.
  • GeoParquet - Specification for storing geospatial vector data (point, line, polygon) in Parquet.
  • Iceberg - A high-performance format for huge analytic tables, that supports Parquet as one of its storage formats.
  • Lance - Modern columnar data format for ML and LLMs.
  • Nimble - File format for storage of large columnar datasets.
  • ORC - Self-describing type-aware columnar file format designed for Hadoop workloads.
  • Vortex - A columnar file format designed for high-performance data processing.

Contributing

Contributions welcome! Read the contribution guidelines first.
