Included Benchmarks

Yiqun (Ethan) Zhang edited this page Feb 24, 2026 · 1 revision

PBench ships with ready-to-run benchmarks in the benchmarks/ directory. Each benchmark includes SQL queries, stage configuration files for various scale factors and storage formats, and (where applicable) data generation scripts.

TPC-H

TPC-H is a decision-support benchmark with 22 queries over a relational schema of 8 tables (orders, lineitem, part, supplier, etc.).

Directory: benchmarks/tpch/

Queries

22 SQL queries in queries/ (query_01.sql – query_22.sql).

Stage files

| File | Description |
|---|---|
| tpch.json | All 22 queries with expected row counts (SF1000) |

Scale factor configs

| File | Scale Factor | Catalog / Schema |
|---|---|---|
| sf1.json | 1 GB | TPCH connector, schema sf1 |
| sf10.json | 10 GB | Hive, tpch_sf10_parquet |
| sf100.json | 100 GB | Hive, tpch_sf100_parquet |
| sf1k.json | 1 TB | Hive, tpch_sf1000_parquet |
| sf10k.json | 10 TB | Hive, tpch_sf10000_parquet |
| sf100k.json | 100 TB | Hive, tpch_sf100000_parquet |

Format variants

| File | Format | Notes |
|---|---|---|
| sf1k_ice.json | Iceberg | optimizer_use_histograms: true |
| sf1k_ice_par.json | Iceberg (partitioned) | |
| sf1k_delta_symlink.json | Delta (symlink) | |
| sf1k_delta_symlink_par.json | Delta (symlink, partitioned) | |
| sf100-trino.json | Iceberg (Trino) | Includes all 22 queries inline, save_json: true |

Throughput streams

42 stream files in streams/ (stream_01.json – stream_42.json). Each stream runs all 22 queries in a different order with start_on_new_client: true, enabling concurrent throughput testing.

Example usage

# Power test: 1 cold + 2 warm runs of all 22 queries at SF1000
pbench run -s http://localhost:8080 -o results \
  benchmarks/tpch/tpch.json benchmarks/tpch/sf1k.json \
  benchmarks/java_oss.json benchmarks/c1w2.json

# Throughput test: 4 concurrent streams
pbench run -s http://localhost:8080 -o results \
  benchmarks/tpch/streams/stream_{01,02,03,04}.json benchmarks/tpch/sf1k.json \
  benchmarks/java_oss.json benchmarks/c1w2.json
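
When the number of concurrent streams varies between runs, the argument list for the throughput test above can be generated rather than typed by hand. The following is a minimal Python sketch: the helper name `throughput_command` is hypothetical, and it uses only the flags and paths already shown in the examples above.

```python
import shlex

def throughput_command(num_streams, server="http://localhost:8080", out_dir="results"):
    """Build a pbench argv list that runs the first `num_streams` TPC-H streams."""
    if not 1 <= num_streams <= 42:
        raise ValueError("TPC-H ships 42 stream files (stream_01 to stream_42)")
    # Stream files are zero-padded to two digits: stream_01.json, stream_02.json, ...
    streams = [
        f"benchmarks/tpch/streams/stream_{i:02d}.json"
        for i in range(1, num_streams + 1)
    ]
    return [
        "pbench", "run", "-s", server, "-o", out_dir,
        *streams,
        "benchmarks/tpch/sf1k.json",
        "benchmarks/java_oss.json",
        "benchmarks/c1w2.json",
    ]

cmd = throughput_command(4)
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids relying on shell brace expansion.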

TPC-DS

TPC-DS is a decision-support benchmark with 99 queries over a retail sales schema of 24 tables.

Directory: benchmarks/tpc-ds/

Queries

99 SQL queries plus 5 ordered variants in queries/. The ordered variants (query_36_ordered, query_65_ordered, query_71_ordered, query_73_ordered, query_77_ordered) add deterministic ORDER BY clauses for result comparison.
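
To illustrate why a deterministic ORDER BY matters for result comparison, the sketch below compares two hypothetical result sets that contain the same rows in a different order. It is only an illustration of the problem, not a description of how PBench compares results; the function names and sample rows are made up.

```python
from collections import Counter

def same_rows_ordered(a, b):
    """Position-by-position comparison: row order matters."""
    return list(a) == list(b)

def same_rows_unordered(a, b):
    """Multiset comparison: row order is ignored, duplicate rows are counted."""
    return Counter(map(tuple, a)) == Counter(map(tuple, b))

# Two runs of the same query; the tied rows come back in a different order.
run_1 = [("store", 10), ("web", 10), ("catalog", 7)]
run_2 = [("web", 10), ("store", 10), ("catalog", 7)]

print(same_rows_ordered(run_1, run_2))    # the naive comparison reports a mismatch
print(same_rows_unordered(run_1, run_2))  # the rows themselves are identical
```

With a fully deterministic ORDER BY, both runs return rows in the same sequence, so the simpler ordered comparison is sufficient.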

Stage files

| File | Description |
|---|---|
| ds_power.json | All 99 queries with expected row counts at multiple scale factors |
| ds_full.json | All 104 queries (99 + 5 ordered variants) |
| ds_atomic.json | 44 queries testing individual SQL operations (joins, aggregations, set operations) |
| ds_subset.json | Subset of queries |
| ds_rand5.json | Randomly execute 5 queries from the pool |
| ds_rand15m.json | Random execution for 15 minutes |
| ds_rand50.json | Randomly execute 50 queries |

Scale factor configs

| File | Scale Factor | Catalog / Schema |
|---|---|---|
| sf1.json | 1 GB | TPC-DS connector, schema sf1 |
| sf10.json | 10 GB | Hive, tpcds_sf10_parquet_varchar |
| sf100.json | 100 GB | Hive, tpcds_sf100_parquet_v2 |
| sf1k.json | 1 TB | Hive, tpcds_sf1000_parquet_v2 |
| sf10k.json | 10 TB | Hive, tpcds_sf10000_parquet |
| sf30k.json | 30 TB | Hive, tpcds_sf30000_parquet |
| sf100k.json | 100 TB | Hive, tpcds_sf100000_parquet |

Format variants

| File | Format | Notes |
|---|---|---|
| sf1k_ice.json | Iceberg | |
| sf1k_ice_par.json | Iceberg (partitioned) | |
| sf1k_ice_uncompressed.json | Iceberg (uncompressed) | |
| sf10k_ice.json | Iceberg | |
| sf10k_ice_par.json | Iceberg (partitioned) | |
| sf30k_ice.json | Iceberg | |
| sf1k_par.json | Parquet (partitioned) | |
| sf10k_par.json | Parquet (partitioned) | |
| sf100k_par.json | Parquet (partitioned) | |
| sf10k_dwrf.json | DWRF (ORC) | |

Throughput streams

23 stream files in streams/ (stream_01.json – stream_23.json). Each stream runs all 99 queries in a different order for concurrent throughput testing.

Example usage

# Power test at SF10000
pbench run -s http://localhost:8080 -o results \
  benchmarks/tpc-ds/ds_power.json benchmarks/tpc-ds/sf10k.json \
  benchmarks/native_oss.json benchmarks/c1w2.json

# Random 50 queries at SF1000
pbench run -s http://localhost:8080 -o results \
  benchmarks/tpc-ds/ds_rand50.json benchmarks/tpc-ds/sf1k.json \
  benchmarks/native_oss.json benchmarks/c1w2.json

ClickBench

ClickBench is an OLAP benchmark with 43 queries over a single wide hits table of web analytics data.

Directory: benchmarks/clickbench/

Queries

43 SQL queries in queries/ (query_01.sql – query_43.sql).

Stage files

| File | Description |
|---|---|
| clickbench.json | All 43 queries with expected row counts; sets offset_clause_enabled: true |

Schema: clickbench_parquet.

Example usage

pbench run -s http://localhost:8080 -o results \
  benchmarks/clickbench/clickbench.json \
  benchmarks/java_oss.json benchmarks/c1w2.json

IMDB (Join Order Benchmark)

The Join Order Benchmark (JOB) uses the IMDB dataset to evaluate join ordering and cardinality estimation. It has 113 queries with complex multi-way joins.

Directory: benchmarks/imdb/

Queries

113 SQL queries in queries/, named by group and variant (1a.sql, 1b.sql, ..., 33c.sql).
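
Because the file names combine a numeric group with a letter variant, a plain lexicographic sort puts 10a.sql before 2a.sql. When post-processing results outside PBench, a key that splits the two parts restores the intended order. This is a minimal sketch; the helper name `job_key` and the sample list are hypothetical.

```python
import re

def job_key(filename):
    """Split a JOB query file name such as '13b.sql' into (13, 'b') for sorting."""
    match = re.fullmatch(r"(\d+)([a-z])\.sql", filename)
    if match is None:
        raise ValueError(f"unexpected JOB query file name: {filename!r}")
    return int(match.group(1)), match.group(2)

names = ["10a.sql", "1b.sql", "33c.sql", "2a.sql", "1a.sql"]
ordered = sorted(names, key=job_key)
print(ordered)  # ['1a.sql', '1b.sql', '2a.sql', '10a.sql', '33c.sql']
```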

Stage files

| File | Description |
|---|---|
| imdb.json | All 113 queries; schema imdb |

Example usage

pbench run -s http://localhost:8080 -o results \
  benchmarks/imdb/imdb.json \
  benchmarks/java_oss.json benchmarks/c1w2.json

Shared Configuration Files

Top-level JSON files in benchmarks/ configure engine settings and execution parameters. They are designed to be composed with benchmark stage files via multiple -f arguments or positional args.

Engine configs

These set catalog, session parameters, and pushdown settings for different Presto/Trino variants:

| File | Engine | Catalog |
|---|---|---|
| java_oss.json | Java OSS | Hive |
| native_oss.json | Native OSS | Hive |
| java_blueray.json | Java BlueRay | Hive |
| native_blueray.json | Native BlueRay | Hive |
| java_trino.json | Trino | Hive |
| java_oss_glue.json | Java OSS | Glue |
| native_oss_glue.json | Native OSS | Glue |
| java_blueray_glue.json | Java BlueRay | Glue |
| native_blueray_glue.json | Native BlueRay | Glue |

Execution configs

| File | Description |
|---|---|
| c1w2.json | 1 cold run, 2 warm runs |
| abort_on_error.json | Stop on first query failure |
| save_output.json | Save query result output |
| save_json.json | Save query info JSON |
| save_colmd.json | Save column metadata |

Composing configs

Stage files are merged left-to-right, so later files override earlier ones. A typical invocation combines a benchmark, a scale factor, an engine config, and execution settings:

# Stage files, in order:
#   1. benchmark + queries
#   2. scale factor / schema
#   3. engine + session params
#   4. execution settings
pbench run -s http://localhost:8080 -o results \
  benchmarks/tpc-ds/ds_power.json \
  benchmarks/tpc-ds/sf1k.json \
  benchmarks/native_oss.json \
  benchmarks/c1w2.json
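
The "later files override earlier ones" rule can be pictured as a left-to-right dictionary merge. The Python sketch below shows only that rule; it is not PBench's implementation, which may treat some settings differently (for example, combining lists or nested parameters instead of replacing them). The key names and values are hypothetical stand-ins for the contents of real stage files.

```python
from functools import reduce

def merge_left_to_right(*configs):
    """Shallow merge: for a key set in several configs, the rightmost value wins."""
    return reduce(lambda merged, cfg: {**merged, **cfg}, configs, {})

# Hypothetical stand-ins for the files named on the command line, in order.
benchmark = {"schema": "sf1", "cold_runs": 1, "warm_runs": 0}
scale = {"catalog": "hive", "schema": "tpcds_sf1000_parquet_v2"}
execution = {"cold_runs": 1, "warm_runs": 2}

merged = merge_left_to_right(benchmark, scale, execution)
print(merged)
```

In this sketch the schema comes from the scale factor config and the run counts from the execution config, because both appear to the right of the benchmark file. Reordering the arguments would change which values win.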

Utility Scripts

benchmarks/scripts/ contains Python utilities for cache management and database connectivity:

| Script | Description |
|---|---|
| presto_utils.py | Presto/Trino HTTPS connection and query helpers |
| mysql_utils.py | MySQL connection and query helpers |
| system_utils.py | SSH remote command execution (Paramiko) |
| cache_cleaning_coordinator.py | Clear Hive/Iceberg metadata caches on coordinator |
| cache_cleaning_workers.py | Clear SSD, page, and memory caches on workers via SSH |