Cloud-native vector formats: GeoParquet, FlatGeobuf, PMTiles, GeoArrow #380

@mikemcdougall

Context

The cloud-native geospatial community (cloudnativegeo.org) is converging on modern formats that are columnar, streamable, and cloud-storage friendly. These formats make vector feature serving faster, cheaper, and more interoperable with the modern data stack (DuckDB, pandas, BigQuery, Snowflake, Databricks).

Honua currently imports GeoJSON, Shapefile, GeoPackage, GPX, KML, and WKT. Adding cloud-native formats positions Honua in the modern data engineering ecosystem without entering the raster/earth observation market.

Scope

GeoParquet (Priority: highest)

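The emerging standard for columnar vector data. Often cited as 10-100x faster than GeoJSON/Shapefile for analytics, because columnar storage lets engines scan only the columns and row groups a query needs; the actual gain depends on the workload. Supported by DuckDB, pandas/GeoPandas, BigQuery, Snowflake, CARTO, and Databricks.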

  • Import: ingest GeoParquet files → PostGIS layers (preserving schema, CRS, geometry types)
  • Export: query results as GeoParquet download (f=parquet or Accept: application/vnd.apache.parquet)
  • Bulk export: full layer export as GeoParquet for data warehouse ingestion
  • Streaming read: read GeoParquet directly from S3/Azure Blob/GCS (range requests, no full download)
  • Partitioned GeoParquet: spatial partitioning for large datasets (Hive-style partitions by bbox)
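The streaming-read bullet depends on the Parquet file layout: a file ends with the footer metadata, then a 4-byte little-endian footer length, then the magic bytes PAR1. The footer is also where GeoParquet keeps its "geo" metadata key (CRS, geometry types). A minimal sketch of how a reader could locate the footer with two range requests, using only the Python standard library; the function names are hypothetical and not part of Honua:

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def tail_range_header() -> str:
    """Range header that asks for only the last 8 bytes of the object."""
    return f"bytes=-{TAIL_SIZE}"


def footer_range_header(tail: bytes, file_size: int) -> str:
    """Given the last 8 bytes, build the Range header for the footer metadata."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file (missing trailing PAR1 magic)")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - TAIL_SIZE - footer_len
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length exceeds file size")
    end = file_size - TAIL_SIZE - 1  # HTTP byte ranges are inclusive
    return f"bytes={start}-{end}"
```

With the footer parsed, a reader can pick the row groups it needs and fetch them by byte range, which is what avoids the full download.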

FlatGeobuf

Streaming binary vector format — seek directly to features by spatial index without downloading the full file. Excellent for web clients and large datasets.

  • Import: FlatGeobuf files → PostGIS layers
  • Export: query results as FlatGeobuf (f=fgb)
  • HTTP range request serving: serve .fgb files with spatial index seeking (no server-side processing needed)
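  • Direct web client consumption: web maps such as MapLibre can load FlatGeobuf features over HTTP range requests through the flatgeobuf JavaScript library (MapLibre has no built-in FlatGeobuf source type)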
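Serving .fgb files without server-side processing still requires the host to honor Range headers, since the client walks the file's packed R-tree index and requests only the byte ranges it needs. A minimal sketch of single-range parsing in Python (standard library only); the function name is hypothetical, and multi-range requests are deliberately left out:

```python
from typing import Optional, Tuple


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range header to inclusive (start, end) offsets.

    Returns None when the header is malformed or the range cannot be
    satisfied for a resource of `size` bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # multi-range requests are out of scope for this sketch
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":  # suffix form: the final N bytes
            n = int(last)
            if n <= 0 or size == 0:
                return None
            return max(size - n, 0), size - 1
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start < 0 or start >= size or end < start:
        return None
    return start, min(end, size - 1)
```

A satisfiable range would be answered with 206 Partial Content and a Content-Range header of the form "bytes start-end/size"; an unsatisfiable one with 416.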

PMTiles

Single-file vector tile archives — store an entire tileset in one file on S3/R2/Azure Blob. No tile server needed for static content.

  • Export: generate PMTiles archive from layer tile cache (POST /api/tiles/{layerId}/export?format=pmtiles)
  • Serve: read and serve tiles from PMTiles files on local disk or cloud storage
  • Seed to PMTiles: tile cache seeding outputs PMTiles for static hosting
  • Integration: Admin UI export wizard, serverless deployment of static tiles
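Reading tiles from a PMTiles file starts with its fixed-size header, which points at the root directory. A minimal sketch of header validation in Python (standard library only), assuming the version 3 layout: a 127-byte header that opens with the ASCII magic "PMTiles" and a version byte. The field offsets below follow my reading of the v3 specification and should be checked against it before use; the class and function names are hypothetical:

```python
import struct
from dataclasses import dataclass

HEADER_SIZE = 127  # fixed header length in PMTiles v3


@dataclass
class PMTilesHeader:
    version: int
    root_dir_offset: int
    root_dir_length: int
    min_zoom: int
    max_zoom: int


def parse_pmtiles_header(buf: bytes) -> PMTilesHeader:
    """Validate the magic and version, then read a few header fields."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("need at least 127 bytes")
    if buf[:7] != b"PMTiles":
        raise ValueError("missing PMTiles magic")
    version = buf[7]
    if version != 3:
        raise ValueError(f"unsupported PMTiles version: {version}")
    # Assumed offsets: two little-endian uint64 values at byte 8,
    # single-byte min/max zoom at bytes 100 and 101.
    root_offset, root_length = struct.unpack_from("<QQ", buf, 8)
    return PMTilesHeader(version, root_offset, root_length, buf[100], buf[101])
```

On cloud storage, the header bytes would come from a range request against the start of the object, so the same parsing applies to local and remote archives.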

GeoArrow

Apache Arrow-based in-memory format for zero-copy interop with analytics tools.

  • Export: query results as Arrow IPC stream (f=arrow)
  • Python SDK: return query results as GeoArrow for direct use in pandas/GeoPandas/DuckDB
  • ADBC (Arrow Database Connectivity): optional direct Arrow export from PostGIS (bypasses JSON serialization entirely)
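The export bullets in the sections above select a format through an f= query parameter (parquet, fgb, arrow) or an Accept header. A minimal sketch of how that resolution could work, in Python with the standard library only. Several things here are assumptions rather than Honua behavior: the function and format names, the rule that f= wins over Accept, the GeoJSON default, and the use of application/vnd.apache.arrow.stream for Arrow IPC streams. Accept q-values are ignored for brevity:

```python
from typing import Optional

# f= values come from the proposal; internal format names are hypothetical.
FORMAT_BY_PARAM = {
    "parquet": "geoparquet",
    "fgb": "flatgeobuf",
    "arrow": "arrow-ipc-stream",
}

# Only media types named in the proposal or registered for the format.
FORMAT_BY_MEDIA_TYPE = {
    "application/vnd.apache.parquet": "geoparquet",
    "application/vnd.apache.arrow.stream": "arrow-ipc-stream",
}


def resolve_format(f: Optional[str], accept: Optional[str],
                   default: str = "geojson") -> str:
    """Pick an output format: explicit f= first, then Accept, then default."""
    if f:
        key = f.lower()
        if key not in FORMAT_BY_PARAM:
            raise ValueError(f"unsupported format: {f}")
        return FORMAT_BY_PARAM[key]
    for item in (accept or "").split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in FORMAT_BY_MEDIA_TYPE:
            return FORMAT_BY_MEDIA_TYPE[media_type]
    return default
```

Rejecting an unknown f= value, instead of falling back silently, keeps a typo from returning a different format than the caller asked for.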

Edition Gating

  • Community: import/export for all formats (adoption drivers)
  • Pro: cloud storage streaming read (S3/Azure/GCS range requests), partitioned GeoParquet, PMTiles seeding + cloud hosting, ADBC direct Arrow export
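The split above can be expressed as a lookup from feature to minimum edition. A minimal sketch in Python (standard library only); the feature keys and the is_enabled helper are hypothetical, and only the Community/Pro assignments are taken from the list:

```python
from enum import IntEnum


class Edition(IntEnum):
    COMMUNITY = 1
    PRO = 2


# Feature keys are hypothetical; edition assignments mirror the list above.
REQUIRED_EDITION = {
    "format.import": Edition.COMMUNITY,
    "format.export": Edition.COMMUNITY,
    "cloud.streaming_read": Edition.PRO,
    "geoparquet.partitioned": Edition.PRO,
    "pmtiles.seed": Edition.PRO,
    "pmtiles.cloud_hosting": Edition.PRO,
    "arrow.adbc_export": Edition.PRO,
}


def is_enabled(feature: str, edition: Edition) -> bool:
    """Unknown features are denied instead of being silently allowed."""
    required = REQUIRED_EDITION.get(feature)
    return required is not None and edition >= required
```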

Why P2/Beta

GeoParquet is becoming the interchange format for spatial analytics, and data engineers increasingly expect it. PMTiles enables serverless tile hosting, which aligns with Honua's deployment-anywhere story. Together, these formats make Honua part of the modern data stack conversation.

Metadata

    Labels

    • area/sdk: SDK (JS, Python, .NET)
    • area/server: Core server (protocols, query, edits)
    • edition/community: Community edition (free)
    • effort/XL: 2-4 days (major system change, architecture impact)
    • enhancement: New feature or request
    • phase/Beta: Beta scope
    • priority/P2: Medium priority (important for phase completion, standard work)
