-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Context
The cloud-native geospatial community (cloudnativegeo.org) is converging on modern formats that are columnar, streamable, and cloud-storage friendly. These formats make vector feature serving faster, cheaper, and more interoperable with the modern data stack (DuckDB, pandas, BigQuery, Snowflake, Databricks).
Honua currently imports GeoJSON, Shapefile, GeoPackage, GPX, KML, and WKT. Adding cloud-native formats positions Honua in the modern data engineering ecosystem without entering the raster/earth observation market.
Scope
GeoParquet (Priority: highest)
The emerging standard for columnar vector data. 10-100x faster than GeoJSON/Shapefile for analytics. Supported by DuckDB, pandas/GeoPandas, BigQuery, Snowflake, CARTO, and Databricks.
- Import: ingest GeoParquet files → PostGIS layers (preserving schema, CRS, geometry types)
- Export: query results as GeoParquet download (
f=parquetorAccept: application/vnd.apache.parquet) - Bulk export: full layer export as GeoParquet for data warehouse ingestion
- Streaming read: read GeoParquet directly from S3/Azure Blob/GCS (range requests, no full download)
- Partitioned GeoParquet: spatial partitioning for large datasets (Hive-style partitions by bbox)
FlatGeobuf
Streaming binary vector format — seek directly to features by spatial index without downloading the full file. Excellent for web clients and large datasets.
- Import: FlatGeobuf files → PostGIS layers
- Export: query results as FlatGeobuf (
f=fgb) - HTTP range request serving: serve .fgb files with spatial index seeking (no server-side processing needed)
- Direct web client consumption: MapLibre can read FlatGeobuf natively via HTTP range requests
PMTiles
Single-file vector tile archives — store an entire tileset in one file on S3/R2/Azure Blob. No tile server needed for static content.
- Export: generate PMTiles archive from layer tile cache (
POST /api/tiles/{layerId}/export?format=pmtiles) - Serve: read and serve tiles from PMTiles files on local disk or cloud storage
- Seed to PMTiles: tile cache seeding outputs PMTiles for static hosting
- Integration: Admin UI export wizard, serverless deployment of static tiles
GeoArrow
Apache Arrow-based in-memory format for zero-copy interop with analytics tools.
- Export: query results as Arrow IPC stream (
f=arrow) - Python SDK: return query results as GeoArrow for direct use in pandas/GeoPandas/DuckDB
- ADBC (Arrow Database Connectivity): optional direct Arrow export from PostGIS (bypasses JSON serialization entirely)
Edition Gating
- Community: import/export for all formats (adoption drivers)
- Pro: cloud storage streaming read (S3/Azure/GCS range requests), partitioned GeoParquet, PMTiles seeding + cloud hosting, ADBC direct Arrow export
Why P2/Beta
GeoParquet is becoming the interchange format for spatial analytics. Data engineers increasingly expect it. PMTiles enables serverless tile hosting which aligns with Honua's deployment-anywhere story. These formats make Honua part of the modern data stack conversation.
References
- GeoParquet spec: https://geoparquet.org/
- FlatGeobuf: https://flatgeobuf.org/
- PMTiles: https://protomaps.com/docs/pmtiles
- GeoArrow: https://geoarrow.org/
- Cloud-Native Geospatial Foundation: https://cloudnativegeo.org/
- Epic: GeoETL — spatial extract-transform-load pipelines #361 GeoETL (these formats as extract/load targets)
- Epic: Multi-database support — SQL Server, MySQL, and cloud spatial databases #362 Multi-database (GeoParquet as interchange between databases)