
Epic: GeoETL — spatial extract-transform-load pipelines #361

@mikemcdougall

Context

FME (Safe Software) dominates the spatial ETL market, with licenses costing roughly $10K or more per year each. Most organizations need repeatable data pipelines: pull from sources, clean and transform, then load into Honua. Today Honua supports only one-shot import (GeoJSON, Shapefile, Esri REST). This epic adds scheduled, repeatable, multi-source ETL.

Scope

Extract (Sources)

  • File sources: GeoJSON, Shapefile, GeoPackage, CSV (with coordinates), KML, GPX, GML
  • Database sources: PostGIS (remote), SQL Server (geometry/geography), MySQL (spatial)
  • API sources: Esri REST feature services, OGC WFS, OGC API Features, generic REST/JSON
  • Cloud sources: S3/Azure Blob/GCS file watchers
  • Streaming sources: webhook receivers, MQTT (IoT sensors)
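As a rough sketch of one file source, the reader below turns CSV rows with coordinate columns into GeoJSON Point features and collects bad rows instead of failing the whole extract. It is illustrative only: `read_csv_points` and the default `lon`/`lat` column names are hypothetical, not existing Honua code.

```python
import csv
import io
from typing import Any


def read_csv_points(
    text: str, x_field: str = "lon", y_field: str = "lat"
) -> tuple[list[dict[str, Any]], list[tuple[int, str]]]:
    """Hypothetical CSV source: rows with coordinate columns -> GeoJSON Points.

    Returns (features, errors); errors holds (line_number, message) pairs so a
    bad row is reported rather than aborting the extract.
    """
    features: list[dict[str, Any]] = []
    errors: list[tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so data rows start at line 2 (assuming one line per record).
    for line_number, row in enumerate(reader, start=2):
        try:
            x = float(row[x_field])
            y = float(row[y_field])
        except (KeyError, TypeError, ValueError) as exc:
            # KeyError: column missing; TypeError: short row (value is None);
            # ValueError: non-numeric text.
            errors.append((line_number, f"bad coordinates: {exc}"))
            continue
        properties = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": properties,
        })
    return features, errors
```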

Transform

  • Geometry: reproject, simplify, buffer, clip to boundary, validate/repair
  • Attributes: rename, type cast, concatenate, split, lookup/join, regex extract
  • Filtering: spatial filter (clip to AOI), attribute filter, dedup (spatial + attribute)
  • Enrichment: reverse geocode, spatial join with reference layers, address standardization
  • Quality: null fill, outlier detection, geometry validation, coordinate precision
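One way the transform library could compose is as a chain of small steps, where each step returns a new feature or `None` to drop it. The sketch below is an assumption, not a proposed Honua API: `rename`, `cast`, `round_coords`, `where`, `run`, and `dedup` are hypothetical names, it handles 2D Point geometries only, and its "spatial + attribute" dedup is a simplified exact match after coordinate rounding rather than a proximity test.

```python
import json
from typing import Any, Callable, Iterable, Iterator, Optional

Feature = dict[str, Any]
# A transform returns a new feature, or None to drop it (filtering).
Transform = Callable[[Feature], Optional[Feature]]


def rename(old: str, new: str) -> Transform:
    def step(f: Feature) -> Feature:
        props = dict(f["properties"])
        if old in props:
            props[new] = props.pop(old)
        return {**f, "properties": props}
    return step


def cast(field: str, to: Callable[[Any], Any]) -> Transform:
    def step(f: Feature) -> Feature:
        props = dict(f["properties"])
        props[field] = to(props[field])
        return {**f, "properties": props}
    return step


def round_coords(digits: int) -> Transform:
    """Coordinate precision for 2D Point geometries; any Z value is dropped."""
    def step(f: Feature) -> Feature:
        x, y = f["geometry"]["coordinates"][:2]
        coords = [round(x, digits), round(y, digits)]
        return {**f, "geometry": {**f["geometry"], "coordinates": coords}}
    return step


def where(predicate: Callable[[Feature], bool]) -> Transform:
    """Attribute filter: keep features for which predicate(feature) is true."""
    return lambda f: f if predicate(f) else None


def run(features: Iterable[Feature], steps: list[Transform]) -> Iterator[Feature]:
    """Apply steps in order; a step returning None drops the feature."""
    for feature in features:
        out = feature
        for step in steps:
            result = step(out)
            if result is None:
                break
            out = result
        else:
            yield out


def dedup(features: Iterable[Feature], key_fields: list[str]) -> Iterator[Feature]:
    """Drop features whose geometry and key attributes exactly match an earlier one."""
    seen: set[tuple[Any, ...]] = set()
    for f in features:
        # Assumes key attribute values are hashable scalars.
        key = (json.dumps(f["geometry"], sort_keys=True),) + tuple(
            f["properties"].get(k) for k in key_fields
        )
        if key not in seen:
            seen.add(key)
            yield f
```

Rounding before dedup matters: two points that differ only below the chosen precision collapse to the same key.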

Load (Destinations)

  • Honua layers (create or append/upsert)
  • Honua attachments (linked documents/photos)
  • External PostGIS databases
  • File export (GeoJSON, GeoPackage, Shapefile)
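To make the "create or append/upsert" distinction concrete, here is a minimal in-memory sketch. The list-backed layer, the `load` function, and the `id` key field are hypothetical stand-ins, not Honua's storage model: append always adds records, while upsert replaces the record whose key matches and inserts the rest.

```python
from typing import Any

Feature = dict[str, Any]


def load(layer: list[Feature], incoming: list[Feature], mode: str,
         key_field: str = "id") -> dict[str, int]:
    """Hypothetical loader for a list-backed stand-in layer.

    append: every incoming feature becomes a new record (duplicate keys allowed).
    upsert: a feature whose key already exists replaces that record; others are inserted.
    """
    if mode == "append":
        layer.extend(incoming)
        return {"inserted": len(incoming), "updated": 0}
    if mode != "upsert":
        raise ValueError(f"unknown load mode: {mode!r}")
    # Map each existing key to its position so matches are replaced in place.
    index = {f["properties"][key_field]: i for i, f in enumerate(layer)}
    inserted = updated = 0
    for f in incoming:
        key = f["properties"][key_field]
        if key in index:
            layer[index[key]] = f
            updated += 1
        else:
            index[key] = len(layer)
            layer.append(f)
            inserted += 1
    return {"inserted": inserted, "updated": updated}
```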

Pipeline Orchestration

  • Declarative pipeline definitions (YAML/JSON)
  • Scheduled execution (cron)
  • Event-triggered (file upload, webhook, or CDC event from #316, "Feature edit change notifications: webhook events + replay cursor")
  • Pipeline versioning and rollback
  • Execution history, logs, row-level error tracking
  • Admin UI: pipeline builder, execution monitor, error inspector
  • Dry run mode: preview transforms without writing
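A declarative definition plus a small runner could cover several of the orchestration points at once: a versioned JSON definition, row-level error tracking, and a dry run that previews output without writing. Everything here is an illustrative assumption: the JSON field names, the `rename`/`cast` ops, and `run_pipeline` are hypothetical, and the `schedule` cron string is only carried along (a scheduler that reads it is not shown).

```python
import json
from typing import Any, Callable

# Hypothetical pipeline definition: field names are illustrative, not a Honua schema.
PIPELINE_JSON = """
{
  "name": "parcels-nightly",
  "version": 3,
  "schedule": "0 2 * * *",
  "transforms": [
    {"op": "rename", "from": "PARCEL_ID", "to": "parcel_id"},
    {"op": "cast", "field": "acres", "to": "float"}
  ],
  "destination": {"layer": "parcels", "mode": "upsert", "key": "parcel_id"}
}
"""

Row = dict[str, Any]
CASTS: dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "str": str}
KNOWN_OPS = {"rename", "cast"}


def validate(definition: dict[str, Any]) -> None:
    """Reject unknown ops up front so they are not reported as per-row failures."""
    for op in definition["transforms"]:
        if op["op"] not in KNOWN_OPS:
            raise ValueError(f"unknown op {op['op']!r}")
        if op["op"] == "cast" and op["to"] not in CASTS:
            raise ValueError(f"unknown cast target {op['to']!r}")


def apply_op(op: dict[str, Any], row: Row) -> Row:
    row = dict(row)  # never mutate the caller's row
    if op["op"] == "rename":
        row[op["to"]] = row.pop(op["from"])
    else:  # "cast", already checked by validate()
        row[op["field"]] = CASTS[op["to"]](row[op["field"]])
    return row


def run_pipeline(
    definition: dict[str, Any],
    rows: list[Row],
    write: Callable[[dict[str, Any], list[Row]], None],
    dry_run: bool = False,
) -> dict[str, Any]:
    """Transform every row, record failures per row, and write only when not a dry run."""
    validate(definition)
    good: list[Row] = []
    errors: list[dict[str, Any]] = []
    for index, row in enumerate(rows):  # index is 0-based
        try:
            for op in definition["transforms"]:
                row = apply_op(op, row)
            good.append(row)
        except (KeyError, TypeError, ValueError) as exc:
            errors.append({"row": index, "error": repr(exc)})
    if not dry_run:
        write(definition["destination"], good)
    return {
        "pipeline": definition["name"],
        "version": definition["version"],
        "dry_run": dry_run,
        "written": 0 if dry_run else len(good),
        "preview": good[:5] if dry_run else [],
        "errors": errors,
    }
```

Keeping `version` in the execution report is one way execution history could tie each run back to the exact definition it used, which is what rollback would need.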

Edition Gating

  • Community: one-shot import (existing capability) — never degraded
  • Pro: scheduled pipelines, multi-source extract, full transform library, pipeline orchestration
  • Enterprise: streaming sources, custom transform plugins, cross-tenant pipelines
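The gating above could be expressed as a minimum edition per capability. This sketch assumes the tiers are cumulative (Pro includes Community, Enterprise includes Pro), which the list implies but does not state; the `Edition` enum, feature keys, and `is_enabled` are hypothetical names.

```python
from enum import IntEnum


class Edition(IntEnum):
    # Ordered so that a higher edition includes everything below it (assumption).
    COMMUNITY = 1
    PRO = 2
    ENTERPRISE = 3


# Minimum edition per capability, mirroring the gating list; keys are hypothetical.
MIN_EDITION = {
    "one_shot_import": Edition.COMMUNITY,  # existing capability, available everywhere
    "scheduled_pipelines": Edition.PRO,
    "multi_source_extract": Edition.PRO,
    "full_transform_library": Edition.PRO,
    "pipeline_orchestration": Edition.PRO,
    "streaming_sources": Edition.ENTERPRISE,
    "custom_transform_plugins": Edition.ENTERPRISE,
    "cross_tenant_pipelines": Edition.ENTERPRISE,
}


def is_enabled(feature: str, edition: Edition) -> bool:
    # Unknown features are denied rather than silently enabled.
    required = MIN_EDITION.get(feature)
    return required is not None and edition >= required
```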

Differentiator vs FME

Related

Metadata


Assignees

No one assigned

Labels

  • area/server: Core server (protocols, query, edits)
  • edition/pro: Pro edition feature
  • effort/XL: 🌲 XL: 2-4 days (major system change, architecture impact)
  • enhancement: New feature or request
  • phase/GA: GA scope
  • priority/P3: 📋 Low priority - nice to have in phase, can be deferred
