lazysf provides a dplyr backend for any vector data source readable by
GDAL. It uses gdalraster::GDALVector to talk to GDAL and dbplyr for
SQL translation, giving you lazy evaluation of spatial data through
familiar dplyr verbs.
Vector data sources — files, URLs, databases, cloud storage — are
accessed through a DBI connection. Nothing is read into memory until you
call collect().
library(lazysf)
library(dplyr)
f <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)
lf <- lazysf(f)
lf
#> Warning in new_CppObject_xp(fields$.module, fields$.pointer, ...): SpatiaLite
#> is not available
#> # Source: table<"nc"> [?? x 16]
#> # Database: GDAL <SQLITE> WKB [/perm_storage/home/mdsumner/R/x86_64-pc-linux-g...]
#> FID AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
#> <int64> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl>
#> 1 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
#> 2 2 0.061 1.231 1827 1827 Alle… 37005 37005 3 487 0
#> 3 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5
#> 4 4 0.07 2.968 1831 1831 Curr… 37053 37053 27 508 1
#> 5 5 0.153 2.206 1832 1832 Nort… 37131 37131 66 1421 9
#> 6 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7
#> 7 7 0.062 1.547 1834 1834 Camd… 37029 37029 15 286 0
#> 8 8 0.091 1.284 1835 1835 Gates 37073 37073 37 420 0
#> 9 9 0.118 1.421 1836 1836 Warr… 37185 37185 93 968 4
#> 10 10 0.124 1.428 1837 1837 Stok… 37169 37169 85 1612 1
#> # ℹ more rows
#> # ℹ 5 more variables: NWBIR74 <dbl>, BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geom <wk_wkb>Standard dplyr verbs generate SQL that GDAL executes:
lf |>
filter(AREA < 0.1) |>
select(NAME, AREA, geom) |>
arrange(AREA)
#> Warning in new_CppObject_xp(fields$.module, fields$.pointer, ...): SpatiaLite
#> is not available
#> # Source: SQL [?? x 3]
#> # Database: GDAL <SQLITE> WKB [/perm_storage/home/mdsumner/R/x86_64-pc-linux-g...]
#> # Ordered by: AREA
#> FID NAME AREA geom
#> <int64> <chr> <dbl> <wk_wkb>
#> 1 0 New Hanover 0.042 <MULTIPOLYGON (((-77.96073 34.18924, -77.96587 34.…
#> 2 1 Chowan 0.044 <MULTIPOLYGON (((-76.68874 36.29452, -76.64822 36.…
#> 3 2 Clay 0.051 <MULTIPOLYGON (((-83.938 34.98939, -83.98855 34.98…
#> 4 3 Pasquotank 0.053 <MULTIPOLYGON (((-76.29893 36.21423, -76.32423 36.…
#> 5 4 Mitchell 0.059 <MULTIPOLYGON (((-82.11885 35.81853, -82.14665 35.…
#> 6 5 Polk 0.06 <MULTIPOLYGON (((-82.21017 35.19313, -82.27833 35.…
#> 7 6 Alleghany 0.061 <MULTIPOLYGON (((-81.23989 36.36536, -81.24069 36.…
#> 8 7 Camden 0.062 <MULTIPOLYGON (((-76.00897 36.3196, -75.95718 36.1…
#> 9 8 Perquimans 0.063 <MULTIPOLYGON (((-76.48053 36.07979, -76.53696 36.…
#> 10 9 Avery 0.064 <MULTIPOLYGON (((-81.94135 35.95498, -81.9614 35.9…
#> # ℹ more rowsUse show_query() to see the SQL:
lf |>
filter(AREA < 0.1) |>
select(NAME, AREA, geom) |>
arrange(AREA) |>
show_query()
#> <SQL>
#> SELECT "NAME", "AREA", "geom"
#> FROM "nc"
#> WHERE ("AREA" < 0.1)
#> ORDER BY "AREA"Use collect() to pull data into memory:
d <- lf |>
filter(AREA < 0.1) |>
select(NAME, AREA, geom) |>
collect()
#> Warning in new_CppObject_xp(fields$.module, fields$.pointer, ...): SpatiaLite
#> is not available
d
#> # A tibble: 34 × 4
#> FID NAME AREA geom
#> <int64> <chr> <dbl> <wk_wkb>
#> 1 0 Alleghany 0.061 <MULTIPOLYGON (((-81.23989 36.36536, -81.24069 36.3…
#> 2 1 Currituck 0.07 <MULTIPOLYGON (((-76.00897 36.3196, -76.01735 36.33…
#> 3 2 Hertford 0.097 <MULTIPOLYGON (((-76.74506 36.23392, -76.98069 36.2…
#> 4 3 Camden 0.062 <MULTIPOLYGON (((-76.00897 36.3196, -75.95718 36.19…
#> 5 4 Gates 0.091 <MULTIPOLYGON (((-76.56251 36.34057, -76.60424 36.3…
#> 6 5 Vance 0.072 <MULTIPOLYGON (((-78.49252 36.17359, -78.51472 36.1…
#> 7 6 Pasquotank 0.053 <MULTIPOLYGON (((-76.29893 36.21423, -76.32423 36.2…
#> 8 7 Watauga 0.081 <MULTIPOLYGON (((-81.80622 36.10456, -81.81715 36.1…
#> 9 8 Perquimans 0.063 <MULTIPOLYGON (((-76.48053 36.07979, -76.53696 36.0…
#> 10 9 Chowan 0.044 <MULTIPOLYGON (((-76.68874 36.29452, -76.64822 36.3…
#> # ℹ 24 more rowslazysf defaults to the SQLITE dialect, which gives you access to spatial SQL functions. These are translated from R-style names to their SQL equivalents:
lf |>
mutate(area_m2 = st_area(geom)) |>
filter(st_area(geom) > 0.1) |>
mutate(srid = st_srid(geom)) |>
collect()Functions like st_area(), st_srid(), and st_geometrytype() work
via GDAL’s built-in SQLite engine (no SpatiaLite extension needed).
Functions that operate on geometry values (like st_astext(),
st_intersects(), st_buffer()) require a SpatiaLite-enabled GDAL
build.
For larger datasets, enable GDAL’s columnar Arrow C stream interface for faster data transfer:
lf <- lazysf(f, use_arrow = TRUE)
lf |> collect()This uses GDALVector$getArrowStream() via nanoarrow — data moves from
GDAL to R in columnar batches instead of row-by-row. Requires GDAL >=
3.6.
lazysf supports two SQL dialects:
SQLITE (default): Full SQLite syntax including subqueries, spatial
functions, CAST, GROUP BY, ORDER BY. Required for dbplyr to work
properly. This is GDAL’s embedded SQLite engine, available for any
format.
lazysf(f) |>
group_by(SID74) |>
summarise(n = n(), mean_area = mean(AREA, na.rm = TRUE)) |>
collect()
#> Warning in new_CppObject_xp(fields$.module, fields$.pointer, ...): SpatiaLite
#> is not available
#> # A tibble: 23 × 4
#> FID SID74 n mean_area
#> <int64> <dbl> <int> <dbl>
#> 1 0 0 13 0.084846
#> 2 1 1 11 0.083091
#> 3 2 2 8 0.13088
#> 4 3 3 6 0.10783
#> 5 4 4 13 0.14715
#> 6 5 5 11 0.12545
#> 7 6 6 4 0.16575
#> 8 7 7 4 0.14975
#> 9 8 8 5 0.134
#> 10 9 9 2 0.1605
#> # ℹ 13 more rowsOGRSQL: GDAL’s native OGR SQL. Simpler, no subquery support, but works for basic operations:
lazysf(f, dialect = "OGRSQL") |>
filter(NAME %LIKE% "A%") |>
collect()For more control, use the DBI interface directly:
con <- dbConnect(GDALSQL(), f)
con
#> <GDALVectorConnection>
#> DSN: /perm_storage/home/mdsumner/R/x86_64-pc-linux-gnu-library/4.5/lazysf/extdata/nc.gpkg
#> dialect: SQLITE
#> geometry: WKB
#> arrow: off
#> tables: nc
dbListTables(con)
#> [1] "nc"
dbListFields(con, "nc")
#> [1] "AREA" "PERIMETER" "CNTY_" "CNTY_ID" "NAME" "FIPS"
#> [7] "FIPSNO" "CRESS_ID" "BIR74" "SID74" "NWBIR74" "BIR79"
#> [13] "SID79" "NWBIR79" "geom"
DBI::dbDisconnect(con)
#> <GDALVectorConnection>
#> DSN:
#> dialect: SQLITE
#> geometry: WKB
#> arrow: off
#> tables: (unavailable)lazysf supports four geometry output formats, set via geom_format:
"WKB"(default): Well-Known Binary, aswk::wkbvectors"WKT": Well-Known Text, aswk::wktvectors"BBOX": Bounding box per feature, aswk::rctvectors"NONE": No geometry (attributes only, faster for non-spatial queries)
lazysf(f, geom_format = "WKT") |>
select(NAME, geom) |>
head(3) |>
collect()
#> Warning in new_CppObject_xp(fields$.module, fields$.pointer, ...): SpatiaLite
#> is not available
#> # A tibble: 3 × 3
#> FID NAME geom
#> <int64> <chr> <wk_wkt>
#> 1 0 Ashe MULTIPOLYGON (((-81.47276 36.23436, -81.54084 36.27251, -81…
#> 2 1 Alleghany MULTIPOLYGON (((-81.23989 36.36536, -81.24069 36.37942, -81…
#> 3 2 Surry MULTIPOLYGON (((-80.45634 36.24256, -80.47639 36.25473, -80…Any GDAL-readable source works — local files, URLs, cloud storage, databases:
## GeoJSON from URL
lazysf("https://example.com/data.geojson")
## PostgreSQL/PostGIS
lazysf("PG:host=localhost dbname=mydb user=me password=secret")
## Cloud-optimized sources via /vsicurl/
lazysf("/vsicurl/https://example.com/large.gpkg")# From CRAN
install.packages("lazysf")
# Development version from r-universe
options(repos = c(
hypertidy = "https://hypertidy.r-universe.dev",
CRAN = "https://cloud.r-project.org"))
install.packages("lazysf")- No window functions:
slice_min(),slice_max(),row_number()etc. are not supported by GDAL’s SQLite engine. - OGRSQL dialect: Does not support subqueries, so complex dplyr verb chains may fail. Use SQLITE dialect (the default) instead.
- FID is sticky: GDAL always includes the feature ID column in results, even when your SQL doesn’t select it.
- Format-dependent: Performance and capability varies by GDAL driver. File formats like Shapefile and GeoPackage work well; text formats like CSV or GeoJSON are slower for large data.
Please note that the lazysf project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.