Closed
Description
I ran into what I believe is a bug today while working with distance matrices returned by sf::st_distance(). Retaining the units appears to massively slow down conversion of the matrix to a data.frame.
I noticed this issue while attempting to calculate distances between a couple of points and ~40,000 polygons. I've been able to reproduce the issue with some sampled points below.
library(sf)
library(units)
# seed
set.seed(8675309)
# data
nc <- system.file("gpkg/nc.gpkg", package = "sf") |>
  sf::st_read()
# outline
outline <- nc |>
  sf::st_union() |>
  sf::st_as_sf()
# pts
pts <- outline |>
  sf::st_sample(40000)
# pts 2
pts_2 <- outline |>
  sf::st_sample(2)
# distance matrix
mat <- sf::st_distance(pts_2, pts)
The actual calculation of the distance matrix is extremely fast, but in my use case I wanted to convert it to a data.frame.
mat |> as.data.frame()
However, this takes much longer than it should.
# normal as.data.frame()
to_df <- function() {
  mat |> as.data.frame()
}
# benchmark
rbenchmark::benchmark(to_df(), replications = 5)
> rbenchmark::benchmark(to_df(), replications = 5)
     test replications elapsed relative user.self sys.self user.child sys.child
1 to_df()            5  132.03        1     72.72      1.7         NA        NA
If I, instead, drop the units from the matrix, it runs quickly as expected.
# drop units, then as.data.frame()
to_df_drop_units <- function() {
  mat |>
    units::drop_units() |>
    as.data.frame()
}
# benchmark
rbenchmark::benchmark(to_df_drop_units(), replications = 5)
> rbenchmark::benchmark(to_df_drop_units(), replications = 5)
                test replications elapsed relative user.self sys.self
1 to_df_drop_units()            5     0.2        1       0.2        0
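If the units need to survive the conversion, one possible workaround is to drop them, reshape to a long data.frame, and reattach the unit to a single column. This is only a sketch, not a fix in units or sf: the small synthetic matrix `mat_demo` stands in for the `st_distance()` result, and the names `vals` and `long` are made up for illustration.

```r
library(units)

# Synthetic stand-in for the st_distance() output: a 2 x 1000 units matrix in metres.
set.seed(1)
mat_demo <- set_units(matrix(runif(2 * 1000, 0, 5e5), nrow = 2), "m", mode = "standard")

# Do the reshaping on the plain numeric matrix.
vals <- drop_units(mat_demo)
long <- data.frame(
  from     = as.vector(row(vals)),  # row index of each cell (first point set)
  to       = as.vector(col(vals)),  # column index of each cell (second point set)
  distance = as.vector(vals)        # distances in column-major order
)

# Reattach the original unit to the one distance column.
units(long$distance) <- units(mat_demo)

str(long)
```

This produces one row per point pair instead of one column per target point, so the unit is attached once rather than to every column.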
I've been able to reproduce this on both my Windows 11 machine and my macOS laptop.
> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
other attached packages:
[1] units_0.8-5 sf_1.0-14
loaded via a namespace (and not attached):
[1] s2_1.1.4 utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0
[5] e1071_1.7-13 magrittr_2.0.3 glue_1.6.2 tibble_3.2.1
[9] KernSmooth_2.23-22 pkgconfig_2.0.3 generics_0.1.3 dplyr_1.1.4
[13] wk_0.9.1 lifecycle_1.0.4 classInt_0.4-10 cli_3.6.1
[17] fansi_1.0.5 grid_4.3.2 vctrs_0.6.5 DBI_1.1.3
[21] proxy_0.4-27 class_7.3-22 compiler_4.3.2 tools_4.3.2
[25] pillar_1.9.0 Rcpp_1.0.11 lwgeom_0.2-13 rlang_1.1.2
[29] jsonlite_1.8.8 rbenchmark_1.0.0