slow as.data.frame() conversion from sf::st_distance() #361

@grcatlin

I discovered what I believe is a bug today while working with distance matrices returned by sf::st_distance(). Keeping the units on the matrix appears to massively slow down conversion to a data.frame.

I noticed this issue while attempting to calculate distances between a couple of points and ~40,000 polygons. I've been able to reproduce the issue with some sampled points below.

library(sf)
library(units)

# seed
set.seed(8675309)

# data
nc <- system.file("gpkg/nc.gpkg", package = "sf") |>
    sf::st_read()

# outline
outline <- nc |>
    sf::st_union() |>
    sf::st_as_sf()

# pts
pts <- outline |>
    sf::st_sample(40000)

# pts 2
pts_2 <- outline |>
    sf::st_sample(2)

# distance matrix
mat <- sf::st_distance(pts_2, pts)

The actual calculation of the distance matrix is extremely fast, but in my use case I wanted to convert it to a data.frame.

mat |> as.data.frame()

However, this takes much longer than it should: roughly 26 seconds per conversion (132 seconds elapsed over 5 replications).

# normal as.data.frame()
to_df <- function() {
    mat |> as.data.frame()
}

# benchmark
rbenchmark::benchmark(to_df(), replications = 5)
> rbenchmark::benchmark(to_df(), replications = 5)
     test replications elapsed relative user.self sys.self user.child sys.child
1 to_df()            5  132.03        1     72.72      1.7         NA        NA
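
A 2 x 40,000 matrix becomes a 40,000-column data.frame, so one thing worth checking is whether the conversion time grows with the number of columns. Below is a minimal sketch that does this without sf, assuming a plain units matrix built with units::set_units() behaves like the st_distance() result (that is an assumption, not something confirmed above). The helper name time_conversion and the column counts are made up for illustration.

```r
library(units)

# Hypothetical helper: time as.data.frame() on a 2 x n matrix,
# once with units attached and once as a plain numeric matrix.
time_conversion <- function(n) {
  plain <- matrix(runif(2 * n), nrow = 2)
  with_units <- units::set_units(plain, "m", mode = "standard")
  c(
    n = n,
    with_units = unname(system.time(as.data.frame(with_units))["elapsed"]),
    without_units = unname(system.time(as.data.frame(plain))["elapsed"])
  )
}

# Arbitrary column counts; increase them to approach the 40,000-column case
timings <- do.call(rbind, lapply(c(500, 1000, 2000), time_conversion))
print(timings)
```

If the with_units column grows much faster than the without_units column as n doubles, that would point at per-column work in the units conversion path.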

If I, instead, drop the units from the matrix, it runs quickly as expected.

# drop units, then as.data.frame()
to_df_drop_units <- function() {
    mat |>
        units::drop_units() |>
        as.data.frame()
}

# benchmark
rbenchmark::benchmark(to_df_drop_units(), replications = 5)
> rbenchmark::benchmark(to_df_drop_units(), replications = 5)
                test replications elapsed relative user.self sys.self
1 to_df_drop_units()            5     0.2        1       0.2        0
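
If the units still need to end up in the data.frame, one possible workaround is to reshape to long format so there is a single units column instead of one per matrix column. This is only a sketch: units_matrix_to_long is a hypothetical helper, mat_small is a small stand-in for the st_distance() result, and I have not benchmarked it against the 2 x 40,000 case.

```r
library(units)

# Small stand-in for the st_distance() result: a 2 x 5 units matrix in metres
mat_small <- units::set_units(matrix(1:10 / 2, nrow = 2), "m", mode = "standard")

# Hypothetical helper: reshape a units matrix to long format,
# dropping units for the reshape and reattaching them to one column.
units_matrix_to_long <- function(m) {
  unit_string <- units::deparse_unit(m)  # e.g. "m"
  n_row <- dim(m)[1]
  n_col <- dim(m)[2]
  long <- data.frame(
    from = rep(seq_len(n_row), times = n_col),  # row index (column-major order)
    to = rep(seq_len(n_col), each = n_row)      # column index
  )
  values <- as.vector(units::drop_units(m))
  long$distance <- units::set_units(values, unit_string, mode = "standard")
  long
}

long <- units_matrix_to_long(mat_small)
head(long)
```

With the objects from this issue, the call would be units_matrix_to_long(mat), giving one row per point pair.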

I've been able to reproduce this on both my Windows 11 machine and my macOS laptop.

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)

other attached packages:
[1] units_0.8-5 sf_1.0-14

loaded via a namespace (and not attached):
 [1] s2_1.1.4           utf8_1.2.4         R6_2.5.1           tidyselect_1.2.0
 [5] e1071_1.7-13       magrittr_2.0.3     glue_1.6.2         tibble_3.2.1
 [9] KernSmooth_2.23-22 pkgconfig_2.0.3    generics_0.1.3     dplyr_1.1.4
[13] wk_0.9.1           lifecycle_1.0.4    classInt_0.4-10    cli_3.6.1
[17] fansi_1.0.5        grid_4.3.2         vctrs_0.6.5        DBI_1.1.3
[21] proxy_0.4-27       class_7.3-22       compiler_4.3.2     tools_4.3.2
[25] pillar_1.9.0       Rcpp_1.0.11        lwgeom_0.2-13      rlang_1.1.2
[29] jsonlite_1.8.8     rbenchmark_1.0.0
