Conversation

@LuukBlom
Collaborator

@LuukBlom LuukBlom commented Dec 23, 2025

Issue addressed

Fixes #521

Explanation

After discussing with @dalmijn, we came up with this solution, which provides a quicker alternative to raster.zonal_stats.
In essence, the function does the following:

  • Find sample points, one per geometry (determined by sampling_strategy), and determine the index (row, col) of each sample point.
  • Sample the xr.Dataset directly at those indices. (This avoids the large overhead xarray incurs when the dataset is large.)
  • For spatial datasets without a time dim, skip the stat calculations and return the sampled value immediately. (Taking the mean/min/max of a single value makes no sense.)
  • For spatial datasets with a time dim, sample each point at each time step, then reduce over time by applying the stats.
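
The steps above can be sketched roughly as follows. This is a minimal illustration using plain NumPy arrays rather than the actual implementation in this PR: the helper names (points_to_indices, sample_and_reduce), the grid origin and resolution, and the "time axis first" layout are all assumptions made for the example.

```python
import numpy as np


def points_to_indices(xs, ys, x_origin, y_origin, res):
    """Map point coordinates to (row, col) indices of a north-up grid.

    x_origin / y_origin are the outer top-left corner of the grid and
    res is the (square) cell size; all three are hypothetical inputs.
    """
    cols = np.floor((np.asarray(xs) - x_origin) / res).astype(int)
    rows = np.floor((y_origin - np.asarray(ys)) / res).astype(int)
    return rows, cols


def sample_and_reduce(data, rows, cols, stats=("mean", "min", "max")):
    """Sample data at (rows, cols); reduce over a leading time axis if present."""
    if data.ndim == 2:
        # No time dimension: a single value per point, so return it as-is.
        return {"value": data[rows, cols]}
    # Time axis first: indexing yields an array of shape (time, n_points).
    sampled = data[:, rows, cols]
    reducers = {"mean": np.mean, "min": np.min, "max": np.max}
    return {name: reducers[name](sampled, axis=0) for name in stats}


# Hypothetical 3 x 4 grid with 10 m cells, top-left corner at (0, 30).
grid = np.arange(12, dtype=float).reshape(3, 4)
rows, cols = points_to_indices([5.0, 35.0], [25.0, 5.0], 0.0, 30.0, 10.0)
static = sample_and_reduce(grid, rows, cols)

# The same grid stacked over two time steps (second step is +10 everywhere).
cube = np.stack([grid, grid + 10.0])
timed = sample_and_reduce(cube, rows, cols)
```

Here `static` holds one raw value per point, while `timed` holds one value per point for each requested stat, which mirrors the split between the no-time and time cases described in the list.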

When I was testing this, I wrote a little notebook to help visualize bugs:
showcase-sample.ipynb

TODO:

  • coverage
  • changelog
  • performance test compared to zonal_stats?

General Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation
  • Updated changelog.rst

Data/Catalog checklist

  • data/catalogs/predefined_catalogs.yml has not been modified.
  • None of the old data_catalog.yml files have been changed
  • data/changelog.rst has been updated
  • new file uses LF line endings (done automatically if you used update_versions.py)
  • New file has been tested locally
  • Tests have been added using the new file in the test suite

Additional Notes (optional)

Add any additional notes or information that may be helpful.

@LuukBlom LuukBlom requested a review from dalmijn December 23, 2025 15:40
Contributor

@hboisgon hboisgon left a comment

Good idea. I wonder why you did not use raster.sample in the background? It is good if the method is very similar to raster.zonal_stats in terms of arguments and return object.

update tests
renamed sample_stats to sample_points
@LuukBlom
Collaborator Author

LuukBlom commented Jan 6, 2026

Thanks for the review, @hboisgon! It made me reconsider my assumptions.

I have refactored the function and implemented your comments:

  • It now uses raster.sample.
  • I also renamed the function from sample_stats to sample_points (this can still be changed, but it is already better imo).
  • It no longer accesses the raster point by point, but selects all indices at once.
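
To illustrate the difference between point-by-point access and selecting all indices at once, here is a small NumPy sketch. It does not show how raster.sample works internally; the array, its shape, and the random indices are hypothetical. In xarray, a comparable effect is usually obtained by indexing with DataArray indexers that share a dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
raster = rng.random((100, 200))           # hypothetical (y, x) grid
rows = rng.integers(0, 100, size=1_000)   # hypothetical sample rows
cols = rng.integers(0, 200, size=1_000)   # hypothetical sample cols

# Point by point: one Python-level lookup per sample.
looped = np.array([raster[r, c] for r, c in zip(rows, cols)])

# All at once: a single integer-array ("fancy") indexing operation.
vectorized = raster[rows, cols]

# Both approaches return the same values; only the access pattern differs.
same = bool(np.array_equal(looped, vectorized))
```

The two results are identical, so the change is purely about avoiding a per-point loop in Python.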

@LuukBlom LuukBlom changed the title from "feat: Add function sample_stats to the raster accessor as a performant alternative to zonal_stats." to "feat: Add function sample_points to the raster accessor as a performant alternative to zonal_stats." Jan 6, 2026
@LuukBlom LuukBlom requested a review from hboisgon January 6, 2026 15:25
@sonarqubecloud

sonarqubecloud bot commented Jan 6, 2026

Quality Gate failed

Failed conditions
78.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Development

Successfully merging this pull request may close these issues.

Feature: rasterize (large) vector data efficiently

3 participants