
Adding in REV without dask support #999

Open
thomaspagano wants to merge 5 commits into nci:develop from thomaspagano:issue-632-REV-isolated-rev-without-dask

Conversation

@thomaspagano
Contributor

As requested, here is a version of REV without any support for dask whatsoever. Note that when run with dask it doesn't complain; should it?

Please work through the following checklists. Delete anything that isn't relevant.

Development for new xarray-based metrics

  • Works with n-dimensional data and includes reduce_dims, preserve_dims, and weights args.
  • Typehints added
  • Add error handling
  • Imported into the API
  • Works with both xr.DataArrays and xr.Datasets if possible

Docstrings

  • Docstrings complete and follow Napoleon (Google) style
  • Maths equation added
  • Reference to paper/webpage is in docstring. The preferred referencing style for journal articles is APA (7th edition)
  • Code example added

Testing of new xarray-based metrics

  • 100% unit test coverage
  • [NO BY DESIGN] Test that metric is compatible with dask.
  • Test that metrics work with inputs that contain NaNs
  • Test that broadcasting with xarray works
  • Test both reduce and preserve dims arguments work
  • Test that errors are raised as expected
  • Test that it works with both xr.DataArrays and xr.Datasets

Tutorial notebook

  • Short introduction to why you would use that metric and what it tells you
  • A link to a reference
  • A "things to try next" section at the end
  • Add notebook to Tutorial_Gallery.ipynb
  • Optional - a detailed discussion of how the metric works at the end of the notebook

Documentation

@thomaspagano thomaspagano marked this pull request as ready for review February 11, 2026 01:59
Collaborator

@nicholasloveday nicholasloveday left a comment


Thanks @thomaspagano . This is starting to look pretty good. It looks like it's not using the Jive test data. Once you update the tests to use the Jive test data, I'll take another look at the tests.

    if not np.all(np.isin(unique_obs, [0, 1])):
        raise ValueError("obs must contain only 0, 1, or NaN values")

    return None
Collaborator

I don't think that you need return None. If you update this, then you can simplify the docstring.

Collaborator

Also, check_binary already exists, so you can just use https://github.com/nci/scores/blob/develop/src/scores/utils.py#L435

Contributor Author

Removed all instances of return None, swapped to using check_binary and deleted redundant test.

Collaborator

The typehints on some of these validation functions still indicate a return of None.

Additionally, _validate_rev_inputs has a misleading return typehint that should be removed.


Changed as suggested

return result


def calculate_climatology(
Collaborator

What are your thoughts on this going in a general file somewhere else? This feels like a useful tool that could be used for other things. Not sure where... perhaps in scores.processing? I'm interested in @tennlee's view.


I'm happy either way. Maybe leave it for now.

Comment on lines 284 to 301
    if weights is not None:
        # Identify which dimensions weights actually span
        weight_reduce_dims = tuple(d for d in obs_reduce_dims if d in weights.dims)

        if weight_reduce_dims:
            # Use aggregate for weighted mean over dimensions that weights span
            climatology = aggregate(obs, reduce_dims=weight_reduce_dims, weights=weights, method="mean")

            # Unweighted mean over remaining dimensions
            remaining_dims = tuple(d for d in obs_reduce_dims if d not in weight_reduce_dims)
            if remaining_dims:
                climatology = aggregate(climatology, reduce_dims=remaining_dims, weights=None, method="mean")
        else:
            # Weights exist but don't span any reduction dimensions - ignore them
            climatology = aggregate(obs, reduce_dims=obs_reduce_dims, weights=None, method="mean")
    else:
        # Simple unweighted mean
        climatology = aggregate(obs, reduce_dims=obs_reduce_dims, weights=None, method="mean")
Collaborator

The logic here can be simplified, as aggregate already handles a lot of this (e.g., if reduce_dims=None, aggregate already returns the unweighted mean). Take a look at https://github.com/nci/scores/blob/develop/src/scores/processing/aggregation.py#L15

Consider if you want to simplify the tests as a result of this or just leave them in to test each case.

Contributor Author

Simplified as suggested

Collaborator

This can be simplified further:

    dims_to_reduce = gather_dimensions(
        fcst_dims=(),  # Not needed for climatology
        obs_dims=obs.dims,
        reduce_dims=reduce_dims,
        preserve_dims=preserve_dims,
        weights_dims=weights.dims if weights is not None else None,
    )
    climatology = aggregate(
        obs,
        reduce_dims=dims_to_reduce,
        weights=weights,
        method="mean",
    )
    return climatology


Accepted as suggested

    n_times = 100
    n_members = 50
    # Generate ensemble with some skill
    seed_temps = np.random.normal(1, 3, n_times)
Collaborator

I'm confused why this is called "seed_temps"

Contributor Author

Comment added for clarity

Comment on lines 27 to 28
# Module-level test data ... Preserved as is because these come from Jive and we want to demonstrate
# that they've been carried over exactly.
Collaborator

This looks different to the test data in Jive... Can you please review and update?


Hopefully it's all good now?

@nicholasloveday
Collaborator

@thomaspagano I also see that there are now a couple of conflicts that need resolving with the latest develop branch.

@thomaspagano thomaspagano force-pushed the issue-632-REV-isolated-rev-without-dask branch from 5208ad5 to f8ff2bd Compare February 23, 2026 03:21
@thomaspagano thomaspagano force-pushed the issue-632-REV-isolated-rev-without-dask branch from f8ff2bd to ec7b905 Compare February 23, 2026 03:25
Comment on lines 1517 to 1541
    # 5: get rational_user
    # Censored because scores code does not allow
    # cost loss ratios to be optional, they are required
    # (
    #     rtd.FCST_2X3X2_WITH_NAN_MISALIGNED,
    #     rtd.OBS_3X3_WITH_NAN_MISALIGNED,
    #     [0, 0.3, 0.5, 0.7, 1],
    #     None,
    #     ["lead_day"],
    #     ["maximum", "rational_user"],
    #     None,
    #     rtd.EXP_PREV_CASE5,
    # ),
    # # 6: don't supply derived_metrics, supply threshold_outputs
    # (
    #     rtd.FCST_2X3X2_WITH_NAN_MISALIGNED,
    #     rtd.OBS_3X3_WITH_NAN_MISALIGNED,
    #     [0, 0.3, 0.5, 0.7, 1],
    #     None,
    #     ["lead_day"],
    #     ["rational_user"],
    #     [0, 0.3, 0.5, 0.7, 1],
    #     rtd.EXP_PREV_CASE6,
    # ),
# issue #162 - dims accepts any iterable
Collaborator

Feel free to delete these commented out tests


Deleted as suggested

Comment on lines 335 to 336
    raise ValueError(
        f"Invalid derived_metrics value: '{mode}'. Valid options are 'maximum' and 'rational_user'"
Collaborator

This should already be checked in _validate_derived_metrics and isn't needed here


Changed as suggested

    # Handle cost-loss ratios
    if isinstance(cost_loss_ratios, (float, int)):
        cost_loss_ratios = [cost_loss_ratios]
    cost_loss_xr = xr.DataArray(cost_loss_ratios, dims=[cost_loss_dim], coords={cost_loss_dim: cost_loss_ratios})
Collaborator

This conversion to an xarray object isn't needed.


Changed as suggested.

    # False alarm rate (probability of false detection) equation 5

    try:
        check_monotonic_array(cost_loss_ratios)
Collaborator

Why is there a separate check for cost_loss_ratios here in addition to _validate_cost_loss_ratios? Can't you just have one function that checks both?


Changed as suggested

Comment on lines 19 to 45
def _validate_dimensions(
    fcst: XarrayLike,
    obs: XarrayLike,
    weights: Optional[xr.DataArray],
    threshold_dim: str,
    cost_loss_dim: str,
) -> None:
    """Check for dimension conflicts in inputs."""
    # Dimension checks are always lazy - just check dims, not values
    # Check fcst dimensions
    if threshold_dim in fcst.dims:
        raise ValueError(f"'{threshold_dim}' cannot be a dimension in fcst")
    if cost_loss_dim in fcst.dims:
        raise ValueError(f"'{cost_loss_dim}' cannot be a dimension in fcst")

    # Check obs dimensions
    if threshold_dim in obs.dims:
        raise ValueError(f"'{threshold_dim}' cannot be a dimension in obs")
    if cost_loss_dim in obs.dims:
        raise ValueError(f"'{cost_loss_dim}' cannot be a dimension in obs")

    # Check weights dimensions
    if weights is not None:
        if threshold_dim in weights.dims:
            raise ValueError(f"'{threshold_dim}' cannot be a dimension in weights")
        if cost_loss_dim in weights.dims:
            raise ValueError(f"'{cost_loss_dim}' cannot be a dimension in weights")
Collaborator

You can simplify this like:

    inputs = [("fcst", fcst), ("obs", obs)]
    if weights is not None:
        inputs.append(("weights", weights))

    for dim_name in (threshold_dim, cost_loss_dim):
        for input_name, input_data in inputs:
            if dim_name in input_data.dims:
                raise ValueError(f"'{dim_name}' cannot be a dimension in {input_name}")


Changed as suggested

    cost_loss_dim: str,
) -> None:
    """Check for dimension conflicts in inputs."""
    # Dimension checks are always lazy - just check dims, not values
Collaborator

Remove comment about "lazy"


No longer relevant

    Raises ValueError if invalid values are present.
    """

    check_binary(obs, "obs")
Collaborator

Any reason why we need a wrapper around check_binary? Why not just call check_binary directly?


Changed as suggested

binary_fcst = binary_discretise(fcst, threshold, ">=")

# Rename the threshold dimension if needed
if "threshold" in binary_fcst.dims and threshold_dim != "threshold":
Collaborator

Can this be simplified to:

Suggested change:

    - if "threshold" in binary_fcst.dims and threshold_dim != "threshold":
    + if threshold_dim != "threshold":


Accepted as suggested

fcst = xr.DataArray([0.2, 0.8, 0.6], dims=["time"])
obs = xr.DataArray([0, 1, 1], dims=["time"])

try:
Collaborator

Instead of using try/except in tests, can you use pytest.raises instead?


changed as suggested
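The pytest.raises pattern the reviewer asks for can be sketched standalone. The `validate_derived_metrics` helper below is a hypothetical stand-in for the PR's validation (so the example runs without scores installed); "pizza_oven" is the invalid option from the review thread:

```python
import pytest


def validate_derived_metrics(modes):
    """Hypothetical stand-in for the PR's derived_metrics validation."""
    for mode in modes:
        if mode not in ("maximum", "rational_user"):
            raise ValueError(f"Invalid derived_metrics value: '{mode}'")


def test_invalid_derived_metric_raises():
    # pytest.raises replaces the manual try/except/assert-on-message pattern,
    # and fails the test automatically if no exception is raised at all.
    with pytest.raises(ValueError, match="pizza_oven"):
        validate_derived_metrics(["pizza_oven"])
```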

cost_loss_ratios = [0.5]
derived_metrics = ["pizza_oven"]

try:
Collaborator

Use pytest.raises instead


changed as suggested

# The 'rational_user' DataArray should not contain the threshold_dim anymore
assert "threshold" not in result["rational_user"].dims

def test_rational_user_no_threshold_in_coords(self):
Collaborator

Is this test needed? Is there a way that threshold coord is not present?


I might leave it in. It's testing that threshold coord is not present which is a good thing. It's showing that the code is doing work.

Comment on lines 660 to 716
def test_custom_dimension_names(self):
    """Test that custom dimension names work correctly."""
    fcst = xr.DataArray([0.2, 0.8, 0.6, 0.4], dims=["time"])
    obs = xr.DataArray([0, 1, 1, 0], dims=["time"])

    # Test custom threshold_dim only
    result = relative_economic_value(
        fcst,
        obs,
        cost_loss_ratios=[0.3, 0.7],
        threshold=[0.5],
        threshold_dim="my_threshold",
    )

    assert "my_threshold" in result.dims
    assert "cost_loss_ratio" in result.dims
    assert result.dims == ("my_threshold", "cost_loss_ratio")

def test_custom_cost_loss_dim(self):
    """Test that custom cost_loss_dim works correctly."""
    fcst = xr.DataArray([0.2, 0.8, 0.6, 0.4], dims=["time"])
    obs = xr.DataArray([0, 1, 1, 0], dims=["time"])

    # Test custom cost_loss_dim only
    result = relative_economic_value(
        fcst,
        obs,
        cost_loss_ratios=[0.3, 0.7],
        threshold=[0.5],
        cost_loss_dim="alpha",
    )

    assert "threshold" in result.dims
    assert "alpha" in result.dims
    assert result.dims == ("threshold", "alpha")

def test_both_custom_dimension_names(self):
    """Test that both custom dimension names work together."""
    fcst = xr.DataArray([0.2, 0.8, 0.6, 0.4], dims=["time"])
    obs = xr.DataArray([0, 1, 1, 0], dims=["time"])

    # Test both custom dims
    result = relative_economic_value(
        fcst,
        obs,
        cost_loss_ratios=[0.3, 0.7],
        threshold=[0.5],
        threshold_dim="decision_threshold",
        cost_loss_dim="alpha",
    )

    assert "decision_threshold" in result.dims
    assert "alpha" in result.dims
    assert result.dims == ("decision_threshold", "alpha")
    assert "threshold" not in result.dims
    assert "cost_loss_ratio" not in result.dims

Collaborator

Can you consolidate these tests above using @pytest.mark.parametrize

Additionally, you can simplify the assert statements to just be assert result.dims == expected_dims
For example in test_custom_dimension_names you have:

        assert "my_threshold" in result.dims
        assert "cost_loss_ratio" in result.dims
        assert result.dims == ("my_threshold", "cost_loss_ratio")

The first two asserts aren't needed as they're covered by the third assert


Changed as suggested
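The consolidation with @pytest.mark.parametrize can be sketched as below. To keep the example self-contained, `relative_economic_value_stub` is a hypothetical stand-in that only reproduces the output dimension names; the real metric lives in scores and takes fcst/obs data:

```python
import pytest
from types import SimpleNamespace


def relative_economic_value_stub(threshold_dim="threshold", cost_loss_dim="cost_loss_ratio"):
    """Hypothetical stub: returns an object exposing only the result dims."""
    return SimpleNamespace(dims=(threshold_dim, cost_loss_dim))


@pytest.mark.parametrize(
    "kwargs,expected_dims",
    [
        ({"threshold_dim": "my_threshold"}, ("my_threshold", "cost_loss_ratio")),
        ({"cost_loss_dim": "alpha"}, ("threshold", "alpha")),
        (
            {"threshold_dim": "decision_threshold", "cost_loss_dim": "alpha"},
            ("decision_threshold", "alpha"),
        ),
    ],
)
def test_custom_dimension_names(kwargs, expected_dims):
    result = relative_economic_value_stub(**kwargs)
    # One equality assert covers the individual membership asserts too.
    assert result.dims == expected_dims
```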

class TestWeights:
    """Tests for handling of time/spatial weights in REV calculations."""

    @pytest.mark.parametrize("weight_value", [0.5, 1, 2])
Collaborator

Not sure that this needs @pytest.mark.parametrize. Uniform weights all lead to the same result, so we probably don't need to test it 3 times.


Accepted as suggested
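The reviewer's point holds because uniform weights cancel in a weighted mean, so all three parametrized values exercise the same code path. A small numpy demonstration (toy data, not the PR's fixtures):

```python
import numpy as np

# sum(x * w) / sum(w) = w * sum(x) / (w * n) = mean(x) when w is constant,
# so weight values 0.5, 1 and 2 all give the unweighted result.
data = np.array([0.2, 0.8, 0.6, 0.4])
unweighted = data.mean()
for w in (0.5, 1.0, 2.0):
    weights = np.full_like(data, w)
    weighted = (data * weights).sum() / weights.sum()
    assert np.isclose(weighted, unweighted)
```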


xr.testing.assert_allclose(rev_weighted, rev_unweighted)

def test_weights_nonuniform(self, make_contingency_data):
Collaborator

Non-uniform weights are already tested in test_spatial_weights_broadcast, so this probably isn't needed.


Accepted as suggested

)
xr.testing.assert_allclose(result, expected, atol=0.001)

def test_weights_negative(self, make_contingency_data):
Collaborator

Should this just be included in the main input validation test?


moved as suggested


assert "dimension 'cost_loss_ratio' must not be in input data" in str(excinfo.value)

def test_check_monotonic_array_non_convertible_raises_typeerror(self):
Collaborator

Combine with test_out_of_range_arrays using pytest.mark.parametrize


accepted as suggested

    with pytest.raises(TypeError, match="could not convert array into a numpy ndarray of floats"):
        check_monotonic_array(["not", "numbers"])

def test_check_monotonic_array_rejects_multidimensional_arrays(self):
Collaborator

Combine with test_out_of_range_arrays using pytest.mark.parametrize


accepted as suggested

check_args=True,
)

def test_fcst_outside_01_with_threshold(self):
Collaborator

Incorporate into test_input_validation


accepted as suggested

@nicholasloveday
Collaborator

Thanks @thomaspagano, I have made several more suggestions with a particular focus on ideas on how to simplify and clean up the code (including tests) further to make it easier to maintain.


3 participants