Add w parameter to progressive_val_score and iter_progressive_val_score #1762
satishkc7 wants to merge 1 commit into online-ml:main from
Conversation
Hey there! I'm not sure I follow. When would it be useful to always use the same weight for each sample? Wouldn't it be more practical if the

That's a great point; a static weight for all samples isn't very useful. Would the right approach be to have the dataset yield (x, y, w) tuples where w is a per-sample float? Or did you have a different pattern in mind? Happy to update the PR accordingly.

Yes, that's what I have in mind, though I'm not exactly sure what the API would look like. But you can take a stab at it! May I ask why you opened this PR in the first place? Do you have a use case?

The use case I had in mind is datasets where samples have unequal importance, e.g. time-decayed weighting or class imbalance where minority samples should count more in the metric. I'll prototype the (x, y, w) approach where the dataset optionally yields a third element and progressive_val_score detects and passes it through to the metric's update call. I'll update the PR once I have something working!
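To make the class-imbalance motivation concrete, here is a hand-rolled weighted accuracy showing how a weight passed to the metric's update changes the score. This is purely illustrative and not river's metric implementation:

```python
class WeightedAccuracy:
    """Running accuracy where each sample contributes with weight w."""

    def __init__(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, y_true, y_pred, w=1.0):
        self.correct += w * (y_true == y_pred)
        self.total += w

    def get(self):
        return self.correct / self.total if self.total else 0.0


metric = WeightedAccuracy()
metric.update(True, True, w=1.0)   # correct prediction, normal weight
metric.update(False, True, w=3.0)  # minority-class mistake, weighted 3x
print(round(metric.get(), 2))      # 0.25 instead of the unweighted 0.5
```

With w=1.0 everywhere this reduces to plain accuracy, which is the backward-compatible default discussed below.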
Thanks for the details. I was curious whether you had a real-life use case available too :)
Done! Here's the approach I went with:

API: No new parameters were added to

Example:

```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

# Wrap a dataset to yield (x, y, w) triples with time-decayed weights
def decayed_phishing():
    dataset = list(datasets.Phishing())
    n = len(dataset)
    for i, (x, y) in enumerate(dataset):
        w = (i + 1) / n  # later samples get higher weight
        yield x, y, w

evaluate.progressive_val_score(
    model=model,
    dataset=decayed_phishing(),
    metric=metrics.ROCAUC(),
    print_every=200,
)
```

Implementation details:

Added 4 unit tests in
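Stripped of river specifics, the 2-tuple/3-tuple detection this approach relies on is small. The following is a self-contained sketch; the function name is illustrative, not the PR's actual code:

```python
def iter_with_weights(dataset):
    """Yield (x, y, w) for items that may or may not carry a weight.

    Plain (x, y) pairs default to w = 1.0, so unweighted datasets
    keep working unchanged.
    """
    for item in dataset:
        if len(item) == 3:
            x, y, w = item
        else:
            x, y = item
            w = 1.0
        yield x, y, w


# Mixed stream: one weighted triple, one plain pair
stream = [({"f": 1}, True, 0.5), ({"f": 2}, False)]
print([w for _, _, w in iter_with_weights(stream)])  # [0.5, 1.0]
```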
JiwaniZakir left a comment
The sample_weights dict in _progressive_validation is populated for every sample in _iter_dataset() but only cleared via pop on line ~110 when use_label is True. In delayed or look-ahead scenarios where many samples are queued before answers arrive, this dict can grow to hold the entire dataset in memory with no upper bound — worth either documenting the memory implication or using a collections.deque/bounded structure.
There's also a subtle collision risk on line ~109: if kwargs (forwarded from simulate_qa) happens to contain a "w" key — for instance, if someone passes extra stream metadata — then model.learn_one(x, y, w=w, **kwargs) will raise TypeError: got multiple values for keyword argument 'w'. The _model_accepts_w guard doesn't protect against this; you'd want to either explicitly remove w from kwargs before the call or document that w is a reserved key in the extra-kwargs convention.
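The collision the reviewer describes is easy to reproduce with plain Python. This sketch uses a stand-in model (names are illustrative) to show both the TypeError and the strip-before-call fix:

```python
class Model:
    # Stand-in for a model whose learn_one accepts a weight.
    def learn_one(self, x, y, w=1.0, **kwargs):
        return w


model = Model()
x, y, w = {"f": 1}, True, 0.5

# Metadata forwarded from the stream happens to carry a "w" key
kwargs = {"w": 2.0, "origin": "simulate_qa"}

try:
    model.learn_one(x, y, w=w, **kwargs)
except TypeError as e:
    print(e)  # learn_one() got multiple values for keyword argument 'w'

# Fix: strip the reserved key before calling learn_one
kwargs.pop("w", None)
assert model.learn_one(x, y, w=w, **kwargs) == 0.5
```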
The test's _fake_simulate_qa in test_progressive_validation_weights.py assumes simulate_qa enumerates the wrapped iterator starting at 0 sequentially, which is how _iter_dataset's own enumerate assigns keys to sample_weights. If stream.simulate_qa ever resets its index or introduces gaps (e.g., skipping flagged samples), the index alignment silently breaks and pop falls back to 1.0 with no error — a test exercising a non-trivial delay scenario would make this contract explicit.
Thanks for the thorough review! I've addressed all three points:
```python
sample_weights: dict[int, float] = {}

def _iter_dataset():
    for idx, item in enumerate(dataset):
        if len(item) == 3:
            x, y, w = item
            sample_weights[idx] = w
        else:
            x, y = item
            sample_weights[idx] = 1.0
        yield x, y
```
Is this really necessary? Can't you .pop w from kwargs instead?
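The maintainer's suggestion can be sketched as follows (hypothetical names, not the PR's final code): rather than tracking weights in a side dict, the weight travels inside kwargs and is popped out right before learn_one:

```python
def learn_step(model, x, y, **kwargs):
    # Pull the weight out of kwargs; default to 1.0 when absent.
    w = kwargs.pop("w", 1.0)
    return model.learn_one(x, y, w=w, **kwargs)


class Model:
    def learn_one(self, x, y, w=1.0):
        return w


print(learn_step(Model(), {"f": 1}, True, w=0.25))  # 0.25
print(learn_step(Model(), {"f": 1}, True))          # 1.0
```

Popping also doubles as the collision guard: by the time learn_one is called, kwargs no longer contains a "w" key.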
Updated the PR with the API:

```python
evaluate.progressive_val_score(
    model=model,
    dataset=dataset,
    metric=metrics.ROCAUC(),
    w=lambda x, y: (x['timestamp'] + 1) / n,  # time-decay example
)
```

Rules:

Two new tests:
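The precedence rule discussed later in the commit message (tuple weights beat the callable, which beats the 1.0 default) can be applied per sample roughly like this. This is a sketch under the assumption that the callable takes (x, y) and returns a float; names are illustrative:

```python
def resolve_weight(item, weight_fn=None):
    """Return (x, y, w): tuple weights beat the callable, which beats 1.0."""
    if len(item) == 3:            # dataset supplied (x, y, w) directly
        return item
    x, y = item
    if weight_fn is not None:     # fall back to the callable
        return x, y, float(weight_fn(x, y))
    return x, y, 1.0              # default: unweighted


decay = lambda x, y: (x["t"] + 1) / 10
print(resolve_weight(({"t": 4}, True), weight_fn=decay))       # ({'t': 4}, True, 0.5)
print(resolve_weight(({"t": 4}, True, 0.9), weight_fn=decay))  # ({'t': 4}, True, 0.9)
```

Giving explicit tuple weights priority keeps mixed datasets predictable: a stream can carry exact weights for some samples and let the callable fill in the rest.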
MaxHalford left a comment
Getting there! You'll have to fix some conflicts when you rebase.
```python
    measure_time=False,
    measure_memory=False,
    yield_predictions=False,
    w: typing.Callable[[dict, typing.Any], float] | None = None,
```
I'd prefer it if you renamed the parameter to weights
```python
# avoids any reliance on index alignment between _iter_dataset and simulate_qa.
weight_queue: collections.deque[float] = collections.deque()

def _iter_dataset():
```
This is not needed if _model_accepts_w is False and w is None, right?
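The guard the reviewer is asking about could look roughly like this. It's a sketch: `_model_accepts_w` and `weight_queue` come from the PR discussion, but the control flow here is illustrative:

```python
import collections
import inspect


def _model_accepts_w(model):
    # True when learn_one has an explicit `w` parameter.
    return "w" in inspect.signature(model.learn_one).parameters


def make_iterator(dataset, model, weights=None):
    # Fast path: no weight support requested, iterate the dataset as-is
    # and skip building the queue entirely.
    if not _model_accepts_w(model) and weights is None:
        return iter(dataset), None

    weight_queue: collections.deque[float] = collections.deque()

    def _iter_dataset():
        for item in dataset:
            if len(item) == 3:
                x, y, w = item
            else:
                x, y = item
                w = weights(x, y) if weights else 1.0
            weight_queue.append(w)  # FIFO: bounded by the delay window
            yield x, y

    return _iter_dataset(), weight_queue


class Plain:
    def learn_one(self, x, y):
        pass


it, q = make_iterator([({"f": 1}, True)], Plain())
print(q)  # None: no weight infrastructure was created
```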
…sive_val_score

Allow per-sample weights via two complementary APIs:

- weights callable: progressive_val_score(..., weights=lambda x, y: float)
- dataset triples: dataset yields (x, y, w) where w is a per-sample float

Tuple weights take precedence over the callable so mixed datasets behave predictably. The weight is forwarded to learn_one for models that accept a w parameter (e.g. linear_model.LogisticRegression). Models without a w parameter are called without it to preserve backward compatibility.

Implementation:

- _needs_weights guard: weight infrastructure (weight_queue, _iter_dataset) is only created when the model accepts w or a weights callable is given, keeping the default path free of any overhead
- weight_queue (collections.deque) bounds memory to the delay window, not the full dataset
- kwargs w-key collision guard strips w from simulate_qa metadata before learn_one to prevent TypeError when stream metadata includes a w key

Closes online-ml#1502
Force-pushed from 8c9c341 to 47bc3a1
Summary
Adds per-sample weight support to progressive_val_score and iter_progressive_val_score by allowing the dataset to yield (x, y, w) triples instead of the usual (x, y) pairs.

Previously, learn_one was always called with the default weight (w=1.0). This change lets users supply a per-sample w directly from the dataset iterator, which is more practical for real-world use cases like time-decayed weighting or cost-sensitive learning.

Usage

Samples that don't include a weight (plain (x, y) pairs) default to w=1.0, so existing code is fully backward compatible.

Changes

- _progressive_validation wraps the dataset with _iter_dataset(), which detects 3-tuples and stores per-sample weights keyed by sample index
- The stored weight is passed to learn_one when the ground truth is revealed, for models that accept a w parameter
- Removed the w: float = 1.0 parameter from all three public/private functions
- Added a Notes section to docstrings explaining the (x, y, w) API
- Added tests/test_progressive_validation_weights.py with 4 tests covering plain pairs, weighted triples, mixed input, and models without a w param

Backward compatibility

Fully backward compatible: datasets yielding plain (x, y) pairs continue to work unchanged.

Closes #1502