Add w parameter to progressive_val_score and iter_progressive_val_score #1762
satishkc7 wants to merge 1 commit into online-ml:main from
Conversation
Hey there! I'm not sure I follow. When would it be useful to always use the same weight for each sample? Wouldn't it be more practical if the

That's a great point; a static weight for all samples isn't very useful. Would the right approach be to have the dataset yield (x, y, w) tuples where w is a per-sample float? Or did you have a different pattern in mind? Happy to update the PR accordingly.

Yes, that's what I have in mind, though I'm not exactly sure what the API would look like. But you can take a stab at it! May I ask why you opened this PR in the first place? Do you have a use case?

The use case I had in mind is datasets where samples have unequal importance, e.g. time-decayed weighting or class imbalance where minority samples should count more in the metric. I'll prototype the (x, y, w) approach where the dataset optionally yields a third element and progressive_val_score detects and passes it through to the metric's update call. I'll update the PR once I have something working!
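To make the class-imbalance motivation concrete, here is a hand-rolled weighted accuracy showing how a weight passed to the metric's update changes the score. This is purely illustrative and not river's metric implementation:

```python
class WeightedAccuracy:
    """Running accuracy where each sample contributes with weight w."""

    def __init__(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, y_true, y_pred, w=1.0):
        self.correct += w * (y_true == y_pred)
        self.total += w

    def get(self):
        return self.correct / self.total if self.total else 0.0


metric = WeightedAccuracy()
metric.update(True, True, w=1.0)   # correct prediction, normal weight
metric.update(False, True, w=3.0)  # minority-class mistake, weighted 3x
print(round(metric.get(), 2))      # 0.25 instead of the unweighted 0.5
```

With w=1.0 everywhere this reduces to plain accuracy, which is the backward-compatible default discussed below.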
Thanks for the details. I was curious whether you had a real-life use case available too :)
Done! Here's the approach I went with:

API: No new parameters were added to

Example:

```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

# Wrap a dataset to yield (x, y, w) triples with time-decayed weights
def decayed_phishing():
    dataset = list(datasets.Phishing())
    n = len(dataset)
    for i, (x, y) in enumerate(dataset):
        w = (i + 1) / n  # later samples get higher weight
        yield x, y, w

evaluate.progressive_val_score(
    model=model,
    dataset=decayed_phishing(),
    metric=metrics.ROCAUC(),
    print_every=200,
)
```

Implementation details:

Added 4 unit tests in
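Stripped of river specifics, the 2-tuple/3-tuple detection this approach relies on is small. The following is a self-contained sketch; the function name is illustrative, not the PR's actual code:

```python
def iter_with_weights(dataset):
    """Yield (x, y, w) for items that may or may not carry a weight.

    Plain (x, y) pairs default to w = 1.0, so unweighted datasets
    keep working unchanged.
    """
    for item in dataset:
        if len(item) == 3:
            x, y, w = item
        else:
            x, y = item
            w = 1.0
        yield x, y, w


# Mixed stream: one weighted triple, one plain pair
stream = [({"f": 1}, True, 0.5), ({"f": 2}, False)]
print([w for _, _, w in iter_with_weights(stream)])  # [0.5, 1.0]
```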
JiwaniZakir left a comment
The sample_weights dict in _progressive_validation is populated for every sample in _iter_dataset() but only cleared via pop on line ~110 when use_label is True. In delayed or look-ahead scenarios where many samples are queued before answers arrive, this dict can grow to hold the entire dataset in memory with no upper bound — worth either documenting the memory implication or using a collections.deque/bounded structure.
There's also a subtle collision risk on line ~109: if kwargs (forwarded from simulate_qa) happens to contain a "w" key — for instance, if someone passes extra stream metadata — then model.learn_one(x, y, w=w, **kwargs) will raise TypeError: got multiple values for keyword argument 'w'. The _model_accepts_w guard doesn't protect against this; you'd want to either explicitly remove w from kwargs before the call or document that w is a reserved key in the extra-kwargs convention.
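The collision the reviewer describes is easy to reproduce with plain Python. This sketch uses a stand-in model (names are illustrative) to show both the TypeError and the strip-before-call fix:

```python
class Model:
    # Stand-in for a model whose learn_one accepts a weight.
    def learn_one(self, x, y, w=1.0, **kwargs):
        return w


model = Model()
x, y, w = {"f": 1}, True, 0.5

# Metadata forwarded from the stream happens to carry a "w" key
kwargs = {"w": 2.0, "origin": "simulate_qa"}

try:
    model.learn_one(x, y, w=w, **kwargs)
except TypeError as e:
    print(e)  # learn_one() got multiple values for keyword argument 'w'

# Fix: strip the reserved key before calling learn_one
kwargs.pop("w", None)
assert model.learn_one(x, y, w=w, **kwargs) == 0.5
```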
The test's _fake_simulate_qa in test_progressive_validation_weights.py assumes simulate_qa enumerates the wrapped iterator starting at 0 sequentially, which is how _iter_dataset's own enumerate assigns keys to sample_weights. If stream.simulate_qa ever resets its index or introduces gaps (e.g., skipping flagged samples), the index alignment silently breaks and pop falls back to 1.0 with no error — a test exercising a non-trivial delay scenario would make this contract explicit.
Thanks for the thorough review! I've addressed all three points:
```python
sample_weights: dict[int, float] = {}

def _iter_dataset():
    for idx, item in enumerate(dataset):
        if len(item) == 3:
            x, y, w = item
            sample_weights[idx] = w
        else:
            x, y = item
            sample_weights[idx] = 1.0
        yield x, y
```
Is this really necessary? Can't you .pop w from kwargs instead?
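The maintainer's suggestion can be sketched as follows (hypothetical names, not the PR's final code): rather than tracking weights in a side dict, the weight travels inside kwargs and is popped out right before learn_one:

```python
def learn_step(model, x, y, **kwargs):
    # Pull the weight out of kwargs; default to 1.0 when absent.
    w = kwargs.pop("w", 1.0)
    return model.learn_one(x, y, w=w, **kwargs)


class Model:
    def learn_one(self, x, y, w=1.0):
        return w


print(learn_step(Model(), {"f": 1}, True, w=0.25))  # 0.25
print(learn_step(Model(), {"f": 1}, True))          # 1.0
```

Popping also doubles as the collision guard: by the time learn_one is called, kwargs no longer contains a "w" key.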
Updated the PR with the API:

```python
evaluate.progressive_val_score(
    model=model,
    dataset=dataset,
    metric=metrics.ROCAUC(),
    w=lambda x, y: (x['timestamp'] + 1) / n,  # time-decay example
)
```

Rules:

Two new tests:
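The precedence rule discussed later in the commit message (tuple weights beat the callable, which beats the 1.0 default) can be applied per sample roughly like this. This is a sketch under the assumption that the callable takes (x, y) and returns a float; names are illustrative:

```python
def resolve_weight(item, weight_fn=None):
    """Return (x, y, w): tuple weights beat the callable, which beats 1.0."""
    if len(item) == 3:            # dataset supplied (x, y, w) directly
        return item
    x, y = item
    if weight_fn is not None:     # fall back to the callable
        return x, y, float(weight_fn(x, y))
    return x, y, 1.0              # default: unweighted


decay = lambda x, y: (x["t"] + 1) / 10
print(resolve_weight(({"t": 4}, True), weight_fn=decay))       # ({'t': 4}, True, 0.5)
print(resolve_weight(({"t": 4}, True, 0.9), weight_fn=decay))  # ({'t': 4}, True, 0.9)
```

Giving explicit tuple weights priority keeps mixed datasets predictable: a stream can carry exact weights for some samples and let the callable fill in the rest.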
MaxHalford left a comment
Getting there! You'll have to fix some conflicts when you rebase.
```python
    measure_time=False,
    measure_memory=False,
    yield_predictions=False,
    w: typing.Callable[[dict, typing.Any], float] | None = None,
```
I'd prefer it if you renamed the parameter to weights
```python
# avoids any reliance on index alignment between _iter_dataset and simulate_qa.
weight_queue: collections.deque[float] = collections.deque()

def _iter_dataset():
```
This is not needed if _model_accepts_w is False and w is None, right?
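The guard the reviewer is asking about could look roughly like this. It's a sketch: `_model_accepts_w` and `weight_queue` come from the PR discussion, but the control flow here is illustrative:

```python
import collections
import inspect


def _model_accepts_w(model):
    # True when learn_one has an explicit `w` parameter.
    return "w" in inspect.signature(model.learn_one).parameters


def make_iterator(dataset, model, weights=None):
    # Fast path: no weight support requested, iterate the dataset as-is
    # and skip building the queue entirely.
    if not _model_accepts_w(model) and weights is None:
        return iter(dataset), None

    weight_queue: collections.deque[float] = collections.deque()

    def _iter_dataset():
        for item in dataset:
            if len(item) == 3:
                x, y, w = item
            else:
                x, y = item
                w = weights(x, y) if weights else 1.0
            weight_queue.append(w)  # FIFO: bounded by the delay window
            yield x, y

    return _iter_dataset(), weight_queue


class Plain:
    def learn_one(self, x, y):
        pass


it, q = make_iterator([({"f": 1}, True)], Plain())
print(q)  # None: no weight infrastructure was created
```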
…sive_val_score

Allow per-sample weights via two complementary APIs:

- weights callable: progressive_val_score(..., weights=lambda x, y: float)
- dataset triples: dataset yields (x, y, w) where w is a per-sample float

Tuple weights take precedence over the callable so mixed datasets behave predictably. The weight is forwarded to learn_one for models that accept a w parameter (e.g. linear_model.LogisticRegression). Models without a w parameter are called without it to preserve backward compatibility.

Implementation:

- _needs_weights guard: weight infrastructure (weight_queue, _iter_dataset) is only created when the model accepts w or a weights callable is given, keeping the default path free of any overhead
- weight_queue (collections.deque) bounds memory to the delay window, not the full dataset
- kwargs w-key collision guard strips w from simulate_qa metadata before learn_one to prevent TypeError when stream metadata includes a w key

Closes online-ml#1502
Force-pushed from 8c9c341 to 47bc3a1
Summary
Adds per-sample weight support to progressive_val_score and iter_progressive_val_score by allowing the dataset to yield (x, y, w) triples instead of the usual (x, y) pairs.

Previously, learn_one was always called with the default weight (w=1.0). This change lets users supply a per-sample w directly from the dataset iterator, which is more practical for real-world use cases like time-decayed weighting or cost-sensitive learning.

Usage

Samples that don't include a weight (plain (x, y) pairs) default to w=1.0, so existing code is fully backward compatible.

Changes

- _progressive_validation wraps the dataset with _iter_dataset(), which detects 3-tuples and stores per-sample weights keyed by sample index
- The stored weight is passed to learn_one when the ground truth is revealed, for models that accept a w parameter
- Removed the w: float = 1.0 parameter from all three public/private functions
- Added a Notes section to docstrings explaining the (x, y, w) API
- Added tests/test_progressive_validation_weights.py with 4 tests covering plain pairs, weighted triples, mixed input, and models without a w param

Backward compatibility

Fully backward compatible: datasets yielding plain (x, y) pairs continue to work unchanged.

Closes #1502