Add Brier metrics to motion forecasting evaluation module #44
Conversation
```python
    Raises:
        ValueError: If the number of forecasted trajectories and probabilities don't match.
        ValueError: If normalize=False and `forecast_probabilities` contains values outside of the range [0, 1].
```
Since these are "probabilities", we should raise the out-of-range error regardless of the normalize flag.
I was flip-flopping on whether to call this arg weights, likelihoods, or probabilities, but settled on probabilities because I thought it was the most intuitive.
That being said, I can see use-cases where users might want to directly pass in weights and have them normalized, so it might be nice to perform the range check for sanity afterwards.
Can we call them weights in that case? It isn't evident from the function name or docstring that weights are acceptable.
```python
    return is_missed_prediction


def compute_brier_ade(
```
I see that all the metric functions here work on a single sample. For a batch, we might have to call these functions for individual samples. Wouldn't that be slower because no batch computation will be used?
That's a good point - we can certainly convert all these metric functions into batched equivalents in a follow-up PR. :-)
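For reference, a batched equivalent might look roughly like the sketch below. The function name and the `(B, K, T, 2)` shape convention are assumptions for illustration, not the PR's API:

```python
import numpy as np


def compute_brier_fde_batch(
    forecasted: np.ndarray,     # (B, K, T, 2) forecasted trajectories
    gt: np.ndarray,             # (B, T, 2) ground-truth trajectories
    probabilities: np.ndarray,  # (B, K) per-mode probabilities
) -> np.ndarray:
    """Vectorized Brier-FDE over a batch: endpoint error plus (1 - p)^2."""
    # Endpoint displacement for every mode of every sample: (B, K).
    fde = np.linalg.norm(forecasted[:, :, -1] - gt[:, None, -1], axis=-1)
    return fde + np.square(1.0 - probabilities)
```

This computes all B x K modes in one pass instead of a Python loop over samples.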
```python
    # Validate that all forecast probabilities are in the range [0, 1]
    if np.logical_or(forecast_probabilities < 0.0, forecast_probabilities > 1.0).any():
        raise ValueError("At least one forecast probability falls outside the range [0, 1].")
```
The two functions differ by just 1 line. Maybe move most of the stuff to a common function.
```python
    # Compute FDE with Brier score component
    fde_vector = compute_fde(forecasted_trajectories, gt_trajectory)
    brier_score = np.square(1 - forecast_probabilities)
```
```python
uniform_probabilities_k6: NDArrayFloat = np.ones((6,)) / 6
confident_probabilities_k6: NDArrayFloat = np.array([0.9, 0.02, 0.02, 0.02, 0.02, 0.02])
non_normalized_probabilities_k6: NDArrayFloat = confident_probabilities_k6 * 100
wrong_shape_probabilities_k6: NDArrayFloat = np.ones((5,)) / 5
```
[nit] Add one unit test for out-of-range probabilities.
wqi left a comment:
Thanks for the review @jagjeet-singh! Updated the PR to address feedback.
PR Summary
This PR adds Brier score-based variants of ADE and FDE to the motion forecasting evaluation module.
These metrics are implemented in an identical way to their counterparts in the AV1 repo and will be used as scoring metrics in the AV2 MF challenge.
Testing
All functions added in this PR have been unit tested.
In order to ensure this PR works as intended, it is:
Compliance with Standards
As the author, I certify that this PR conforms to the following standards: