Added support for prediction intervals for VARMAX regressor #4267
christopherbunn merged 5 commits into main
Conversation
Codecov Report

@@           Coverage Diff           @@
##            main   #4267     +/-  ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        355     355
  Lines      38915   38956     +41
=======================================
+ Hits       38794   38835     +41
  Misses       121     121
eccabay
left a comment
Looks solid! Just a few questions and some testing suggestions.
# anchor represents where the simulations should start from (forecasting is done from the "end")
y_pred = self._component_obj._fitted_forecaster.simulate(
    nsimulations=X.shape[0],
    repetitions=400,
This implementation is based on the one we have for exponential smoothing, and this is the value set there. Do you think we should have it passed in as a parameter?
Hmm, poking around in our exponential smoother and statsmodels' docs on the subject, it's unclear to me why this was set to 400. I think at least setting it as a constant would be good, since the number seems arbitrary.
Sounds good, will update to include _N_REPETITIONS = 400.
@@ -217,9 +217,43 @@ def get_prediction_intervals(
    Returns:
        dict: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
I think this needs to be updated since the return here will be a nested, per-series dictionary - do I have that correct?
Yep, updated the docstring!
    exog=X if self.use_covariates else None,
)
prediction_interval_result = {}
for series in self._component_obj._fitted_forecaster.model.endog_names:
What are endog_names, and where do they come from? Are those the columns of y in unstacked/DataFrame format?
Yes, they are, and they are set internally in statsmodels during the fit process. I can add a comment describing this. I don't think there's a better way to access this info other than storing it as a class variable during the fit process?
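To make the endog_names discussion concrete, here is a hypothetical sketch (names and structure are illustrative, not evalml's actual code) of building the nested, per-series dictionary keyed on those column names:

```python
# Illustrative sketch: endog_names are the column names of the unstacked
# target DataFrame, recorded by statsmodels when the forecaster is fit.
def build_per_series_intervals(endog_names, bounds_by_series, coverage=0.95):
    """Nest {coverage}_lower / {coverage}_upper bounds under each series
    name (hypothetical helper mirroring the docstring format discussed)."""
    result = {}
    for series in endog_names:
        lower, upper = bounds_by_series[series]
        result[series] = {
            f"{coverage}_lower": lower,
            f"{coverage}_upper": upper,
        }
    return result


intervals = build_per_series_intervals(
    ["target_a", "target_b"],  # hypothetical series names
    {"target_a": (1.0, 3.0), "target_b": (5.0, 9.0)},
)
```

The outer keys are the series names and the inner keys follow the {coverage}_lower/{coverage}_upper convention from the docstring, which is the nesting the reviewer asked about.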
@pytest.mark.parametrize("use_covariates", [True, False])
def test_varmax_regressor_prediction_intervals(use_covariates, ts_multiseries_data):
    X_train, X_test, y_train = ts_multiseries_data(no_features=not use_covariates)
I think an interesting test here would be to check the cases where X is None and use_covariates is True, and where X is not None and use_covariates is False - we have lots of checks for those cases, so it'd be nice to ensure we handle them smoothly.
To clarify, you mean the cases where X is passed in fit() but not in get_prediction_intervals(), right? I can add that case in!
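A minimal sketch of the guard under discussion (hypothetical helper, mirroring the "exog=X if self.use_covariates else None" line from the diff above): exogenous data is only forwarded to statsmodels when the flag is set and X is actually provided.

```python
# Hypothetical helper mirroring the covariate guard in the diff above:
# exogenous data is forwarded to statsmodels only when the model was fit
# with use_covariates=True and X is actually provided.
def select_exog(X, use_covariates):
    return X if (use_covariates and X is not None) else None


# The two mismatch cases the review suggests testing both fall back to None:
case_no_X = select_exog(None, True)          # X is None, use_covariates=True
case_no_flag = select_exog([[1.0]], False)   # X given, use_covariates=False
case_match = select_exog([[1.0]], True)      # X given and flag set
```

Both mismatch cases degrade to a no-covariates call rather than raising, which is the smooth handling the reviewer wants covered by tests.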
Resolves #4262