Added support for prediction intervals for VARMAX regressor #4267
christopherbunn merged 5 commits into main
Conversation
Codecov Report

@@           Coverage Diff           @@
##            main   #4267     +/-  ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        355     355
  Lines      38915   38956     +41
=======================================
+ Hits       38794   38835     +41
  Misses       121     121
eccabay
left a comment
Looks solid! Just a few questions and some testing suggestions.
# anchor represents where the simulations should start from (forecasting is done from the "end")
y_pred = self._component_obj._fitted_forecaster.simulate(
    nsimulations=X.shape[0],
    repetitions=400,
This implementation is based on the one we have for exponential smoothing, and this is the value set there. Do you think we should have it passed in as a parameter?
Hmm, poking around in our exponential smoother and statsmodels' docs on the subject, it's unclear to me why this was set to 400. I think at least setting it as a constant would be good, since the number seems arbitrary.
Sounds good, will update to include _N_REPETITIONS = 400.
@@ -217,9 +217,43 @@ def get_prediction_intervals(
    Returns:
        dict: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
I think this needs to be updated since the return here will be a nested, per-series dictionary - do I have that correct?
Yep, updated the docstring!
    exog=X if self.use_covariates else None,
)
prediction_interval_result = {}
for series in self._component_obj._fitted_forecaster.model.endog_names:
What are endog_names, and where do they come from? Are those the columns of y in unstacked/DataFrame format?
Yes, they are, and they are set internally in statsmodels during the fit process. I can add a comment describing this. I don't think there's a better way to access this info other than storing it as a class variable during the fit process?
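To make the endog_names discussion concrete, here is a hypothetical sketch (names and structure are illustrative, not evalml's actual code) of building the nested, per-series dictionary keyed on those column names:

```python
# Illustrative sketch: endog_names are the column names of the unstacked
# target DataFrame, recorded by statsmodels when the forecaster is fit.
def build_per_series_intervals(endog_names, bounds_by_series, coverage=0.95):
    """Nest {coverage}_lower / {coverage}_upper bounds under each series
    name (hypothetical helper mirroring the docstring format discussed)."""
    result = {}
    for series in endog_names:
        lower, upper = bounds_by_series[series]
        result[series] = {
            f"{coverage}_lower": lower,
            f"{coverage}_upper": upper,
        }
    return result


intervals = build_per_series_intervals(
    ["target_a", "target_b"],  # hypothetical series names
    {"target_a": (1.0, 3.0), "target_b": (5.0, 9.0)},
)
```

The outer keys are the series names and the inner keys follow the {coverage}_lower/{coverage}_upper convention from the docstring, which is the nesting the reviewer asked about.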
@pytest.mark.parametrize("use_covariates", [True, False])
def test_varmax_regressor_prediction_intervals(use_covariates, ts_multiseries_data):
    X_train, X_test, y_train = ts_multiseries_data(no_features=not use_covariates)
I think an interesting test here would be to check the cases where X is None and use_covariates is True, and where X is not None and use_covariates is False - we have lots of checks for those cases, so it'd be nice to ensure we handle them smoothly.
To clarify, you mean the cases where X is passed in fit() but not in get_prediction_intervals(), right? I can add that case in!
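A minimal sketch of the guard under discussion (hypothetical helper, mirroring the "exog=X if self.use_covariates else None" line from the diff above): exogenous data is only forwarded to statsmodels when the flag is set and X is actually provided.

```python
# Hypothetical helper mirroring the covariate guard in the diff above:
# exogenous data is forwarded to statsmodels only when the model was fit
# with use_covariates=True and X is actually provided.
def select_exog(X, use_covariates):
    return X if (use_covariates and X is not None) else None


# The two mismatch cases the review suggests testing both fall back to None:
case_no_X = select_exog(None, True)          # X is None, use_covariates=True
case_no_flag = select_exog([[1.0]], False)   # X given, use_covariates=False
case_match = select_exog([[1.0]], True)      # X given and flag set
```

Both mismatch cases degrade to a no-covariates call rather than raising, which is the smooth handling the reviewer wants covered by tests.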
Resolves #4262