Fixed forecast period generation function for multiseries by christopherbunn · Pull Request #4320 · alteryx/evalml

christopherbunn · 2023-09-22T14:29:19Z

Resolves #4323

codecov · 2023-09-22T14:42:04Z

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (5c3e832) 99.7% compared to head (432d632) 99.7%.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4320     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        357     357             
  Lines      39767   39867    +100     
=======================================
+ Hits       39647   39747    +100     
  Misses       120     120

Files	Coverage Δ
...valml/pipelines/multiseries_regression_pipeline.py	`100.0% <100.0%> (ø)`
...valml/pipelines/time_series_regression_pipeline.py	`100.0% <100.0%> (ø)`
...line_tests/test_multiseries_regression_pipeline.py	`100.0% <100.0%> (ø)`

eccabay

Some suggestions for simplification, let me know if they actually work though!

eccabay · 2023-09-29T12:51:11Z

                coverage=coverage,
            )
-            trans_pred_intervals = {}
+            intervals_labels = list(list(pred_intervals.values())[0].keys())


I had to go into the debugger and play with the code myself to figure out what this line was doing 😅 A much simpler way:

intervals_labels = pred_intervals[0].keys()

That may need to be cast to a list for later on, I didn't test fully, but either way it's more readable

Hmm, this doesn't work since pred_intervals is a dict and this would pull the value at key 0 rather than the first value of the dictionary. Is there a better way to pull the first value?

Ah oops, I see what I missed. I don't know of a better way to manipulate the dictionaries, but we could also just do intervals_labels = pd.DataFrame(pred_intervals).index 😂
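On the question of pulling the first value out of a dictionary: one common idiom is `next(iter(d.values()))`, which reads the first value without building intermediate lists. A minimal sketch, assuming a made-up `pred_intervals` (the series IDs, interval labels, and values below are placeholders, not data from this PR):

```python
# Hypothetical stand-in for the nested dict discussed above:
# {series_id: {interval_label: values}}
pred_intervals = {
    "a": {"0.95_lower": [1.0, 2.0], "0.95_upper": [3.0, 4.0]},
    "b": {"0.95_lower": [5.0, 6.0], "0.95_upper": [7.0, 8.0]},
}

# The line from the diff: materialize all values, take the first, list its keys.
labels_original = list(list(pred_intervals.values())[0].keys())

# Iterator-based alternative: grab the first value, then list its keys.
first_series_intervals = next(iter(pred_intervals.values()))
labels_iter = list(first_series_intervals)

# Indexing with 0 looks up the *key* 0, which does not exist in this dict.
try:
    pred_intervals[0]
except KeyError:
    lookup_failed = True
else:
    lookup_failed = False
```

Both approaches rely on dictionaries preserving insertion order, and both assume every series carries the same interval labels.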

eccabay · 2023-09-29T13:29:33Z

+                    for key, orig_pi_values in intervals.items():
+                        series_id_target_name = (
+                            self.input_target_name + "_" + str(series_id)
+                        )
+                        interval_series_pred_intervals[key][
+                            series_id_target_name
+                        ] = pd.Series(
+                            (orig_pi_values.values - residuals[series_id].values)
+                            + trend_pred_intervals[series_id_target_name][key].values
+                            + y[series_id_target_name].values,
+                            index=orig_pi_values.index,
+                        )


This is a lot of repeated code with the other logical branch, which is going to make life very hard for us if we ever need to update this code. Could you abstract it out into a local helper function?

Something like

    def _get_series_intervals(intervals, residuals, trend_pred_intervals, y):
        return_intervals = {}
        for key, orig_pi_values in intervals.items():
            return_intervals[key] = pd.Series(
                (orig_pi_values.values - residuals.values)
                + trend_pred_intervals[key].values
                + y.values,
                index=orig_pi_values.index,
            )
        return return_intervals

    if is_multiseries(problem_type):
        for series_id, series_intervals in pred_intervals.items():
            series_id_target_name = self.input_target_name + "_" + str(series_id)
            interval_series_pred_intervals[series_id_target_name] = _get_series_intervals(
                series_intervals,
                residuals[series_id],
                trend_pred_intervals[series_id_target_name],
                y[series_id_target_name],
            )
    else:
        trans_pred_intervals = _get_series_intervals(pred_intervals, residuals, trend_pred_intervals, y)

The code I suggested does make a change to the dictionary structure for the multiseries case, which you'll have to let me know if it works or not - I swapped the intervals with series ids, to give us {series_1: {0.75_lower: <>, 0.75_upper: <>, ...}, series_2: {...}...} instead of {0.75_lower: {series_1: <>, series_2: <>, ...}, ...}
Personally, I think this would make it easier to get per-series prediction intervals, but you'll have to let me know if it's too much effort to swap things around at this point. We could also completely overhaul the data structure for this to be something actually 2D like a dataframe instead of nested dictionaries, but that might just be tech debt for the future.

I ended up using your implementation but I tweaked it slightly. I still kept the original dictionary structure since it makes stacking each prediction interval in the end slightly easier. Let me know what you think!
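The two layouts discussed in this thread hold the same data with the nesting order swapped. A minimal sketch of converting between them, assuming plain nested dicts; `swap_nesting` is a hypothetical helper for illustration and is not part of evalml, and the keys and values are placeholders:

```python
def swap_nesting(nested):
    """Turn {outer: {inner: value}} into {inner: {outer: value}}."""
    swapped = {}
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            swapped.setdefault(inner_key, {})[outer_key] = value
    return swapped


# Per-series layout: {series: {interval_label: data}}
by_series = {
    "target_1": {"0.75_lower": [1, 2], "0.75_upper": [3, 4]},
    "target_2": {"0.75_lower": [5, 6], "0.75_upper": [7, 8]},
}

# Per-interval layout: {interval_label: {series: data}}
by_interval = swap_nesting(by_series)

# Swapping twice returns the original layout.
round_trip = swap_nesting(by_interval)
```

Since the conversion is cheap, either layout can be kept internally and the other derived where it is more convenient.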

eccabay · 2023-09-29T15:53:19Z

+            trans_pred_intervals = {}
            trend_pred_intervals = self.get_component(


I got so confused here for a second, these names are so similar 😅 can one of them be renamed?

trans_pred_intervals -> transformed_pred_intervals

eccabay · 2023-09-29T15:54:40Z

+                    for interval, interval_data in series_id_interval_result.items():
+                        interval_series_pred_intervals[interval][
+                            series_id_target_name
+                        ] = interval_data
+                for interval in intervals_labels:


the word interval has no meaning to me any more 😂

eccabay · 2023-09-29T15:59:32Z

+                    for interval, interval_data in series_id_interval_result.items():
+                        interval_series_pred_intervals[interval][
+                            series_id_target_name
+                        ] = interval_data
+                for interval in intervals_labels:
+                    series_id_df = pd.DataFrame(
+                        interval_series_pred_intervals[interval],
+                    )
+                    stacked_pred_interval = stack_data(
+                        data=series_id_df,
+                        series_id_name=self.series_id,
+                    )
+                    trans_pred_intervals[interval] = stacked_pred_interval


I've read over this like 5 times in a row and I can't figure out what the point of all of it is. It seems to be a lot of rearranging the same data in different ways? I'd love if we can make this clearer, even if it's just through comments.

I took a shot at renaming the variables and adding comments. Let me know if you think there's a way to clarify it further!
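For readers following along, the regrouping in that hunk can be sketched in two steps: collect each interval label's data across series, then flatten the resulting wide frame (one column per series) into long form. This is a pandas-only illustration under stated assumptions: `stack_wide` is a hypothetical stand-in and not evalml's `stack_data`, the row-major output order is an assumption, and the series names and values are made up.

```python
import pandas as pd


def stack_wide(wide, series_id_name):
    """Hypothetical stand-in for a stacking utility: flatten a wide frame
    (one column per series) into long form, row by row."""
    return pd.DataFrame(
        {
            series_id_name: list(wide.columns) * len(wide),
            "value": wide.to_numpy().ravel(),  # row-major flatten
        }
    )


# Per-series results: {series_target_name: {interval_label: Series}}
per_series = {
    "target_a": {"0.95_lower": pd.Series([1.0, 2.0]), "0.95_upper": pd.Series([3.0, 4.0])},
    "target_b": {"0.95_lower": pd.Series([5.0, 6.0]), "0.95_upper": pd.Series([7.0, 8.0])},
}

# Step 1: regroup so each interval label maps to {series_target_name: Series}.
by_interval = {}
for name, intervals in per_series.items():
    for label, data in intervals.items():
        by_interval.setdefault(label, {})[name] = data

# Step 2: per label, build a wide frame and stack it into long form.
stacked = {
    label: stack_wide(pd.DataFrame(columns), series_id_name="series_id")
    for label, columns in by_interval.items()
}
```

The end result is one long table per interval label, which is the shape the final assignment into `trans_pred_intervals[interval]` appears to expect.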

jeremyliweishih

LGTM - nothing to add on my end

christopherbunn changed the title ~~Updated multiseries time series regression forecasting period function~~ Fixed multiseries time series regression forecasting period generation Sep 22, 2023

christopherbunn changed the title ~~Fixed multiseries time series regression forecasting period generation~~ Fixed forecast period generation function for multiseries Sep 22, 2023

christopherbunn added 4 commits September 22, 2023 12:54

Initial commit

27ff930

Update release notes

51c8df3

Try extra debug

c99c315

Add additional unstack for pred_intervals

4135939

christopherbunn force-pushed the msts_get_forecast_period branch from 7fa8ec0 to 4135939 Compare September 26, 2023 18:24

christopherbunn added 3 commits September 27, 2023 12:42

Move predict code

41c3944

Fix stl interval code

8200319

Code cleanup

068d0d4

christopherbunn marked this pull request as ready for review September 28, 2023 20:38

auto-assign Bot assigned christopherbunn Sep 28, 2023

christopherbunn requested review from MichaelFu512, chukarsten, eccabay and jeremyliweishih September 28, 2023 20:39

eccabay suggested changes Sep 29, 2023

Cleaned up implementation

a0c5304

christopherbunn requested a review from eccabay September 29, 2023 15:43

eccabay reviewed Sep 29, 2023

Renamed variables

432d632

christopherbunn requested a review from eccabay September 29, 2023 18:04

eccabay approved these changes Sep 29, 2023

jeremyliweishih approved these changes Sep 29, 2023

christopherbunn merged commit da17fae into main Sep 29, 2023

christopherbunn deleted the msts_get_forecast_period branch September 29, 2023 19:08


3 participants
