Fixed forecast period generation function for multiseries#4320
Conversation
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #4320 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 357 357
Lines 39767 39867 +100
=======================================
+ Hits 39647 39747 +100
Misses 120 120
☔ View full report in Codecov by Sentry. |
7fa8ec0 to
4135939
Compare
eccabay
left a comment
There was a problem hiding this comment.
Some suggestions for simplification, let me know if they actually work though!
| coverage=coverage, | ||
| ) | ||
| trans_pred_intervals = {} | ||
| intervals_labels = list(list(pred_intervals.values())[0].keys()) |
There was a problem hiding this comment.
I had to go into the debugger and play with the code myself to figure out what this line was doing 😅 A much simpler way:
intervals_labels = pred_intervals[0].keys()
That may need to be cast to a list for later on, I didn't test fully, but either way it's more readable
There was a problem hiding this comment.
Hmm, this doesn't work since pred_intervals is a dict and this would pull the value at key 0 rather than the first value of the dictionary. Is there a better way to pull the first value?
There was a problem hiding this comment.
Ah oops, I see what I missed. I don't know of a better way to manipulate the dictionaries, but we could also just do intervals_labels = pd.DataFrame(pred_intervals).index 😂
| for key, orig_pi_values in intervals.items(): | ||
| series_id_target_name = ( | ||
| self.input_target_name + "_" + str(series_id) | ||
| ) | ||
| interval_series_pred_intervals[key][ | ||
| series_id_target_name | ||
| ] = pd.Series( | ||
| (orig_pi_values.values - residuals[series_id].values) | ||
| + trend_pred_intervals[series_id_target_name][key].values | ||
| + y[series_id_target_name].values, | ||
| index=orig_pi_values.index, | ||
| ) |
There was a problem hiding this comment.
This is a lot of repeated code with the other logical branch, which is going to make life very hard for us if we ever need to update this code. Could you abstract it out into a local helper function?
There was a problem hiding this comment.
Something like
def _get_series_intervals(intervals, residuals, trend_pred_intervals, y):
return_intervals = {}
for key, orig_pi_values in intervals.items():
return_intervals[key] = pd.Series(
(orig_pi_values.values - residuals.values)
+ trend_pred_intervals[key].values
+ y.values,
index=origin_pi_values.index
)
return return_intervals
if is_multiseries(problem_type):
for series_id, series_intervals in pred_intervals.items():
series_id_target_name = self.input_target_name + "_" + str(series_id)
interval_series_pred_intervals[series_id_target_name] = _get_series_intervals(
series_intervals,
residuals[series_id],
trend_pred_intervals[series_id_target_name],
y[series_id_target_name]
)
else:
trans_pred_intervals = _get_series_intervals(pred_intervals, residuals, trend_pred_intervals, y)
There was a problem hiding this comment.
The code I suggested does make a change to the dictionary structure for the multiseries case, which you'll have to let me know if it works or not - I swapped the intervals with series ids, to give us {series_1: {0.75_lower: <>, 0.75_upper: <>, ...}, series_2: {...}...} instead of {0.75_lower: {series_1: <>, series_2: <>, ...}, ...}
Personally, I think this would make it easier to get per-series prediction intervals, but you'll have to let me know if it's too much effort to swap things around at this point. We could also completely overhaul the data structure for this to be something actually 2D like a dataframe instead of nested dictionaries, but that might just be tech debt for the future.
There was a problem hiding this comment.
I ended up using your implementation but I tweaked it slightly. I still kept the original dictionary structure since it makes stacking each prediction interval in the end slightly easier. Let me know what you think!
| trans_pred_intervals = {} | ||
| trend_pred_intervals = self.get_component( |
There was a problem hiding this comment.
I got so confused here for a second, these names are so similar 😅 can one of them be renamed?
There was a problem hiding this comment.
trans_pred_intervals -> transformed_pred_intervals
| for interval, interval_data in series_id_interval_result.items(): | ||
| interval_series_pred_intervals[interval][ | ||
| series_id_target_name | ||
| ] = interval_data | ||
| for interval in intervals_labels: |
There was a problem hiding this comment.
the word interval has no meaning to me any more 😂
| for interval, interval_data in series_id_interval_result.items(): | ||
| interval_series_pred_intervals[interval][ | ||
| series_id_target_name | ||
| ] = interval_data | ||
| for interval in intervals_labels: | ||
| series_id_df = pd.DataFrame( | ||
| interval_series_pred_intervals[interval], | ||
| ) | ||
| stacked_pred_interval = stack_data( | ||
| data=series_id_df, | ||
| series_id_name=self.series_id, | ||
| ) | ||
| trans_pred_intervals[interval] = stacked_pred_interval |
There was a problem hiding this comment.
I've read over this like 5 times in a row and I can't figure out what the point of all of it is. It seems to be a lot of rearranging the same data in different ways? I'd love if we can make this clearer, even if it's just through comments.
There was a problem hiding this comment.
I gave variable renaming + additional comments adding a shot. Lmk if you think there's a way to additionally clarify it!
jeremyliweishih
left a comment
There was a problem hiding this comment.
LGTM - nothing to add on my end
Resolves #4323