Add STLDecomposer to multiseries pipelines#4299
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #4299 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 357 357
Lines 39577 39587 +10
=======================================
+ Hits 39457 39467 +10
Misses 120 120
☔ View full report in Codecov by Sentry. |
| key + " : " + "{:0.2f}".format(val) | ||
| if (isinstance(val, float)) | ||
| else key + " : " + str(val) | ||
| else key + " : " + str(val).replace("{", "").replace("}", "") |
There was a problem hiding this comment.
This is hard to follow 😅 can you add an explanatory comment?
| y.append(y_series) | ||
| y_df = pd.DataFrame(y).T | ||
| y_df.index = original_index | ||
| y_df.columns = y_t.columns |
There was a problem hiding this comment.
Out of curiosity, why is this necessary? What was the situation where the columns weren't the same?
There was a problem hiding this comment.
The predictions weren't getting the corresponding series ID values as the column names and that's needed since the decomposer uses this to select the correct value. Before this was causing the decomposer to return NaN values. @christopherbunn figured that out so he might have more info.
There was a problem hiding this comment.
The predictions that are generated do not have the series ID values as their column names. Copying these names over is required so we can inverse_transform from the decomposer.
evalml/pipelines/utils.py
Outdated
| seasonal_period = STLDecomposer.determine_periodicity( | ||
| X, | ||
| y, | ||
| rel_max_order=order, | ||
| ) | ||
| if seasonal_period is not None and seasonal_period <= DECOMPOSER_PERIOD_CAP: |
There was a problem hiding this comment.
The way determine_periodicity is set up, we're currently detecting a "period" on the single stacked target data column. I'm worried that that's too brittle, it could cause weird issues in the future. Could you put this in a conditional branch to ensure we only run it in the single series case, and for now just always add the decomposer for multiseries? We'll have to come back and revisit, but that should be ok for the MVP.
Resolves #4298
Acceptance Criteria (AC)