Extend STLDecomposer to Support Multiseries by remyogasawara · Pull Request #4253 · alteryx/evalml

remyogasawara · 2023-07-26T19:04:26Z

Resolves #4244

codecov · 2023-07-31T22:50:33Z

Codecov Report

Patch coverage: 100.0% and project coverage change: +0.1% 🎉

Comparison is base (90033c5) 99.7% compared to head (3cc6cf3) 99.7%.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4253     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        355     355             
  Lines      39155   39458    +303     
=======================================
+ Hits       39035   39338    +303     
  Misses       120     120

Files Changed	Coverage Δ
...omponents/transformers/preprocessing/decomposer.py	`99.4% <100.0%> (+0.1%)`	⬆️
...nents/transformers/preprocessing/stl_decomposer.py	`100.0% <100.0%> (ø)`
...omponent_tests/decomposer_tests/test_decomposer.py	`100.0% <100.0%> (ø)`
...sts/decomposer_tests/test_polynomial_decomposer.py	`100.0% <100.0%> (ø)`
...nent_tests/decomposer_tests/test_stl_decomposer.py	`100.0% <100.0%> (ø)`
evalml/tests/conftest.py	`98.4% <100.0%> (+0.1%)`	⬆️

christopherbunn

Some smaller nits. Only reviewed the implementation so far, but will review the tests later.

evalml/pipelines/components/transformers/preprocessing/decomposer.py

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

eccabay

Making awesome progress! I left a few more comments, mostly just condensing code

evalml/pipelines/components/transformers/preprocessing/decomposer.py

eccabay · 2023-08-21T14:02:49Z

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

+            series_y = y[id]
+
+            # Determine the period of the seasonal component
+            if id not in self.periods or self.period is None:


A very small use case, but users should be able to pass in self.periods when initializing the decomposer rather than only having it defined here. Our detection is decent, but a user who knows what the periods should be ought to be able to pass them in for better results.

And unless I'm missing something, it doesn't look like we actually use self.period anywhere anymore, because even in the single series case we're still indexing into self.periods. Am I correct? If so, I think we should just swap out one for the other entirely.

Yeah, I should have taken self.period out. I think I kept it to keep the functionality of set_period the same. Is that function used in the PolynomialDecomposer at all? I ask because it is tested on both decomposers in test_decomposer_set_period. Also, would the periods parameter be the same for the single series case, or would it be fine for period to be an integer for single series and periods to be a dictionary for multiseries?

set_period is tested on both decomposers because it's implemented in the parent class. Since we have to test its implementation on one subclass, why not test it on both? Feel free to bastardize whatever preexisting functions you need to. I could totally see the case for eliminating set_period entirely and just calling _determine_periodicity directly, or for having set_period calculate the periods for all series and save them in self.periods instead of self.period. It's up to you, IMO.

To allow users to still pass in a period for both the multivariate and univariate cases, I was thinking of keeping both self.period and self.periods as parameters. That probably isn't the most efficient implementation, though, so I'm open to other suggestions 😅
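One way to reconcile the two parameters discussed in this thread is to normalize whatever the user passes into a single per-series mapping. The following is a minimal sketch, not the code merged in this PR: `normalize_periods` is a hypothetical helper, and the choice to reject an integer `period` for multiseries input is an assumption made here for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional


def normalize_periods(
    series_ids: Iterable[Hashable],
    period: Optional[int] = None,
    periods: Optional[Dict[Hashable, int]] = None,
) -> Dict[Hashable, Optional[int]]:
    """Hypothetical helper: merge `period` and `periods` into one mapping.

    A value of None means "not supplied; detect it during fit".
    """
    series_ids = list(series_ids)
    if periods is not None:
        # A dictionary takes precedence; series it omits are left for detection.
        unknown = set(periods) - set(series_ids)
        if unknown:
            raise ValueError(
                f"periods given for unknown series: {sorted(unknown, key=str)}"
            )
        return {sid: periods.get(sid) for sid in series_ids}
    if period is not None and len(series_ids) > 1:
        # Assumption: a single integer is treated as ambiguous for multiseries.
        raise ValueError("an integer period is ambiguous for multiseries; pass periods")
    # Single series (or nothing supplied): reuse the integer, possibly None.
    return {sid: period for sid in series_ids}
```

With this shape, downstream code would only ever index into the mapping, and an entry of `None` signals that the period still needs to be detected for that series.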

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

evalml/tests/conftest.py

evalml/tests/component_tests/decomposer_tests/test_decomposer.py

evalml/tests/component_tests/decomposer_tests/test_stl_decomposer.py

eccabay

A few final comments, but nothing blocking. Awesome work!

eccabay · 2023-08-25T12:50:52Z

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

+                if self.period is None and len(y.columns) == 1:
+                    self.period = period
+                    self.update_parameters({"period": self.period})
+                elif self.period is not None and len(y.columns) == 1:
+                    period = self.period


IMO, it's ok if we call self.update_parameters an extra time, so we can collapse these into a single if len(y.columns) == 1

Since everything is accessing the period through self.periods, I'm thinking we might not even need to update self.period at all, and could instead just use it to set period if it's given by a user.
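The collapsed branch suggested above could look roughly like the sketch below. `PeriodHolder` is a hypothetical stand-in that models only the `period` attribute and `update_parameters` method seen in the diff, and `set_single_series_period` is an illustrative name, not a function from evalml.

```python
class PeriodHolder:
    """Hypothetical stand-in for the decomposer; models only what is used below."""

    def __init__(self, period=None):
        self.period = period
        self.parameters = {"period": period}

    def update_parameters(self, update):
        self.parameters.update(update)


def set_single_series_period(component, detected_period, n_columns):
    """Return the period to use, syncing the component in the single-series case."""
    period = detected_period
    if n_columns == 1:
        # Prefer a user-supplied period; otherwise adopt the detected one.
        if component.period is not None:
            period = component.period
        component.period = period
        # Called unconditionally: redundant when the user supplied the period,
        # but harmless, and it removes the need for a separate elif branch.
        component.update_parameters({"period": period})
    return period
```

For multiseries input (`n_columns > 1`) the sketch leaves the component untouched and simply returns the detected period.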

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

evalml/tests/component_tests/decomposer_tests/test_decomposer.py

evalml/tests/component_tests/decomposer_tests/test_stl_decomposer.py

eccabay · 2023-08-25T19:35:40Z

evalml/pipelines/components/transformers/preprocessing/stl_decomposer.py

+            y = y.to_frame()
+        series_results = {}
+        # Iterate through each series id
+        for id in y.columns:


This is so late in the process to realize this, but do we actually need to be calling _decompose_target in a loop? Since the decomposer handles multiseries generally, it should be able to handle multiseries here too. We could pass self.periods through instead of period=period and switch around the logic to return the data in the format we need. That would prevent us from calling Decomposer.fit() too many times, which can be slow.

I modified this so that get_trend_dataframe returns a dictionary for multiseries, but for single series it is still returning a list of dataframes. In the PolynomialDecomposer, get_trend_dataframe returns a list of dataframes. I think this is because there is some multivariate implementation written here. Since they're both using plot_decomposition for single series, I wanted to keep the indexing the same so I left it as a list for STLDecomposer for now (even though it can be modified later to just return a dataframe). This is pretty inconsistent so I wanted to see what others think about it and if I should consider changing the return type of get_trend_dataframe for the multiseries and/or single series STLDecomposer.

I updated get_trend_dataframe() in the STLDecomposer to return list(pd.DataFrame) for single series and dict(list(pd.DataFrame)) for multiseries, but it should be updated in the future to no longer be in a list (#4294).
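Because the return type described above differs between the single series and multiseries cases, calling code may want to handle both shapes uniformly. A minimal sketch, assuming only the two shapes stated in the comment (a list, or a dictionary of lists keyed by series id); `iter_trend_frames` and the default id `"y"` are hypothetical and not part of evalml.

```python
from typing import Any, Dict, Hashable, Iterator, List, Tuple, Union

# The two shapes described in the comment above: list(df) or dict(list(df)).
TrendResult = Union[List[Any], Dict[Hashable, List[Any]]]


def iter_trend_frames(
    result: TrendResult,
    single_series_id: Hashable = "y",  # hypothetical label for the single-series case
) -> Iterator[Tuple[Hashable, Any]]:
    """Yield (series_id, frame) pairs regardless of which shape was returned."""
    if isinstance(result, dict):
        # Multiseries: one list of frames per series id.
        for series_id, frames in result.items():
            for frame in frames:
                yield series_id, frame
    else:
        # Single series: a plain list of frames.
        for frame in result:
            yield single_series_id, frame
```

The frames are treated as opaque objects here, so the helper works the same whether they are `pd.DataFrame` instances or anything else.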

christopherbunn

LGTM once the test case Becca mentioned is covered!

remyogasawara added 12 commits July 20, 2023 16:42

initial commit

929ac9b

creates multiple graphs

fedca59

able to graph decomp

a0e4a39

graph individually

88f7d67

clean up

9f4a0d5

set period and freq

9f82dd3

modify transformer and groups

7d1a204

use dictionary instead of list

d87f007

pass components test and fix ww

c43b860

check if multiseris variable

3025b20

extend stldecomposer for multiseries

c7f1edd

add null checks

be2dd2d

eccabay and others added 3 commits August 2, 2023 16:32

Add stacking and unstacking utils for multiseries (#4250)

c2b60ac

* Add unstacking function * Add stacking function * Add tests for both functions

Add support for pandas 2 (#4216)

781c139

* Squashed changes * Ignored index * Disabled column checking * Reverted deleted code * Updated pyproject.toml * Replaced version check code

reset condition for period

4a8cc0f

remyogasawara force-pushed the 4244_extend_stldecomp_for_multiseries branch from 1faf23d to 4a8cc0f Compare August 2, 2023 23:37

remyogasawara and others added 13 commits August 2, 2023 16:44

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

68566b6

take dataframe as y input and fix indexing

a8c2445

fix lint

13c5d29

formatting

354c95a

remove print statement

a506101

pd 2 support

5745a10

update inverse_transform and get_trend_dataframe

3363e10

update get_trend_prediction_intervals and add multiseries tests

1768261

update index in plotting instead

e329195

add ms seasonal data

61d1a18

subset test remaining

d3c9468

add multiseries tests

58cd094

fix univariate tests

b68fda8

christopherbunn suggested changes Aug 18, 2023

remyogasawara and others added 4 commits August 18, 2023 09:41

fix data types and duplicate lines

2f32f5b

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

e889dcb

remove stuff from loops

02733c3

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

70e34c8

remyogasawara requested a review from eccabay August 18, 2023 21:30

eccabay reviewed Aug 21, 2023

remyogasawara and others added 7 commits August 21, 2023 17:10

condense code

09e31f2

periods parameters

837fc79

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

8e379c1

add unstacking

3677345

fix import

fd99c53

add unstacking and test

dd7b74f

remove comments

d8e23e0

remyogasawara requested review from MichaelFu512, christopherbunn and eccabay August 23, 2023 16:20

eccabay approved these changes Aug 25, 2023

eccabay reviewed Aug 25, 2023

remyogasawara and others added 5 commits August 25, 2023 13:52

update periods and tests

ecbfe86

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

eb961d6

set y index

cee84ea

simplify get_trend_dataframe

aa5a95a

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

730f409

christopherbunn approved these changes Aug 30, 2023

remyogasawara and others added 3 commits August 30, 2023 13:37

change type to dict(list(df))

b9e9d45

Merge branch 'main' into 4244_extend_stldecomp_for_multiseries

28de27f

update notes

3cc6cf3

remyogasawara merged commit 69344b2 into main Aug 31, 2023

remyogasawara deleted the 4244_extend_stldecomp_for_multiseries branch August 31, 2023 21:56
