Fix conversion of time-series into sequence for lempel_ziv_complexity #806

kempa-liehr · 2021-02-11T02:17:58Z

This pull fixes the bin boundaries such that the very first bin contains a reasonable number of time-series value.
While this fix has a nearly neglectable impact on the test cases, it significantly changes the feature value for real-world applications and improves the feature's interpretability.

This fix mainly improves the interpretability of the feature, while the complexity measure itself seems to be fairly robust against this change.

nils-braun · 2021-02-13T21:40:43Z

Thanks @kempa-liehr. I think the failing tests are unrelated (but something we should have a look into nevertheless).

MaxBenChrist · 2021-03-20T08:50:07Z

can we merge this?

MaxBenChrist · 2021-03-20T08:58:02Z

tsfresh/feature_extraction/feature_calculators.py

    inc = 1
    while ind + inc <= n:
-        # convert tu tuple to make it hashable
+        # convert to tuple in order to make it hashable


I have the feeling that this is not an efficient implementation.

If we want to count the occurrence of substrings, we can use a counter for collections (collections.Counter). If we want to count the amount of different substrings, we can just call set on the list of substrings, without the loop

But lemel_ziv isn't neither the amount of fixed-length substrings nor the counter of fixed length substrings. The idea behind it is that the length of the substrings is chosen in a way to never have any substring occuring doubled.

Example: 100111
First substring: just 1
Second: just 0
Third: would be 0 again, but we have 0 already so we need to choose 01
Forth: would be 1, but we have 1 already so we need to choose 11
End of the word -> 4 substrings / length 6 = 0.66.

The code is O(n) (the while loop needs approx n steps in in each we have a set check, which is O(1) and some other operations like addition, which are also O(1)), and I do not think we can be asymptotically better than this. There exist cython implementations of that - but as long as those are not accessible easily via conda and pip, I do not see any nice way for us.

Having some compiled (cython/numba) feature calculators would be really nice, but so far no one had time to have a look into this

thanks for the explanation nils!

nils-braun · 2021-03-20T09:39:39Z

I have just merged in the newest changes from main. Let's see if that fixed the failing tests

codecov-io · 2021-03-20T09:41:23Z

Codecov Report

Merging #806 (98bcd43) into main (26af836) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #806   +/-   ##
=======================================
  Coverage   95.92%   95.92%           
=======================================
  Files          18       18           
  Lines        1841     1841           
  Branches      362      362           
=======================================
  Hits         1766     1766           
  Misses         37       37           
  Partials       38       38

Impacted Files	Coverage Δ
tsfresh/feature_extraction/feature_calculators.py	`97.29% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26af836...98bcd43. Read the comment docs.

kempa-liehr added 2 commits February 11, 2021 09:05

Typos

354b57c

Fix: First bin only contains minimum of input signal.

cd0ce2e

This fix mainly improves the interpretability of the feature, while the complexity measure itself seems to be fairly robust against this change.

MaxBenChrist reviewed Mar 20, 2021

View reviewed changes

Merge remote-tracking branch 'origin/main' into fix_lempel_ziv

98bcd43

nils-braun merged commit cb7943e into main Mar 20, 2021

nils-braun deleted the fix_lempel_ziv branch March 20, 2021 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix conversion of time-series into sequence for lempel_ziv_complexity #806

Fix conversion of time-series into sequence for lempel_ziv_complexity #806

Uh oh!

kempa-liehr commented Feb 11, 2021

Uh oh!

nils-braun commented Feb 13, 2021

Uh oh!

MaxBenChrist commented Mar 20, 2021

Uh oh!

MaxBenChrist Mar 20, 2021

Uh oh!

nils-braun Mar 20, 2021

Uh oh!

MaxBenChrist Mar 20, 2021

Uh oh!

nils-braun commented Mar 20, 2021

Uh oh!

codecov-io commented Mar 20, 2021 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Fix conversion of time-series into sequence for lempel_ziv_complexity #806

Fix conversion of time-series into sequence for lempel_ziv_complexity #806

Uh oh!

Conversation

kempa-liehr commented Feb 11, 2021

Uh oh!

nils-braun commented Feb 13, 2021

Uh oh!

MaxBenChrist commented Mar 20, 2021

Uh oh!

MaxBenChrist Mar 20, 2021

Choose a reason for hiding this comment

Uh oh!

nils-braun Mar 20, 2021

Choose a reason for hiding this comment

Uh oh!

MaxBenChrist Mar 20, 2021

Choose a reason for hiding this comment

Uh oh!

nils-braun commented Mar 20, 2021

Uh oh!

codecov-io commented Mar 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov-io commented Mar 20, 2021 •

edited

Loading