@nils-braun
Collaborator

As discussed in #482, there is a need to allow developers to integrate custom feature definitions into tsfresh without editing the feature_calculators.py file (which means, for example, that you need to download and build from source).

In principle, that can be done quite easily, either with additional arguments to extract_features or with monkeypatching as shown in #482. The problem, however, is that these methods either break on the boundary between processes or machines (because self-defined functions have problems when pickling) or do not work on all operating systems.
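To illustrate the pickling problem with the standard library only: pickle stores a function as a reference (module name plus function name), so anything that cannot be looked up by name on the receiving side fails. This is a minimal sketch; `make_last_n` is a hypothetical helper used only for the demonstration.

```python
import pickle
import statistics


def make_last_n(n):
    # The inner function only exists inside this call, so it has no
    # importable module-level name.
    def last_n(x):
        return x[-n]

    return last_n


# An importable function is pickled as a reference: module name + function name.
payload = pickle.dumps(statistics.mean)
restored_mean = pickle.loads(payload)

# A locally defined function cannot be looked up by name, so pickling fails.
try:
    pickle.dumps(make_last_n(2))
    local_error = None
except (pickle.PicklingError, AttributeError) as exc:
    local_error = exc

print("round trip ok:", restored_mean is statistics.mean)
print("local function error:", type(local_error).__name__)
```

The exact exception type differs between Python versions, which is why both are caught here.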

With this PR, I am proposing another solution. As mentioned in the issue, cloudpickle is able to pickle and unpickle functions correctly. As multiprocessing does not use cloudpickle by default, I had to "cheat" a bit and make the settings dictionaries use cloudpickle under the hood.
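The general idea of such a "cheat" can be sketched as a dictionary that takes over its own pickling and stores function keys by value. The following is not the code of this PR: it is a toy version that uses only the standard library (marshal instead of cloudpickle), and the names `ByValueSettings`, `_function_to_state` and `_state_to_function` are hypothetical. It only handles plain functions without closures or referenced globals, which cloudpickle would take care of.

```python
import builtins
import marshal
import pickle
import types
from collections import UserDict

_MARKER = "__function__"


def _function_to_state(func):
    # By-value form of a plain function: compiled code, name and defaults.
    # Closures and referenced globals are NOT captured in this toy version.
    return (_MARKER, marshal.dumps(func.__code__), func.__name__, func.__defaults__)


def _state_to_function(state):
    _, code_bytes, name, defaults = state
    code = marshal.loads(code_bytes)
    # Assumption: the function only needs builtins in its global namespace.
    return types.FunctionType(code, {"__builtins__": builtins}, name, defaults)


class ByValueSettings(UserDict):
    """Hypothetical settings dict whose function keys survive pickling."""

    def __getstate__(self):
        items = [
            (_function_to_state(k) if isinstance(k, types.FunctionType) else k, v)
            for k, v in self.data.items()
        ]
        # Wrapped in a dict so the state is never empty (and never skipped).
        return {"items": items}

    def __setstate__(self, state):
        self.data = {}
        for key, value in state["items"]:
            if isinstance(key, tuple) and key and key[0] == _MARKER:
                key = _state_to_function(key)
            self.data[key] = value


def last_n(x, n):
    return x[-n]


settings = ByValueSettings({"maximum": None})
settings[last_n] = [{"n": 1}, {"n": 2}]

# The round trip rebuilds the function from its serialized code object.
restored = pickle.loads(pickle.dumps(settings))
restored_last_n = next(key for key in restored if callable(key))
print(restored_last_n([10, 20, 30], n=2))
```

The restored key is a new function object rebuilt from its code, so it no longer depends on `last_n` being importable by name on the receiving side.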

With the changes, it is now possible to create custom feature extractors "online" and add them to the settings:

from tsfresh import extract_features
from tsfresh.feature_extraction.settings import MinimalFCParameters

# This is our new feature, taken from @dbarbier in https://github.com/blue-yonder/tsfresh/issues/482
def last_n(x, n):
    return x[-n]

# Now create our settings object and add our new feature calculator.
# Please note that the function is the key and the parameters are the values.
# We are using the minimal settings here, but this also works with the others.
settings = MinimalFCParameters()
settings[last_n] = [{"n": 1}, {"n": 2}]

# works with multiprocessing (at least on my linux)
extract_features(df, column_id="id", column_sort="time", default_fc_parameters=settings, n_jobs=4)

@codecov-commenter

codecov-commenter commented Apr 17, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@4780b4b).
The diff coverage is 87.50%.

@@           Coverage Diff           @@
##             main     #845   +/-   ##
=======================================
  Coverage        ?   95.86%           
=======================================
  Files           ?       18           
  Lines           ?     1861           
  Branches        ?      365           
=======================================
  Hits            ?     1784           
  Misses          ?       38           
  Partials        ?       39           
Impacted Files                              Coverage Δ
tsfresh/feature_extraction/extraction.py    92.59% <66.66%> (ø)
tsfresh/feature_extraction/settings.py      100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4780b4b...0e113fc.

@nils-braun nils-braun merged commit 5c400f3 into main Apr 19, 2021
@nils-braun nils-braun deleted the feature/cloudpickle-settings branch April 19, 2021 20:40
@MaxBenChrist
Collaborator

Wow, really funky design.

With this PR, I am proposing another solution. As mentioned in the issue, cloudpickle is able to pickle and unpickle functions correctly. As multiprocessing does not use cloudpickle by default, I had to "cheat" a bit and make the settings dictionaries use cloudpickle under the hood.

I wonder, are there any limitations on what kind of operations are allowed inside the function body?
What happens if there is a package mismatch between the environment where the function body is defined and the one inside tsfresh?

@nils-braun
Collaborator Author

As the code is executed on the same machine with the same interpreter (in multiprocessing), that is not a problem. For multi-node setups such as Dask, one needs to think about that anyway, so this PR does not make it worse.

@vutle

vutle commented Jun 13, 2021

This change does not work when running in an interactive environment with n_jobs > 1 on Windows 10.

It does work when run inside the main function.

@nils-braun
Collaborator Author

Hi @vutle - oh, this is bad. But it does work for non-custom feature extractors?

@vutle

vutle commented Jun 13, 2021

I found that if you put the custom feature extractor inside a class and instantiate this class before use, it works correctly in both interactive and non-interactive mode for any value of n_jobs, presumably because the function is then registered at a memory address.

@nils-braun
Collaborator Author

Can you maybe share the error message you see (when it fails in parallel mode)?

@enesok

enesok commented Oct 4, 2022

Hi,
is there a possibility to pick a subset of e.g. ComprehensiveFCParameters and add your custom features? A small example:

from tsfresh.feature_extraction.settings import ComprehensiveFCParameters

# diff is the custom feature calculator (its definition is not shown here)
settings = ComprehensiveFCParameters()
settings[diff] = [{"n": 1}]
feature_list_small = ["minimum", "maximum", diff]

def get_features(features_of_interest, tsfresh_list=settings):
    return {key: tsfresh_list[key] for key in features_of_interest}

custom_list = get_features(feature_list_small)

With this example, extract_features stalls even with a single job, although there are no problems with MinimalFCParameters.

No exceptions are thrown, and the documentation on your website does not work either.

Env:
Win10, Python 3.9.13, conda tsfresh 0.19.0

@nils-braun
Collaborator Author

nils-braun commented Feb 19, 2023

Hi @enesok - sorry for the massive delay. The documentation on our website (https://tsfresh.readthedocs.io/en/latest/index.html) works for me. The settings object is actually just a dictionary. If you put in only those feature calculators that you want as keys, tsfresh will only calculate those. Also see our documentation: https://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html (if it works for you now)
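As a rough illustration of the "just a dictionary" point, here is a minimal sketch that uses a plain dict as a stand-in for a tsfresh settings object (an assumption made so that the snippet runs without tsfresh installed). Keys are calculator names or custom functions, values are None or a list of parameter dictionaries, and `last_n` is the custom feature from the PR description.

```python
def last_n(x, n):
    return x[-n]


# Stand-in for a larger settings object; a plain dict with the same shape.
full_settings = {
    "minimum": None,
    "maximum": None,
    "mean": None,
    last_n: [{"n": 1}, {"n": 2}],
}

# Keep only the calculators of interest, including the custom function.
wanted = ["minimum", "maximum", last_n]
subset = {key: full_settings[key] for key in wanted}

print([getattr(key, "__name__", key) for key in subset])
```

The resulting `subset` has the same shape as the original mapping, so it could be passed wherever the full settings would be used.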

As a hint: if you comment on already closed PRs, there is a high chance people will miss it (because the topic is already closed). Best is to create a new issue in this case!
