Skip to content

Fix legacy loading of detectors with tf models (v0.10.5 patch)#729

Merged
ascillitoe merged 6 commits intoSeldonIO:patch/v0.10.5from
ascillitoe:fix/tf_model_legacy_load
Jan 25, 2023
Merged

Fix legacy loading of detectors with tf models (v0.10.5 patch)#729
ascillitoe merged 6 commits intoSeldonIO:patch/v0.10.5from
ascillitoe:fix/tf_model_legacy_load

Conversation

@ascillitoe
Copy link
Contributor

@ascillitoe ascillitoe commented Jan 23, 2023

This PR fixes a bug in the legacy load functionality (of .pickle/.dill files generated by old alibi-detect versions). The bug meant that the detectors stored in https://console.cloud.google.com/storage/browser/seldon-models/alibi-detect could not be successfully loaded, because models were expected to be loaded from filepath/encoder/model.h5, instead of filepath/model/encoder.h5.

This was not picked up in the test_saving_legacy.py tests, because these operate by instantiating a detector, saving it, and then reloading it (hence the incorrect naming convention was adopted when saving).

Testing

CI runs: https://github.com/ascillitoe/alibi-detect/actions/runs/3995538470

Backward compatibility has been tested by loading (and checking) the following test matrix of artefacts from https://console.cloud.google.com/storage/browser/seldon-models/alibi-detect (see #729 (comment)):

TESTS = [
    # Outlier detectors
    {'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': ['fashion_mnist', 'genome']},
    {'detector_type': 'outlier', 'detector_name': 'Mahalanobis', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAE', 'dataset': ['cifar10']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAEGMM', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierProphet', 'dataset': ['weather']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierSeq2Seq', 'dataset': ['ecg']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAE', 'dataset': ['adult', 'cifar10', 'kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAEGMM', 'dataset': ['kddcup']},
    # Adversarial detectors
    {'detector_type': 'adversarial', 'detector_name': 'model_distillation', 'dataset': ['cifar10'], 'model': ['resnet32']},
    # Drift detectors (not supported by `fetch_detector`...)
    {'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']},
    {'detector_type': 'drift', 'detector_name': 'mmd', 'dataset': ['cifar10'], 'version': ['0.8.1']},
    {'detector_type': 'drift', 'detector_name': 'tabular', 'dataset': ['income'], 'version': ['0.7.0', '0.8.1']},
]

where {'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']} corresponds to seldon-models/alibi-detect/od/IForest/kddcup, and {'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']} to seldon-models/alibi-detect/cd/ks/imdb-0_6_2.

Results

All the above artifacts are loaded successfully except for {'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': 'genome'} which fails with:

AttributeError: Can't get attribute 'likelihood_fn' on <module '__main__`

This artifact is very old. It is saved as .pickle instead of .dill, and the likelihood_fn defined in https://docs.seldon.io/projects/alibi-detect/en/stable/examples/od_llr_genome.html is not found at load time.

Note: artifacts with version numbers <v0.6 are not included in TESTS since they use .pickle, and objects they reference such as preprocess_drift have since been moved to different locations in alibi_detect.

@ascillitoe ascillitoe changed the title Fix legacy loading of tf models Fix legacy loading of detectors with tf models Jan 23, 2023
@ascillitoe ascillitoe added the WIP PR is a Work in Progress label Jan 23, 2023

def load_model(filepath: Union[str, os.PathLike],
load_dir: str = 'model',
filename: str = 'model',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_dir is removed, and any subdirectory (i.e. model/) is just added to filepath before passing to load_model.

filename is added instead, since the legacy saving used to save model's with various filenames such as encoder.h5 or model.h5.

@ascillitoe
Copy link
Contributor Author

ascillitoe commented Jan 24, 2023

A basic test script to test loading of artifacts in https://console.cloud.google.com/storage/browser/seldon-models is included below. We might want to have a separate conversation about how best to generate new artifacts, and test new (and old) ones.

from alibi_detect.saving import load_detector
from alibi_detect.utils.fetching.fetching import fetch_detector, fetch_state_dict, fetch_tf_model
from alibi_detect.utils.url import _join_url
import tensorflow as tf
import pytest
import itertools
from cloudpathlib import CloudPath


TESTS = [
    # Outlier detectors
    {'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': ['fashion_mnist', 'genome']},
    {'detector_type': 'outlier', 'detector_name': 'Mahalanobis', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAE', 'dataset': ['cifar10']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAEGMM', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierProphet', 'dataset': ['weather']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierSeq2Seq', 'dataset': ['ecg']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAE', 'dataset': ['adult', 'cifar10', 'kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAEGMM', 'dataset': ['kddcup']},
    # Adversarial detectors
    {'detector_type': 'adversarial', 'detector_name': 'model_distillation', 'dataset': ['cifar10'], 'model': ['resnet32']},
    # Drift detectors (not supported by `fetch_detector`...)
    {'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']},
    {'detector_type': 'drift', 'detector_name': 'mmd', 'dataset': ['cifar10'], 'version': ['0.8.1']},
    {'detector_type': 'drift', 'detector_name': 'tabular', 'dataset': ['income'], 'version': ['0.7.0', '0.8.1']},
]

def dict_product(dicts):
    return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))


trials = []
for test in TESTS:
    for k, v in test.items():
        if not isinstance(v, list):
            test[k] = [v]
    trials += list(dict_product(test))
n_tests = len(trials)


@pytest.fixture
def unpack_trials(request):
    return trials[request.param]


@pytest.mark.parametrize("unpack_trials", list(range(n_tests)), indirect=True)
def test_fetch_detector(unpack_trials, tmp_path):
    kwargs = unpack_trials
    print(kwargs)
    if kwargs['detector_type'] in ('outlier', 'adversarial'):
        _ = fetch_detector(tmp_path, **kwargs)

    else:
        # create url of detector
        version = kwargs.get('version', '')
        version = version.replace('.', '_')
        url = 'gs://seldon-models/alibi-detect/'
        url += 'cd/' + kwargs['detector_name'] + '/' + kwargs['dataset'] + '-' + version + '/'

        # Download bucket directory
        cloudpath = CloudPath(url)
        cloudpath.copytree(tmp_path)

        dd = load_detector(tmp_path)
        dd = dd._detector if hasattr(dd, '_detector') else dd
        dd = dd._detector if hasattr(dd, '_detector') else dd
        if kwargs['detector_name'] != 'tabular':
            assert dd.preprocess_fn is not None

@ascillitoe ascillitoe removed the WIP PR is a Work in Progress label Jan 24, 2023
@ascillitoe ascillitoe requested review from jklaise and mauicv January 24, 2023 10:01
@ascillitoe ascillitoe changed the title Fix legacy loading of detectors with tf models Fix legacy loading of detectors with tf models (v0.10.5 patch) Jan 24, 2023
def save_model(model: tf.keras.Model,
filepath: Union[str, os.PathLike],
save_dir: Union[str, os.PathLike] = 'model',
filename: str = 'model',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saving must also be updated since save_model was saving using an incorrect naming convention. i.e. everything was saved to model.h5 instead of allowing encoder.h5 etc

@mauicv
Copy link
Contributor

mauicv commented Jan 24, 2023

LGTM, is there any reason we don't include the test for remote artefacts? Perhaps we could only have it run on nightly CI? Maybe this requires more thought though?

@ascillitoe
Copy link
Contributor Author

LGTM, is there any reason we don't include the test for remote artefacts? Perhaps we could only have it run on nightly CI? Maybe this requires more thought though?

Yeh we should include something IMO. But I think we need to have more of a think about:

  • How these artifacts are generated in the first place.
  • What exactly we are testing i.e. just the latest artifacts, or test loading of previous versions too (to discover backward compatibility issues like the one fixed here).
  • Where this testing code sits (in the main alibi-detect repo or elsewhere). Some of the artifacts are pulled in by the alibi_detect fetch_detector function, but some are used elsewhere such as seldon-core. As an aside, fetch_detector doesn't support any drift detectors whatsoever...

Copy link
Contributor

@jklaise jklaise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ascillitoe ascillitoe merged commit b7a3b41 into SeldonIO:patch/v0.10.5 Jan 25, 2023
@ascillitoe ascillitoe deleted the fix/tf_model_legacy_load branch January 25, 2023 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants