Fix legacy loading of detectors with tf models (v0.10.5 patch)#729
ascillitoe merged 6 commits into SeldonIO:patch/v0.10.5 from ascillitoe:fix/tf_model_legacy_load
Conversation
```python
def load_model(filepath: Union[str, os.PathLike],
               load_dir: str = 'model',
               filename: str = 'model',
```
`load_dir` is removed, and any subdirectory (i.e. `model/`) is just added to `filepath` before passing to `load_model`.
`filename` is added instead, since the legacy saving used to save models with various filenames such as `encoder.h5` or `model.h5`.
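As a sketch of what this means in practice (the helper below is illustrative, not the actual alibi-detect code): the caller appends the subdirectory to `filepath` up front, and `filename` only selects which `.h5` file inside that directory to load.

```python
import os
from pathlib import Path
from typing import Union


def resolve_model_path(filepath: Union[str, os.PathLike], filename: str = 'model') -> Path:
    # Hypothetical helper: `filepath` already includes any subdirectory
    # (e.g. `model/`); only the filename varies between legacy artifacts.
    return Path(filepath) / f'{filename}.h5'


# A legacy VAE outlier detector keeps its encoder at model/encoder.h5:
path = resolve_model_path(Path('my_detector') / 'model', filename='encoder')
```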
A basic test script to test loading of artifacts in https://console.cloud.google.com/storage/browser/seldon-models is included below. We might want to have a separate conversation about how best to generate new artifacts, and test new (and old) ones.

```python
from alibi_detect.saving import load_detector
from alibi_detect.utils.fetching.fetching import fetch_detector, fetch_state_dict, fetch_tf_model
from alibi_detect.utils.url import _join_url
import tensorflow as tf
import pytest
import itertools
from cloudpathlib import CloudPath

TESTS = [
    # Outlier detectors
    {'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': ['fashion_mnist', 'genome']},
    {'detector_type': 'outlier', 'detector_name': 'Mahalanobis', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAE', 'dataset': ['cifar10']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierAEGMM', 'dataset': ['kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierProphet', 'dataset': ['weather']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierSeq2Seq', 'dataset': ['ecg']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAE', 'dataset': ['adult', 'cifar10', 'kddcup']},
    {'detector_type': 'outlier', 'detector_name': 'OutlierVAEGMM', 'dataset': ['kddcup']},
    # Adversarial detectors
    {'detector_type': 'adversarial', 'detector_name': 'model_distillation', 'dataset': ['cifar10'], 'model': ['resnet32']},
    # Drift detectors (not supported by `fetch_detector`...)
    {'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']},
    {'detector_type': 'drift', 'detector_name': 'mmd', 'dataset': ['cifar10'], 'version': ['0.8.1']},
    {'detector_type': 'drift', 'detector_name': 'tabular', 'dataset': ['income'], 'version': ['0.7.0', '0.8.1']},
]


def dict_product(dicts):
    return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))


# Expand each TESTS entry into one trial per combination of its list-valued fields
trials = []
for test in TESTS:
    for k, v in test.items():
        if not isinstance(v, list):
            test[k] = [v]
    trials += list(dict_product(test))
n_tests = len(trials)


@pytest.fixture
def unpack_trials(request):
    return trials[request.param]


@pytest.mark.parametrize("unpack_trials", list(range(n_tests)), indirect=True)
def test_fetch_detector(unpack_trials, tmp_path):
    kwargs = unpack_trials
    print(kwargs)
    if kwargs['detector_type'] in ('outlier', 'adversarial'):
        _ = fetch_detector(tmp_path, **kwargs)
    else:
        # Create URL of detector (drift detectors are not supported by
        # `fetch_detector`, so the bucket directory is downloaded directly)
        version = kwargs.get('version', '')
        version = version.replace('.', '_')
        url = 'gs://seldon-models/alibi-detect/'
        url += 'cd/' + kwargs['detector_name'] + '/' + kwargs['dataset'] + '-' + version + '/'
        # Download bucket directory
        cloudpath = CloudPath(url)
        cloudpath.copytree(tmp_path)
    dd = load_detector(tmp_path)
    # Unwrap (possibly doubly) wrapped detectors
    dd = dd._detector if hasattr(dd, '_detector') else dd
    dd = dd._detector if hasattr(dd, '_detector') else dd
    if kwargs['detector_name'] != 'tabular':
        assert dd.preprocess_fn is not None
```
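For clarity, the `dict_product` helper in the script expands each `TESTS` entry into one trial per combination of its list-valued fields. A minimal standalone example:

```python
import itertools


def dict_product(dicts):
    # Cartesian product over the dict's values, re-keyed with the original keys
    return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))


trials = list(dict_product({'detector_name': ['ks'], 'dataset': ['cifar10', 'imdb']}))
# → [{'detector_name': 'ks', 'dataset': 'cifar10'},
#    {'detector_name': 'ks', 'dataset': 'imdb'}]
```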
```python
def save_model(model: tf.keras.Model,
               filepath: Union[str, os.PathLike],
               save_dir: Union[str, os.PathLike] = 'model',
               filename: str = 'model',
```
Saving must also be updated, since `save_model` was using an incorrect naming convention: everything was saved to `model.h5` instead of allowing `encoder.h5` etc.
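A minimal sketch of the corrected save-path logic (the function name and body here are illustrative, not the actual `save_model` implementation):

```python
import os
from pathlib import Path
from typing import Union


def save_model_sketch(model, filepath: Union[str, os.PathLike],
                      save_dir: Union[str, os.PathLike] = 'model',
                      filename: str = 'model') -> Path:
    # Save to <filepath>/<save_dir>/<filename>.h5, so that e.g. an encoder
    # can be written to model/encoder.h5 rather than always model/model.h5.
    model_dir = Path(filepath) / save_dir
    model_dir.mkdir(parents=True, exist_ok=True)
    target = model_dir / f'{filename}.h5'
    model.save(target)  # tf.keras.Model.save infers HDF5 format from the .h5 suffix
    return target
```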
LGTM, is there any reason we don't include the test for remote artefacts? Perhaps we could have it run only on nightly CI? Maybe this requires more thought though?
Yeah, we should include something IMO. But I think we need to have more of a think about:
This PR fixes a bug in the legacy load functionality (for `.pickle`/`.dill` files generated by old `alibi-detect` versions). The bug meant that the detectors stored in https://console.cloud.google.com/storage/browser/seldon-models/alibi-detect could not be loaded successfully, because models were expected to be loaded from `filepath/encoder/model.h5` instead of `filepath/model/encoder.h5`.

This was not picked up in the `test_saving_legacy.py` tests, because these operate by instantiating a detector, saving it, and then reloading it (hence the incorrect naming convention was adopted when saving).

Testing
CI runs: https://github.com/ascillitoe/alibi-detect/actions/runs/3995538470
Backward compatibility has been tested by loading (and checking) the following test matrix of artefacts from https://console.cloud.google.com/storage/browser/seldon-models/alibi-detect (see #729 (comment)):
where `{'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']}` corresponds to `seldon-models/alibi-detect/od/IForest/kddcup`, and `{'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']}` to `seldon-models/alibi-detect/cd/ks/imdb-0_6_2`.

Results
All the above artifacts are loaded successfully except for `{'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': 'genome'}`, which fails with:

This artifact is very old. It is saved as `.pickle` instead of `.dill`, and the `likelihood_fn` defined in https://docs.seldon.io/projects/alibi-detect/en/stable/examples/od_llr_genome.html is not found at load time.

Note: artifacts with version numbers `<v0.6` are not included in `TESTS`, since they use `.pickle`, and objects they reference, such as `preprocess_drift`, have since been moved to different locations in `alibi_detect`.
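The mapping from a test-matrix entry to a bucket path can be sketched as follows (this mirrors the URL construction for drift detectors in the test script above; the helper name is ours, not part of alibi-detect):

```python
def drift_artifact_url(detector_name: str, dataset: str, version: str) -> str:
    # Drift artifacts live under cd/, with dots in the version replaced by underscores
    return ('gs://seldon-models/alibi-detect/cd/'
            f'{detector_name}/{dataset}-{version.replace(".", "_")}/')


url = drift_artifact_url('ks', 'imdb', '0.6.2')
# → 'gs://seldon-models/alibi-detect/cd/ks/imdb-0_6_2/'
```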