Set x_ref_preprocessed=True during legacy loading (v0.10.5 patch) #732
Merged
ascillitoe merged 3 commits intoSeldonIO:patch/v0.10.5from Jan 25, 2023
ascillitoe:fix/legacy_load_preproc
Merged
Set x_ref_preprocessed=True during legacy loading (v0.10.5 patch) #732ascillitoe merged 3 commits intoSeldonIO:patch/v0.10.5from ascillitoe:fix/legacy_load_preproc
x_ref_preprocessed=True during legacy loading (v0.10.5 patch) #732ascillitoe merged 3 commits intoSeldonIO:patch/v0.10.5from
ascillitoe:fix/legacy_load_preproc
Conversation
Contributor
Author
|
The test script has been updated from #732 to actually run some sample predictions, because simply checking that loading passed missed this bug: import numpy as np
import alibi_detect
import transformers
from alibi_detect.saving import load_detector
from alibi_detect.utils.fetching.fetching import fetch_detector, fetch_state_dict, fetch_tf_model
from alibi_detect.datasets import fetch_kdd
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.utils.url import _join_url
import tensorflow as tf
import pytest
import itertools
from cloudpathlib import CloudPath
TESTS = [
# Outlier detectors
{'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': ['fashion_mnist', 'genome']},
{'detector_type': 'outlier', 'detector_name': 'Mahalanobis', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierAE', 'dataset': ['cifar10']},
{'detector_type': 'outlier', 'detector_name': 'OutlierAEGMM', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierProphet', 'dataset': ['weather']},
{'detector_type': 'outlier', 'detector_name': 'OutlierSeq2Seq', 'dataset': ['ecg']},
{'detector_type': 'outlier', 'detector_name': 'OutlierVAE', 'dataset': ['adult', 'cifar10', 'kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierVAEGMM', 'dataset': ['kddcup']},
# Adversarial detectors
{'detector_type': 'adversarial', 'detector_name': 'model_distillation', 'dataset': ['cifar10'], 'model': ['resnet32']},
# Drift detectors (not supported by `fetch_detector`...)
{'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']},
{'detector_type': 'drift', 'detector_name': 'mmd', 'dataset': ['cifar10'], 'version': ['0.8.1']},
{'detector_type': 'drift', 'detector_name': 'tabular', 'dataset': ['income'], 'version': ['0.7.0', '0.8.1']},
]
def dict_product(dicts):
return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))
trials = []
for test in TESTS:
for k, v in test.items():
if not isinstance(v, list):
test[k] = [v]
trials += list(dict_product(test))
n_tests = len(trials)
@pytest.fixture
def unpack_trials(request):
return trials[request.param]
@pytest.mark.parametrize("unpack_trials", list(range(n_tests)), indirect=True)
def test_fetch_detector(unpack_trials, tmp_path):
kwargs = unpack_trials
print(kwargs)
if kwargs['detector_type'] in ('outlier', 'adversarial'):
det = fetch_detector(tmp_path, **kwargs)
else:
# create url of detector
version = kwargs.get('version', '')
version = version.replace('.', '_')
url = 'gs://seldon-models/alibi-detect/'
url += 'cd/' + kwargs['detector_name'] + '/' + kwargs['dataset'] + '-' + version + '/'
# Download bucket directory
cloudpath = CloudPath(url)
cloudpath.copytree(tmp_path)
# Load detector
det = load_detector(tmp_path)
# Check loaded detector
dataset = kwargs['dataset']
detector_type = kwargs['detector_type']
det_backend = det._detector if hasattr(det, '_detector') else det
det_backend = det_backend._detector if hasattr(det_backend, '_detector') else det_backend
X = None
if dataset == 'imdb':
if detector_type == 'drift':
assert isinstance(det_backend.preprocess_fn.keywords['model'], alibi_detect.cd.tensorflow.preprocess.UAE)
assert isinstance(det_backend.preprocess_fn.keywords['tokenizer'], transformers.PreTrainedTokenizerBase)
X = ["This is one of the dumbest films, I've ever seen. It rips off nearly ever type of thriller and "
"manages to make a mess of them all.<br /><br />There's not a single good line or character in the "
"whole mess. If there was a plot, it was an afterthought and as far as acting goes, there's nothing "
"good to say so Ill say nothing. I honestly cant understand how this type of nonsense gets produced "
"and actually released, does somebody somewhere not at some stage think, 'Oh my god this really is a "
"load of shite' and call it a day. Its crap like this that has people downloading illegally, the trailer "
"looks like a completely different film, at least if you have download it, you haven't wasted your time "
"or money Don't waste your time, this is painful."]
elif dataset == 'cifar10':
if detector_type == 'drift':
assert isinstance(det_backend.preprocess_fn.keywords['model'], tf.keras.Model)
X = np.random.uniform(size=(5, 32, 32, 3)) # dummy batch of (32,32,3) images
if dataset == 'kddcup':
kddcup = fetch_kdd(percent10=True)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=5, perc_outlier=0)
X, y = normal_batch.data.astype('float'), normal_batch.target
if X is not None:
if hasattr(det, 'infer_threshold'):
det.infer_threshold(X, threshold_perc=95) # inferring on test data just for basic test...
_ = det.predict(X) |
Contributor
Author
x_ref_preprocessed=True during legacy loadingx_ref_preprocessed=True during legacy loading (v0.10.5 patch)
This was referenced Jan 26, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes a second bug in the legacy loading behavior (first one here), used when loading detectors saved in
alibi-detect < v0.10or withsave_detector(..., legacy=True)in newer versions.In short, the bug was due to the change in kwarg's related to preprocessing of
x_refbeing changed inv0.10. This was reflected in the functions that set thestate_dictused for legacy saving (e.g.state_ksdrift), so legacy save/load withinv0.10works fine. However, thestate_dictof detectors saved in older versions would of course contain old kwarg's with different meanings. The result wasx_refbeing incorrect preprocessed for a second time during the loading of detectors saved in<v0.10.Background
In the old legacy save/load code, a
state_dictis defined instate_ksdriftetc:alibi-detect/alibi_detect/utils/saving.py
Lines 401 to 424 in 687cddf
At load time, in
init_cd_ksdriftetc the nestedkwargsdict is used to instantiate the detector, whilst selected values from theothersdict are used to update attributes after instantiation.alibi-detect/alibi_detect/utils/saving.py
Lines 1654 to 1661 in 687cddf
In
<v0.10thepreprocess_x_refkwarg (which controls whetherx_refis preprocessed at init) was handled in this way, to ensure that the already preprocessedx_refwas not preprocessed again when a detector was loaded (it is set toFalseat instantiation and then updated to its real value afterward).In
>=v0.10this behavior is updated.preprocess_x_refis renamedpreprocess_at_init, andx_ref_preprocessedis introduced. Therefore, whetherx_refhas been preprocessed (and thus it shouldn't be preprocessed again) is controlled byx_ref_preprocessed, andpreprocess_at_initonly controls whether to preprocess at init or predict time. Since the preprocessing behavior is now fully controlled by the detector's kwargs (with no adhoc editing of attributes), the setting of thestate_dictis altered instate_ksdriftetc:alibi-detect/alibi_detect/saving/tensorflow/_saving.py
Lines 449 to 472 in ace0f51
Then, at load time, the
preprocess_x_refattribute no longer needs to be updated:alibi-detect/alibi_detect/saving/tensorflow/_loading.py
Lines 862 to 868 in ace0f51
Problem
The
state_ksdriftandinit_cd_ksdriftfunctions were updated in v0.10 to reflect the new preprocessing behavior, meaning legacy save/load within>v0.10versions works fine (hence the CI passes). However, oldstate_dict's generated with<v0.10do not contain ax_ref_preprocessedkwarg, and thepreprocess_at_init(still calledpreprocess_x_ref) kwarg is inotherinstead ofkwargs(because it was to be updated after instantiation). The result is that when loading detectors generated in old versions, they are loaded with the defaultx_ref_preprocessed=Falseandpreprocess_at_init=True, meaning the already preprocessedx_refis preprocessed again.Fix
The proposed fix is to first identify if the
state_dictis created in an old version by checking thatx_ref_preprocessed(doesn't) exist:If the
state_dictis "old", then:x_ref_preprocessedis set toTrue(as is done in the updatedstate_ksdriftetc).preprocess_x_refis moved fromotherstokwargsso that it is passed as a kwarg at instantiation instead of the attribute being updated afterward. (The name change topreprocess_at_initdoesn't matter since we use the decorator@deprecated_alias(preprocess_x_ref='preprocess_at_init')for all relevant detectors).