Fixed AnchorTabular length discrepancy between feature and names field. #902
Merged
jklaise merged 1 commit into SeldonIO:master from Apr 17, 2023
Nice! Thanks also for the thorough explanation.
This PR fixes the `AnchorTabular` length discrepancy between the `feature` and `names` fields returned in the explanation object. To describe what caused the issue, let us consider the following example.

Consider a dataset with a numerical feature `f`. Because `Anchors` can only handle discrete data, a discretization step is required for numerical features. In our examples, we discretize the numerical values based on the 25%, 50%, and 75% quantiles. Let `t25`, `t50`, `t75` be the associated quantile values. This results in a discretization of the numerical feature `f` into 4 bins: `[-inf, t25]`, `[t25, t50]`, `[t50, t75]`, and `[t75, +inf]`, encoded by 0, 1, 2, and 3, respectively.

Suppose we want to explain an instance `X`, and let `X[f]` denote the value of feature `f` for the instance `X`. Assume that `X[f]` falls in bin number 2, thus being encoded by the value 2.

For numerical features, the `AnchorTabular` algorithm creates multiple predicates associated with the same feature `f`. Those predicates correspond to intervals from which numerical samples can be drawn for the perturbation step in the algorithm. The code for this can be seen here. In our case the following predicates will be created: `P1 = [1, 2, 3]`, `P2 = [2, 3]`, `P3 = [0, 1, 2]`.

Note that each predicate `Pi` corresponds to an interval from which we can sample values for the feature `f`. For example, `P1` is associated with the interval `[t25, +inf]`, `P2` with `[t50, +inf]`, and `P3` with `[-inf, t75]`.

The final anchor may contain several of the three predicates `Pi` listed above. Let us assume that it ends up containing `P1` and `P2`. With this assumption, let us move to the construction of the human-interpretable representation of the anchor, implemented here.

Let's say that the anchor is composed of three predicates encoded by `[1, 2, 3]`, where `1` is associated with a feature `g` different from `f`, and `2`, `3` correspond to predicates `P1`, `P2` associated with feature `f`.

Following the code line by line, we have:
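A minimal sketch of this step, not the library's actual code: it assumes a lookup (called `enc2feat_idx` here for illustration) that maps each encoded predicate index to a feature column, with `g` placed at column 0 and `f` at column 1. Those column numbers are assumptions made only for this example.

```python
# Assumed lookup: encoded predicate index -> feature column index.
# Column 0 stands for feature g, column 1 for feature f (illustrative values).
enc2feat_idx = {1: 0, 2: 1, 3: 1}

# Anchor from the example: one predicate on g, two predicates (P1, P2) on f.
anchor_idxs = [1, 2, 3]

# Mapping every predicate to its feature keeps one entry per predicate,
# so f (column 1) appears twice.
feature = [enc2feat_idx[idx] for idx in anchor_idxs]

# A dictionary keyed by feature column collapses the duplicates.
ordinal_ranges = {
    enc2feat_idx[idx]: [float("-inf"), float("inf")] for idx in anchor_idxs
}

print(feature)               # [0, 1, 1]
print(list(ordinal_ranges))  # [0, 1]
```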
We already see at this point that the length of `explanation['feature']` differs from the number of keys in `ordinal_ranges`, because `explanation['feature']` contains a duplicate of `f`.

The following block of code performs a correct intersection and refinement of the intervals for each feature in the anchor:
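As an illustration of what such an intersection amounts to (a simplified sketch under assumptions, not the code from the repository), predicates can be treated as sets of allowed bin codes and reduced to a single interval per feature. The threshold values and the helper name `refine` are hypothetical.

```python
import math

# Hypothetical quantile values t25, t50, t75 for feature f.
thresholds = [1.0, 2.0, 3.0]

# Predicates from the example, as sets of allowed bin codes.
P1 = {1, 2, 3}
P2 = {2, 3}
P3 = {0, 1, 2}

def refine(predicates):
    """Intersect bin-set predicates on one feature into one [low, high] interval."""
    lo_bin = max(min(p) for p in predicates)  # tightest lower bin
    hi_bin = min(max(p) for p in predicates)  # tightest upper bin
    # Bin b spans (thresholds[b - 1], thresholds[b]]; the outer bins are unbounded.
    low = thresholds[lo_bin - 1] if lo_bin > 0 else -math.inf
    high = thresholds[hi_bin] if hi_bin < len(thresholds) else math.inf
    return [low, high]

interval = refine([P1, P2])
print(interval)  # [2.0, inf]
```

With `P1` and `P2` in the anchor, the refined interval is `[t50, +inf]`, the tighter of the two lower bounds, and feature `f` contributes a single entry.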
Finally, the human-interpretable representation of the anchor for numerical features is constructed here based on the dictionary `ordinal_ranges`.

Note that the `explanation['names']` field avoids duplicating the same feature, hence the difference in length from `explanation['feature']`.

The way to fix this issue is to set `explanation['feature']` to the list of keys in `ordinal_ranges`.
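A small sketch of the idea behind the fix, under the same illustrative assumptions as the example (the lookup `enc2feat_idx`, the column numbers, and the placeholder name strings are hypothetical): when both fields are derived from the keys of `ordinal_ranges`, they necessarily have the same length.

```python
# Assumed lookup: encoded predicate index -> feature column (g = 0, f = 1).
enc2feat_idx = {1: 0, 2: 1, 3: 1}

# Explanation as returned by the search: encoded predicate indices.
explanation = {"feature": [1, 2, 3], "names": []}

anchor_idxs = explanation["feature"]
ordinal_ranges = {
    enc2feat_idx[idx]: [float("-inf"), float("inf")] for idx in anchor_idxs
}

# One entry per distinct feature, in order of first appearance.
explanation["feature"] = list(ordinal_ranges.keys())

# Placeholder names, one per feature; real names would describe the intervals.
explanation["names"] = [f"feature_{feat_id}" for feat_id in ordinal_ranges]

print(explanation["feature"])  # [0, 1]
print(explanation["names"])    # ['feature_0', 'feature_1']
```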