fix: Fix builtin evaluator edge cases by anticorrelator · Pull Request #11405 · Arize-ai/phoenix

anticorrelator · 2026-02-12T23:40:10Z

Reject None, list, and dict values for string-typed template fields instead of silently coercing them (e.g. str(None) → "None")
Change case_sensitive default from True to False for ExactMatch and Levenshtein evaluators
Cap Levenshtein distance inputs at 5000 characters to prevent expensive O(n*m) computations
Add early-exit for identical strings in Levenshtein to skip unnecessary computation
Fix json_diff_count to treat int/float as equivalent (1 == 1.0) using math.isclose, and distinguish bool from int (True != 1)
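For illustration, the numeric and boolean rules in the last item can be sketched as below. This is a minimal sketch, not the PR's implementation: `values_equal` and `diff_count` are hypothetical names, and counting one difference per mismatched leaf, missing key, or extra list element is an assumption about how differences are tallied.

```python
import math
from typing import Any


def values_equal(a: Any, b: Any) -> bool:
    """Leaf comparison: int/float compare numerically, bool never equals a number."""
    a_is_bool, b_is_bool = isinstance(a, bool), isinstance(b, bool)
    if a_is_bool or b_is_bool:
        # bool is a subclass of int in Python, so it has to be checked first.
        return a_is_bool and b_is_bool and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b)
    return type(a) is type(b) and a == b


def diff_count(expected: Any, actual: Any) -> int:
    """Count differing leaves between two JSON-like values (hypothetical helper)."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        keys = set(expected) | set(actual)
        return sum(
            diff_count(expected[k], actual[k]) if k in expected and k in actual else 1
            for k in keys
        )
    if isinstance(expected, list) and isinstance(actual, list):
        shared = sum(diff_count(e, a) for e, a in zip(expected, actual))
        return shared + abs(len(expected) - len(actual))
    return 0 if values_equal(expected, actual) else 1
```

Under these assumptions, `diff_count({"a": 1}, {"a": 1.0})` reports no difference, while `diff_count({"a": True}, {"a": 1})` reports one.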

Note

Medium Risk
Behavior changes in evaluator defaults and input casting/serialization can affect existing evaluation results and traces, though scope is limited to evaluator logic and covered by unit tests.

Overview
Hardens evaluator input handling by making cast_template_variable_types fail fast on None for string fields and JSON-serializing dict/list values (instead of Python str() output), which also changes LLM prompt/span inputs to use JSON strings.
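The difference between Python `str()` output and JSON serialization matters for what ends up in prompts and span inputs. A small standalone demonstration follows; the `payload` value is made up for illustration, and `default=str` mirrors the `json.dumps` call shown in the diff.

```python
import json

payload = {"answer": None, "tags": ["a", "b"]}

# str() produces a Python repr: single quotes and None, which is not valid JSON.
as_repr = str(payload)

# json.dumps produces JSON text: double quotes and null.
# default=str falls back to str() for values that are not JSON-serializable.
as_json = json.dumps(payload, default=str)

print(as_repr)  # {'answer': None, 'tags': ['a', 'b']}
print(as_json)  # {"answer": null, "tags": ["a", "b"]}
```

The JSON form round-trips through `json.loads`, whereas the repr form does not.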

Changes ExactMatchEvaluator and LevenshteinDistanceEvaluator to default case_sensitive to False, and adds guardrails to Levenshtein evaluation (5000-char length cap plus early-exit when strings already match). json_diff_count now treats int/float as numerically equivalent (via math.isclose) while distinguishing bool from int, with tests updated/added accordingly.

Written by Cursor Bugbot for commit 95de673. This will update automatically on new commits.

src/phoenix/server/api/evaluators.py

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-13T07:16:05Z

src/phoenix/server/api/evaluators.py

+                if value is None:
+                    raise ValueError(f"Field '{key}' expects a string but got NoneType")
+                if isinstance(value, (dict, list)):
+                    casted_template_variables[key] = json.dumps(value, default=str)


String schema still accepts containers

Medium Severity

cast_template_variable_types still converts dict and list values for "string" fields into JSON text via json.dumps instead of rejecting them. This keeps non-string inputs silently passing validation, so template variables that are structurally wrong continue to be treated as valid strings.
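If the intent is to reject containers outright, as the PR description states, one possible shape for the string branch is sketched below. `cast_string_field` is a hypothetical helper, not a function in the PR, and coercing remaining scalars with `str()` is an assumption about the desired behavior.

```python
from typing import Any


def cast_string_field(key: str, value: Any) -> str:
    """Hypothetical strict cast for a field whose schema type is "string"."""
    if value is None or isinstance(value, (dict, list)):
        # Fail fast instead of serializing structurally wrong input.
        raise ValueError(
            f"Field '{key}' expects a string but got {type(value).__name__}"
        )
    # Assumption: scalars such as int or float are still coerced to text.
    return str(value)
```

The trade-off is that callers who rely on dict or list inputs being serialized would need to serialize them explicitly before templating.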

cursor · 2026-02-13T07:16:05Z

src/phoenix/server/api/evaluators.py

+                if max(len(compare_expected), len(compare_actual)) > 5000:
+                    raise ValueError(
+                        "Inputs too long for Levenshtein distance (max 5000 characters)"
+                    )


Length cap blocks identical long strings

Low Severity

LevenshteinDistanceEvaluator enforces the 5000-character limit before checking compare_expected == compare_actual. This causes identical over-limit strings to return an error instead of distance 0, even though the early-exit path avoids the expensive levenshtein_distance computation.

Additional Locations (1)

src/phoenix/server/api/evaluators.py#L1934-L1938
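One way to address this is to run the equality check before the length check. The sketch below is standalone and makes several assumptions: `guarded_distance` and `MAX_LEN` are hypothetical names, case-insensitive comparison is modeled with `str.lower()`, and `levenshtein_distance` is a plain dynamic-programming stand-in rather than the function used in the repository.

```python
MAX_LEN = 5000  # cap quoted in the diff above


def levenshtein_distance(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance (stand-in implementation)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def guarded_distance(expected: str, actual: str, case_sensitive: bool = False) -> int:
    compare_expected = expected if case_sensitive else expected.lower()
    compare_actual = actual if case_sensitive else actual.lower()
    # Equality first: comparing two strings is linear, so no cap is needed here.
    if compare_expected == compare_actual:
        return 0
    # Only the quadratic computation is protected by the length cap.
    if max(len(compare_expected), len(compare_actual)) > MAX_LEN:
        raise ValueError("Inputs too long for Levenshtein distance (max 5000 characters)")
    return levenshtein_distance(compare_expected, compare_actual)
```

With this ordering, two identical 6000-character strings return 0, while differing over-limit strings still raise.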

anticorrelator requested a review from a team as a code owner February 12, 2026 23:40

github-project-automation bot added this to phoenix Feb 12, 2026

github-project-automation bot moved this to 📘 Todo in phoenix Feb 12, 2026

dosubot bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) Feb 12, 2026

mintlify bot deployed to staging February 12, 2026 23:41

cursor bot reviewed Feb 12, 2026

src/phoenix/server/api/evaluators.py

fix: Improve default object loading

95de673

mintlify bot deployed to staging February 13, 2026 07:13

cursor bot reviewed Feb 13, 2026

mikeldking assigned axiomofjoy and ehutt and unassigned axiomofjoy Mar 2, 2026

anticorrelator wants to merge 2 commits into main from dustin/fix-builtin-evaluator-correctness-and-availability-issues
