- Reject None, list, and dict values for string-typed template fields instead of silently coercing them (e.g. `str(None)` → `"None"`)
- Change `case_sensitive` default from `True` to `False` for ExactMatch and Levenshtein evaluators
- Cap Levenshtein distance inputs at 5000 characters to prevent expensive O(n*m) computations
- Add early-exit for identical strings in Levenshtein to skip unnecessary computation
- Fix `json_diff_count` to treat int/float as equivalent (1 == 1.0) using `math.isclose`, and distinguish bool from int (`True` != `1`)
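The `json_diff_count` rule described above can be sketched as a leaf comparison. This is only an illustration under assumptions: `values_equal` is a hypothetical helper name, not the function in this PR, and the default `math.isclose` tolerances are assumed.

```python
import math


def values_equal(expected, actual):
    """Hypothetical leaf comparison; not the PR's actual helper."""
    # bool is a subclass of int in Python, so it has to be checked first:
    # True must not be treated as equal to 1.
    if isinstance(expected, bool) or isinstance(actual, bool):
        return type(expected) is type(actual) and expected == actual
    # int and float are compared numerically, so 1 and 1.0 are equivalent.
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual)
    # Everything else falls back to ordinary equality.
    return expected == actual
```

The bool check comes before the numeric check because `isinstance(True, int)` is true in Python; with the order reversed, `True` and `1` would compare as equal.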
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    if value is None:
        raise ValueError(f"Field '{key}' expects a string but got NoneType")
    if isinstance(value, (dict, list)):
        casted_template_variables[key] = json.dumps(value, default=str)
String schema still accepts containers
Medium Severity
cast_template_variable_types still converts dict and list values for "string" fields into JSON text via json.dumps instead of rejecting them. This keeps non-string inputs silently passing validation, so template variables that are structurally wrong continue to be treated as valid strings.
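One way to match the PR description would be to raise for containers as well as for `None`. The sketch below is an assumption-heavy illustration: `cast_string_field` is a hypothetical single-field helper, not the real `cast_template_variable_types`, and the choice to keep coercing scalars such as `int` is assumed rather than taken from the PR.

```python
def cast_string_field(key, value):
    """Hypothetical strict cast for one string-typed template field."""
    if isinstance(value, str):
        return value
    # Reject None and containers instead of serializing them.
    if value is None or isinstance(value, (dict, list)):
        raise ValueError(
            f"Field '{key}' expects a string but got {type(value).__name__}"
        )
    # Assumption: scalars such as int or float are still coerced.
    return str(value)
```

With this shape, a structurally wrong input such as `{"a": 1}` fails validation instead of reaching the template as JSON text.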
    if max(len(compare_expected), len(compare_actual)) > 5000:
        raise ValueError(
            "Inputs too long for Levenshtein distance (max 5000 characters)"
        )
Length cap blocks identical long strings
Low Severity
LevenshteinDistanceEvaluator enforces the 5000-character limit before checking compare_expected == compare_actual. This causes identical over-limit strings to return an error instead of distance 0, even though the early-exit path avoids the expensive levenshtein_distance computation.
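A possible reordering is to run the equality check before the length cap. The following is a minimal sketch, not the evaluator's code: `guarded_distance` and `MAX_LEN` are hypothetical names, and `levenshtein_distance` here is a textbook two-row implementation standing in for whatever the project actually uses.

```python
MAX_LEN = 5000  # cap taken from the PR description


def levenshtein_distance(a, b):
    """Textbook two-row dynamic programme, O(len(a) * len(b)) time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def guarded_distance(expected, actual, case_sensitive=False):
    """Hypothetical wrapper: early exit first, length cap second."""
    if not case_sensitive:
        expected, actual = expected.lower(), actual.lower()
    # Identical strings cost only a linear comparison, whatever their length.
    if expected == actual:
        return 0
    if max(len(expected), len(actual)) > MAX_LEN:
        raise ValueError(
            "Inputs too long for Levenshtein distance (max 5000 characters)"
        )
    return levenshtein_distance(expected, actual)
```

With this ordering, two identical 6000-character strings return 0, while a 6000-character string compared against a different string still raises.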


Note
Medium Risk
Behavior changes in evaluator defaults and input casting/serialization can affect existing evaluation results and traces, though scope is limited to evaluator logic and covered by unit tests.
Overview
Hardens evaluator input handling by making `cast_template_variable_types` fail fast on `None` for string fields and JSON-serializing `dict`/`list` values (instead of Python `str()` output), which also changes LLM prompt/span inputs to use JSON strings.

Changes `ExactMatchEvaluator` and `LevenshteinDistanceEvaluator` to default `case_sensitive` to `False`, and adds guardrails to Levenshtein evaluation (5000-char length cap plus early-exit when strings already match).

`json_diff_count` now treats `int`/`float` as numerically equivalent (via `math.isclose`) while distinguishing `bool` from `int`, with tests updated/added accordingly.

Written by Cursor Bugbot for commit 95de673. This will update automatically on new commits.