fix(core): coerce stringified JSON values for anyOf/oneOf MCP tool schemas#2858
fix(core): coerce stringified JSON values for anyOf/oneOf MCP tool schemas#2858tanzhenxin merged 4 commits intomainfrom
Conversation
) Some LLMs serialize array/object values as JSON strings when the schema uses anyOf/oneOf with mixed types (e.g., Python's `list[str] | None`). The model returns '["url"]' (a string) instead of ["url"] (an array), causing Ajv to reject otherwise valid input. Add fixStringifiedJsonValues() coercion — same pattern as the existing fixBooleanValues() — that parses stringified arrays/objects back to their intended type when the schema expects a non-string type.
📋 Review SummaryThis PR addresses a well-investigated issue where some LLMs serialize array/object values as JSON strings when encountering 🔍 General Feedback
🎯 Specific Feedback🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
| const variants = propSchema[keyword]; | ||
| if (Array.isArray(variants)) { | ||
| for (const variant of variants as Array<Record<string, unknown>>) { | ||
| if (typeof variant['type'] === 'string') { |
There was a problem hiding this comment.
[Suggestion] getAcceptedTypes doesn't handle type as an array inside anyOf/oneOf variants
The function correctly handles type: [...] (array form) at the property-schema level (lines 139–142), but inside anyOf/oneOf variants it only checks typeof variant['type'] === 'string' (line 152), silently skipping when type is an array. For a schema like { "anyOf": [{ "type": ["array", "null"] }, { "type": "integer" }] }, only "integer" is detected and "array" is missed, causing the coercion guard to skip the property entirely.
Suggested fix:
for (const variant of variants as Array<Record<string, unknown>>) {
if (typeof variant['type'] === 'string') {
types.add(variant['type'] as string);
} else if (Array.isArray(variant['type'])) {
for (const t of variant['type'] as string[]) {
types.add(t);
}
}
}
wenshao
left a comment
There was a problem hiding this comment.
Review Summary
PR 2858 adds fixStringifiedJsonValues() coercion in SchemaValidator to handle LLMs that serialize array/object values as JSON strings when encountering anyOf/oneOf union types in MCP tool schemas.
Verification stats: 5 findings reported by agents, 0 confirmed after verification. 1 additional finding identified by reverse audit (1 confirmed, high confidence).
Deterministic analysis: TypeScript type checking and ESLint clean on changed files. Build and all 30 tests pass.
1 suggestion — getAcceptedTypes doesn't handle type as an array inside anyOf/oneOf variants (only checks typeof variant['type'] === 'string', missing the Array.isArray(variant['type']) case). This means schemas using type: ["array", "null"] inside an anyOf variant won't be detected, causing the coercion to skip the property.
Verdict: Comment — no critical issues, one valid suggestion.
Reviewed by glm-5.1 via Qwen Code /review
…tedTypes The function only checked for string-form type inside anyOf/oneOf variants, missing the array form (e.g., type: ["array", "null"]). This mirrors the handling already done at the top-level property schema.
The test has a stale closure race condition: the 50ms wait between pressing '2' and Enter may not be enough for React/Ink to re-render and re-subscribe the useKeypress callback with the updated selectedIndex, causing it to read the default value (0) instead of the expected value (1) on slow CI runners (Windows + Node 20).
…e' test Same class of Windows CI timing flake — the backspace keypress doesn't propagate through the paste/keypress pipeline fast enough on slow runners, so replaceRangeByOffset is never called (0 calls).
| const value = data[key]; | ||
| const propSchema = properties[key]; | ||
| if (!propSchema || typeof value !== 'string') continue; | ||
|
|
There was a problem hiding this comment.
[Suggestion] fixStringifiedJsonValues does not recurse into nested objects, unlike fixBooleanValues
fixBooleanValues recursively descends into nested objects, but fixStringifiedJsonValues only processes top-level keys. Nested stringified JSON values (e.g. { config: { urls: '["a","b"]' } }) will not be coerced.
Suggested fix:
} else if (typeof value === 'object' && value !== null && !Array.isArray(value)) {
fixStringifiedJsonValues(value as Record<string, unknown>, propSchema);
continue;
}— qwen3.6-plus via Qwen Code /review
| */ | ||
| /** | ||
| * Returns the set of JSON Schema types that a property accepts, | ||
| * considering `type`, `anyOf`, and `oneOf` keywords. |
There was a problem hiding this comment.
[Suggestion] Misplaced JSDoc — the fixBooleanValues description (from pre-existing code, lines 121-130) now appears to document getAcceptedTypes instead.
The PR inserted getAcceptedTypes between the old fixBooleanValues JSDoc and the actual function (line ~217). IDE hover will show the wrong description for this function.
Suggested fix: Move the JSDoc block from lines 121-130 to immediately precede fixBooleanValues.
— qwen3.6-plus via Qwen Code /review
| if (typeof variant['type'] === 'string') { | ||
| types.add(variant['type'] as string); | ||
| } else if (Array.isArray(variant['type'])) { | ||
| for (const t of variant['type'] as string[]) { |
There was a problem hiding this comment.
[Suggestion] getAcceptedTypes ignores allOf schemas — it only inspects type, anyOf, and oneOf. If a property uses allOf (e.g. allOf: [{type: 'array'}, {...constraints}]), the function returns null and coercion is skipped.
Suggested fix: Add an allOf branch similar to anyOf/oneOf.
— qwen3.6-plus via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
The fix approach is sound and well-tested for the common case (30/30 tests pass). All findings are edge-case gaps rather than bugs in the core logic. See inline comments for details.
Reviewed by qwen3.6-plus via Qwen Code /review
TLDR
Fix MCP tool validation failures when schemas use
anyOf/oneOfunion types (e.g., Python'slist[str] | None). Some LLMs serialize array/object values as JSON strings when encountering these schemas — this PR adds client-side coercion to parse them back to their intended types before validation.Fixes #2839
Screenshots / Video Demo
Before — tool call rejected with contradictory error:
After — stringified array is coerced and tool call succeeds:
Dive Deeper
Root cause
When an MCP tool parameter uses
anyOf(e.g.,anyOf: [{type: "array"}, {type: "null"}]— generated by Python'slist[str] | None), some LLMs generate the value as a JSON string instead of the actual JSON type:anyOf: [array, null]"urls": "[\"https://example.com\"]"(string)"urls": ["https://example.com"](array){type: "array"}(no anyOf)"urls": ["https://example.com"](correct)The Ajv schema validator is working correctly — it rightfully rejects a string value against
anyOf: [array, null]. The issue is in what the model produces, not in the validator.Fix approach
Add
fixStringifiedJsonValues()coercion inSchemaValidator, following the same pattern as the existingfixBooleanValues()(which handles LLMs returning"true"/"false"strings instead of booleans). The coercion:[...) or objects ({...)array/objectbut notstring— avoids false coercion when string is a valid typeInvestigation details
Full structured debugging investigation documented in the commit. Key evidence from API request/response logging:
anyOf— pipeline is cleanfunction.argumentsfrom model response:'{"urls":"[\"https://example.com\"]"}'— model is the source of the stringificationReviewer Test Plan
cd packages/core && npx vitest run src/utils/schemaValidator.test.tsanyOf,oneOf, objects, safety guardsurls: anyOf [array, null]parameter.qwen/settings.jsonundermcpServersnode dist/cli.js "Call the extract tool with urls=['https://example.com']" --approval-mode yolo --output-format jsonTesting Matrix
Linked issues / bugs
Fixes #2839