Skip to content

fix: use json.loads(strict=False) for all LLM output parsing paths#1580

Merged
RealKai42 merged 3 commits intomainfrom
kaiyi/fix-json-strict-parse
Mar 25, 2026
Merged

fix: use json.loads(strict=False) for all LLM output parsing paths#1580
RealKai42 merged 3 commits intomainfrom
kaiyi/fix-json-strict-parse

Conversation

@RealKai42
Copy link
Copy Markdown
Collaborator

@RealKai42 RealKai42 commented Mar 25, 2026

Description:

When LLMs generate tool call arguments containing unescaped control characters (e.g. literal newlines in multi-line shell commands), json.loads() raises JSONDecodeError in strict mode. This causes tool execution failure, and worse, the malformed assistant message gets persisted to session history, making the session permanently irrecoverable.

Apply strict=False to all 14 json.loads call sites that parse LLM-generated content:

  • Tool execution: toolset.py, simple.py — prevents initial parse failure
  • API conversion: anthropic.py, google_genai.py — fixes session irrecoverability on restore
  • Session restore: context.py, linear.py, session.py — defensive tolerance for poisoned JSONL
  • UI/export: debug.py, visualize.py, tools/__init__.py, export.py — prevents display/export crashes

Fixes #1378

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.

Open with Devin

Copilot AI review requested due to automatic review settings March 25, 2026 09:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes LLM-generated JSON parsing more tolerant by using json.loads(..., strict=False) across multiple tool-call and session/history parsing paths, preventing failures caused by unescaped control characters in model output.

Changes:

  • Use json.loads(strict=False) when parsing tool call arguments in core tool execution and provider conversion.
  • Apply the same relaxed parsing to session restore / JSONL consumption and UI/export display paths.
  • Update changelogs/release notes to document the fix.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/kimi_cli/utils/export.py Relax JSON parsing for tool-call argument formatting/export rendering
src/kimi_cli/ui/shell/visualize.py Relax parsing when extracting URLs from tool arguments
src/kimi_cli/ui/shell/debug.py Relax parsing when pretty-printing tool call arguments in debug UI
src/kimi_cli/tools/init.py Relax parsing for extracting key arguments from tool-call JSON
src/kimi_cli/soul/toolset.py Relax parsing of tool-call arguments during tool execution
src/kimi_cli/soul/context.py Relax parsing of JSONL session/context restore lines
src/kimi_cli/session.py Relax parsing when scanning session JSONL for “emptiness”
packages/kosong/src/kosong/tooling/simple.py Relax parsing of tool-call arguments during tool execution (kosong)
packages/kosong/src/kosong/contrib/context/linear.py Relax parsing of Linear context JSONL restore lines
packages/kosong/src/kosong/contrib/chat_provider/google_genai.py Relax parsing when converting tool arguments to Google GenAI payloads
packages/kosong/src/kosong/contrib/chat_provider/anthropic.py Relax parsing when converting tool arguments to Anthropic payloads
packages/kosong/CHANGELOG.md Document the relaxed JSON parsing change
docs/zh/release-notes/changelog.md Document the relaxed JSON parsing change (ZH)
docs/en/release-notes/changelog.md Document the relaxed JSON parsing change (EN)
CHANGELOG.md Document the relaxed JSON parsing change (root changelog)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if not line:
continue
role = json.loads(line).get("role")
role = json.loads(line, strict=False).get("role")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_empty() still raises json.JSONDecodeError if any JSONL line is malformed in ways unrelated to strict (e.g., truncated/corrupted lines). Since this method is likely used in startup/housekeeping paths, a single bad line can crash the CLI. Consider catching json.JSONDecodeError (and optionally TypeError) inside the loop and skipping bad lines (or conservatively treating the session as non-empty) to make session handling more resilient.

Suggested change
role = json.loads(line, strict=False).get("role")
try:
record = json.loads(line, strict=False)
except (json.JSONDecodeError, ValueError, TypeError):
logger.warning(
"Skipping malformed JSON line in context file {file}:",
file=self.context_file,
)
continue
role = record.get("role")

Copilot uses AI. Check for mistakes.
args_formatted = json.dumps(json.loads(args_raw), indent=2, ensure_ascii=False)
parsed = json.loads(args_raw, strict=False)
args_formatted = json.dumps(parsed, indent=2, ensure_ascii=False)
except json.JSONDecodeError:
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args_raw comes from tc.function.arguments and is not guaranteed to be a str at runtime (elsewhere in this file you already defensively catch TypeError). If it’s a non-string truthy value, json.loads(args_raw, ...) will raise TypeError and bypass the current except, crashing export rendering. Consider expanding the handler to except (json.JSONDecodeError, TypeError) for consistency with _extract_tool_call_hint() and _stringify_tool_calls().

Suggested change
except json.JSONDecodeError:
except (json.JSONDecodeError, TypeError):

Copilot uses AI. Check for mistakes.
Comment on lines 75 to 76
except json.JSONDecodeError:
args_syntax = Text(args, style="red")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to other call sites, if tool_call.function.arguments is ever a non-string truthy value, json.loads(args, ...) can raise TypeError, which is not caught here and would break the debug UI. Consider catching (json.JSONDecodeError, TypeError) to keep the debug view robust when encountering unexpected argument types.

Suggested change
except json.JSONDecodeError:
args_syntax = Text(args, style="red")
except (json.JSONDecodeError, TypeError):
args_syntax = Text(str(args), style="red")

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.

Open in Devin Review


## Unreleased

- Core: Fix JSON parsing error when LLM tool call arguments contain unescaped control characters — use `json.loads(strict=False)` across all LLM output parsing paths to prevent tool execution failure and session corruption
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Manual edit to auto-generated English changelog violates docs/AGENTS.md rule

The docs/AGENTS.md rule states: "The English changelog (docs/en/release-notes/changelog.md) is auto-generated from the root CHANGELOG.md. Do not edit it manually." This PR directly adds a new entry to docs/en/release-notes/changelog.md, which should instead be generated by running npm run sync from the docs/ directory after editing the root CHANGELOG.md. The content is identical to what the sync script would produce, so there is no functional impact, but it violates the stated workflow.

Prompt for agents
Remove the manually added line from docs/en/release-notes/changelog.md and instead run `npm run sync` from the docs/ directory to auto-generate the English changelog from the root CHANGELOG.md. The sync script (docs/scripts/sync-changelog.mjs) will copy the entry that was already added to the root CHANGELOG.md.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@RealKai42 RealKai42 merged commit 109c1e5 into main Mar 25, 2026
19 checks passed
@RealKai42 RealKai42 deleted the kaiyi/fix-json-strict-parse branch March 25, 2026 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JSON parsing error when tool call arguments contain control characters

2 participants