fix(core): reject PDF files to prevent session corruption (fixes #2020) by drewd789 · Pull Request #2024 · QwenLM/qwen-code

drewd789 · 2026-02-28T18:25:47Z

TLDR

Fixes PDF session corruption by rejecting PDF files at the source instead of embedding them as base64 inlineData (which caused API 400 errors). Users should extract PDF text externally before providing it to the tool.

Dive Deeper

PDF files were causing API 400 errors ("Invalid value: file. Supported values: 'text','image_url','video_url' and 'video'") and corrupting session files with base64 data, requiring manual cleanup with a Python script.

This fix rejects PDFs in processSingleFileContent() with a clear error message directing users to extract text externally.

⚠️ Breaking Change: This modifies the behavior of the read_file tool. Workflows that previously attempted to read PDF files directly will now receive an error instead of embedded base64 data.

Before vs After

Scenario	Before	After
Read PDF via `read_file`	Embeds base64 → API 400 error → session corruption	Returns clear error message, no corruption
Read image via `read_file`	Works correctly	Works correctly (unchanged)
Read text file via `read_file`	Works correctly	Works correctly (unchanged)

Migration Guide

Users who previously relied on PDF embedding should:

Extract text from PDFs using external tools (pdftotext, pdfplumber, online converters)
Provide the extracted text directly to the conversation
Or use a dedicated PDF processing tool if available

Reviewer Test Plan

Primary Test (PDF Rejection)

Run: /read_file path/to/test.pdf (or use the read_file tool via chat)
Verify error message contains:
- Clear statement that PDFs are not supported
- Guidance to extract text externally
Check session file - verify no base64 data was written
Verify the conversation continues normally without corruption

Regression Test (Images Still Work)

Run: /read_file path/to/test.png
Verify image is processed correctly

Edge Cases

Test PDF with uppercase extension (.PDF)
Test other binary formats (.docx, .xlsx) - should also be rejected
Test text files (.txt, .md) - should still work normally

Testing Matrix

	🍏	🪟	🐧
npm run	❓	❓	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Files Changed

packages/core/src/utils/fileUtils.ts - Main fix in processSingleFileContent()
packages/core/src/utils/fileUtils.test.ts - Test updates
docs/developers/tools/file-system.md - Documentation updates

Linked issues / bugs

Fixes #2020

…LM#2020)

tanzhenxin · 2026-03-01T02:43:30Z

@drewd789 Thank you for this well-thought-out contribution! The detailed test plan and clear documentation really helped us understand the issue.

After comparing with #1982, we're going with that approach because it solves the same problem more comprehensively: instead of blocking PDFs entirely, it assigns appropriate modalities per model. This means we can support it once a model can read PDFs natively.

Thanks again for taking the time to dig into this and write such a thorough PR!

drewd789 · 2026-03-01T02:55:02Z

@tanzhenxin Thanks for the review!

One clarification: PR #1982 prevents API 400 errors, but doesn't prevent session corruption. The base64 inlineData still embeds in the session file, flooding context and requiring manual cleanup.

Our fix prevents the corruption at the source. Could PR #1982 be extended to also block session embedding for models without PDF support? That would give both benefits.

Happy to discuss!

tanzhenxin · 2026-03-03T15:03:32Z

@drewd789 Well, I think you made a valid point. We can optimize it at where the reading actually happens. But I think we should be consistent with all modalities, check and reject the binary with fallback text content if the model cannot support it.
Would you like to continue this work on top of the modality support of #1982 ?

drewd789 · 2026-03-04T09:47:39Z

@tanzhenxin Yes, updated PR to integrate with the modalityDefaults system from #1982.

Changes:

File type checking now happens at the source in processSingleFileContent() using defaultModalities()
All media types (image, pdf, audio, video) are handled consistently through the same modality check
Files are rejected before base64 is created, preventing session file bloat
Clear error messages direct users to use a model that supports the file type or convert externally

Test coverage:

Rejects images for text-only models (e.g., deepseek)
Rejects PDFs for text-only models (e.g., qwen3-coder-plus)
Accepts PDFs for models that support them (e.g., claude-3-sonnet)

Does this look right to you, or would you like me to adjust anything?

fix(core): reject PDF files to prevent session corruption (fixes Qwen…

de20bb1

…LM#2020)

drewd789 requested review from DennisYu07, LaZzyMan, Mingholy, gwinthis, pomelo-nwu and tanzhenxin as code owners February 28, 2026 18:25

This was referenced Mar 1, 2026

📊 AI CLI 工具社区动态日报 2026-03-01 duanyytop/agents-radar#31

Open

📊 AI CLI 工具社区动态日报 2026-03-01 rollysys/agents-radar#7

Open

tanzhenxin self-assigned this Mar 1, 2026

This was referenced Mar 4, 2026

📊 AI CLI 工具社区动态日报 2026-03-04 duanyytop/agents-radar#70

Open

📊 AI CLI 工具社区动态日报 2026-03-04 rollysys/agents-radar#32

Open

drewd789 added 3 commits March 3, 2026 20:30

Merge main to get modalityDefaults support

e2ca264

feat: check model modalities before processing media files

d98ffd0

style: format streamingToolCallParser.test.ts (prettier)

0cede7b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): reject PDF files to prevent session corruption (fixes #2020)#2024

fix(core): reject PDF files to prevent session corruption (fixes #2020)#2024
drewd789 wants to merge 4 commits intoQwenLM:mainfrom
drewd789:fix/pdf-session-corruption

drewd789 commented Feb 28, 2026 •

edited

Loading

Uh oh!

tanzhenxin commented Mar 1, 2026

Uh oh!

drewd789 commented Mar 1, 2026 •

edited

Loading

Uh oh!

tanzhenxin commented Mar 3, 2026

Uh oh!

drewd789 commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drewd789 commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TLDR

Dive Deeper

Before vs After

Migration Guide

Reviewer Test Plan

Primary Test (PDF Rejection)

Regression Test (Images Still Work)

Edge Cases

Testing Matrix

Files Changed

Linked issues / bugs

Uh oh!

tanzhenxin commented Mar 1, 2026

Uh oh!

drewd789 commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tanzhenxin commented Mar 3, 2026

Uh oh!

drewd789 commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

drewd789 commented Feb 28, 2026 •

edited

Loading

drewd789 commented Mar 1, 2026 •

edited

Loading