Skip to content

fix(core): reject PDF files to prevent session corruption (fixes #2020)#2024

Open
drewd789 wants to merge 4 commits intoQwenLM:mainfrom
drewd789:fix/pdf-session-corruption
Open

fix(core): reject PDF files to prevent session corruption (fixes #2020)#2024
drewd789 wants to merge 4 commits intoQwenLM:mainfrom
drewd789:fix/pdf-session-corruption

Conversation

@drewd789
Copy link

@drewd789 drewd789 commented Feb 28, 2026

TLDR

Fixes PDF session corruption by rejecting PDF files at the source instead of embedding them as base64 inlineData (which caused API 400 errors). Users should extract PDF text externally before providing it to the tool.

Dive Deeper

PDF files were causing API 400 errors ("Invalid value: file. Supported values: 'text','image_url','video_url' and 'video'") and corrupting session files with base64 data, requiring manual cleanup with a Python script.

This fix rejects PDFs in processSingleFileContent() with a clear error message directing users to extract text externally.

⚠️ Breaking Change: This modifies the behavior of the read_file tool. Workflows that previously attempted to read PDF files directly will now receive an error instead of embedded base64 data.

Before vs After

Scenario Before After
Read PDF via read_file Embeds base64 → API 400 error → session corruption Returns clear error message, no corruption
Read image via read_file Works correctly Works correctly (unchanged)
Read text file via read_file Works correctly Works correctly (unchanged)

Migration Guide

Users who previously relied on PDF embedding should:

  1. Extract text from PDFs using external tools (pdftotext, pdfplumber, online converters)
  2. Provide the extracted text directly to the conversation
  3. Or use a dedicated PDF processing tool if available

Reviewer Test Plan

Primary Test (PDF Rejection)

  1. Run: /read_file path/to/test.pdf (or use the read_file tool via chat)
  2. Verify error message contains:
    • Clear statement that PDFs are not supported
    • Guidance to extract text externally
  3. Check session file - verify no base64 data was written
  4. Verify the conversation continues normally without corruption

Regression Test (Images Still Work)

  1. Run: /read_file path/to/test.png
  2. Verify image is processed correctly

Edge Cases

  1. Test PDF with uppercase extension (.PDF)
  2. Test other binary formats (.docx, .xlsx) - should also be rejected
  3. Test text files (.txt, .md) - should still work normally

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Files Changed

  • packages/core/src/utils/fileUtils.ts - Main fix in processSingleFileContent()
  • packages/core/src/utils/fileUtils.test.ts - Test updates
  • docs/developers/tools/file-system.md - Documentation updates

Linked issues / bugs

Fixes #2020

@tanzhenxin
Copy link
Collaborator

@drewd789 Thank you for this well-thought-out contribution! The detailed test plan and clear documentation really helped us understand the issue.

After comparing with #1982, we're going with that approach because it solves the same problem more comprehensively: instead of blocking PDFs entirely, it assigns appropriate modalities per model. This means we can support it once a model can read PDFs natively.

Thanks again for taking the time to dig into this and write such a thorough PR!

@tanzhenxin tanzhenxin self-assigned this Mar 1, 2026
@drewd789
Copy link
Author

drewd789 commented Mar 1, 2026

@tanzhenxin Thanks for the review!

One clarification: PR #1982 prevents API 400 errors, but doesn't prevent session corruption. The base64 inlineData still embeds in the session file, flooding context and requiring manual cleanup.

Our fix prevents the corruption at the source. Could PR #1982 be extended to also block session embedding for models without PDF support? That would give both benefits.

Happy to discuss!

@tanzhenxin
Copy link
Collaborator

@drewd789 Well, I think you made a valid point. We can optimize it at where the reading actually happens. But I think we should be consistent with all modalities, check and reject the binary with fallback text content if the model cannot support it.
Would you like to continue this work on top of the modality support of #1982 ?

@drewd789
Copy link
Author

drewd789 commented Mar 4, 2026

@tanzhenxin Yes, updated PR to integrate with the modalityDefaults system from #1982.

Changes:

  • File type checking now happens at the source in processSingleFileContent() using defaultModalities()
  • All media types (image, pdf, audio, video) are handled consistently through the same modality check
  • Files are rejected before base64 is created, preventing session file bloat
  • Clear error messages direct users to use a model that supports the file type or convert externally

Test coverage:

  • Rejects images for text-only models (e.g., deepseek)
  • Rejects PDFs for text-only models (e.g., qwen3-coder-plus)
  • Accepts PDFs for models that support them (e.g., claude-3-sonnet)

Does this look right to you, or would you like me to adjust anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Failed to read PDF file and then became unavailable w/ red error message "API Error: 400"

2 participants