support bfrange having incorrect length in a cmap#1089

Merged
BobLd merged 1 commit into master from fix-corpus-0001413-file on Jul 19, 2025

@EliotJones
Member

The corpus file 0001413.pdf has an off-by-one error in the declared count for its CMap bfrange entries. Here we exit the range loop early if an endbfrange operator is encountered before the declared number of ranges has been read. This matches the PDFBox behavior:

https://github.com/apache/pdfbox/blob/067d56e4dbf5817158d1e910b9602d031ab2153d/fontbox/src/main/java/org/apache/fontbox/cmap/CMapParser.java#L373
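
As a rough illustration of the early-exit idea (not the actual change in this PR, which is in C#), here is a minimal Python sketch. It assumes the tokens following `N beginbfrange` are already split into a flat list and that each range is a simple `start end destination` triple; the function name `parse_bfrange_block` and the token representation are hypothetical, and array-valued destinations are ignored for brevity.

```python
def parse_bfrange_block(declared_count, tokens):
    """Read up to declared_count (start, end, dest) triples from tokens.

    tokens is the flat list of tokens that follows 'N beginbfrange'.
    Hypothetical helper for illustration only.
    """
    ranges = []
    pos = 0
    for _ in range(declared_count):
        # Stop early if the block ends before the declared count is reached,
        # instead of treating 'endbfrange' as the start of another range.
        if pos >= len(tokens) or tokens[pos] == "endbfrange":
            break
        # Also stop if the remaining tokens cannot form a full triple.
        if pos + 3 > len(tokens):
            break
        start, end, dest = tokens[pos:pos + 3]
        ranges.append((start, end, dest))
        pos += 3
    return ranges


# The block declares 3 ranges but only contains 2 (off-by-one count).
tokens = ["<0001>", "<0002>", "<0041>",
          "<0003>", "<0004>", "<0043>",
          "endbfrange"]
result = parse_bfrange_block(3, tokens)
```

With the early exit, the two well-formed ranges are kept and the parser can continue with whatever follows `endbfrange`, rather than failing on the mismatched count.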

@EliotJones requested a review from BobLd on July 18, 2025 01:36
@EliotJones
Member Author

With this change we can parse all 1000 documents of corpus set 0001 except the following.

Corrupt:

  • 0001589.pdf
  • 0001957.pdf (can parse with SkipMissingFonts)

@BobLd merged commit bffd514 into master on Jul 19, 2025
2 checks passed
@BobLd deleted the fix-corpus-0001413-file branch on July 19, 2025 10:48