@harrykhh
Contributor

This PR integrates Surya as an optional layout analysis backend in Docling.

re: #2222

Key Changes

  • Added SuryaLayoutModel in docling_core/layout_model.py to wrap Surya's detection API.
  • Expanded tests and docs with Surya usage examples and benchmarks.

Non-breaking, modular enhancement for improved layout accuracy on complex PDFs.

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@github-actions
Contributor

github-actions bot commented Oct 27, 2025

DCO Check Passed

Thanks @harrykhh, all your commits are properly signed off. 🎉

@mergify

mergify bot commented Oct 27, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
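To check a title locally before pushing, the rule above can be approximated with Python's `re` module. This is only a sketch: it assumes the `~=` operator behaves like a regular-expression search, and the helper name `is_conventional` is made up for illustration. The pattern itself is copied verbatim from the rule.

```python
import re

# Pattern copied from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix.

    Hypothetical helper: assumes the rule is a plain regex search.
    """
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    # Titles taken from this pull request, plus one without a prefix.
    for title in (
        "feat: Added Suryaocr to ocr model list, added example code",
        "docs: Added documentation to use SuryaOCR via plugin docling-surya",
        "Added Suryaocr to ocr model list",
    ):
        print(f"{is_conventional(title)!s:5}  {title}")
```

The optional `(...)` group allows a scope such as `docs(ocr):`, and the optional `!` marks a breaking change, as described in the Conventional Commits specification.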

@harrykhh harrykhh marked this pull request as ready for review October 27, 2025 23:45
@dosubot

dosubot bot commented Oct 27, 2025

Related Documentation

Checked 3 published document(s) in 1 knowledge base(s). No updates required.


@harrykhh harrykhh force-pushed the integrate-with-surya branch from 3ed08c3 to eb4a6c0 Compare October 28, 2025 00:13
Contributor

@dolfim-ibm dolfim-ibm left a comment

Unfortunately, Surya is licensed under the GPL and cannot legally be included in this software package (without license changes).

The integration of the OCR engine could still be done as a stand-alone plugin. See https://docling-project.github.io/docling/concepts/plugins/.

@harrykhh if you are willing to create such a plugin package, we will be glad to list and promote it in the Docling docs.

@harrykhh harrykhh closed this Nov 2, 2025
@harrykhh harrykhh force-pushed the integrate-with-surya branch from 7410fca to 9a6fdf9 Compare November 2, 2025 19:46
@harrykhh harrykhh reopened this Nov 2, 2025
@harrykhh harrykhh force-pushed the integrate-with-surya branch from 171f893 to 23bc9c4 Compare November 2, 2025 20:15
@harrykhh harrykhh requested a review from dolfim-ibm November 2, 2025 20:16
@harrykhh
Contributor Author

harrykhh commented Nov 2, 2025

Thanks for the review, @dolfim-ibm. I reverted the commits and updated only the documentation to include the plugin: https://pypi.org/project/docling-surya/

Added a link to the PyPI page for docling-surya.

Signed-off-by: Harry Ho <[email protected]>
@harrykhh harrykhh changed the title from "feat: Added Suryaocr to ocr model list, added example code" to "docs: Added Suryaocr to ocr model list, added example code" Nov 2, 2025
@harrykhh harrykhh changed the title from "docs: Added Suryaocr to ocr model list, added example code" to "docs: Added documentation to use SuryaOCR via plugin docling-surya" Nov 2, 2025
@cau-git
Contributor

cau-git commented Nov 4, 2025

@harrykhh Great to see the docling-surya plugin. The example you provide looks fine. I would suggest noting somewhere in the comments that it carries a GPL license. Apart from that, it would be fine to merge.

@codecov

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Added important licensing note regarding SuryaOCR integration. 

Signed-off-by: Harry Ho <[email protected]>
@harrykhh
Contributor Author

harrykhh commented Nov 5, 2025

> I would suggest adding somewhere in the comments that it brings a GPL license. Apart from that it would be fine to merge.

Done. Thanks @cau-git

@harrykhh harrykhh force-pushed the integrate-with-surya branch from 0eb8e59 to 1c04b39 Compare November 17, 2025 11:07
@dolfim-ibm dolfim-ibm merged commit b216ad8 into docling-project:main Nov 19, 2025
46 of 47 checks passed
@harrykhh harrykhh deleted the integrate-with-surya branch November 20, 2025 01:11
3 participants