Do you need to file a feature request?
Feature Request Description
Summary
Replace the current DoclingParser implementation, which shells out to the docling CLI via subprocess.run, with a direct integration through the Docling Python API (docling.document_converter.DocumentConverter).
This eliminates process-spawning overhead, avoids disk I/O round-trips for intermediate JSON/Markdown files, and enables in-memory model reuse across consecutive parse calls. Multi-document workloads should see significant performance gains, and the public parse_* signatures stay unchanged.
Motivation / Problem
The current DoclingParser in HKUDS/RAG-Anything invokes Docling through
its command-line interface:

    # Current approach (raganything/parser.py – _run_docling_command)
    cmd = [
        "docling",
        "--output", str(file_output_dir),
        "--to", "json",
        "--to", "md",
        str(input_path),
    ]
    result = subprocess.run(cmd, **docling_subprocess_kwargs)

After the subprocess completes, output files are read back from disk (_read_output_files). This pattern has several drawbacks:

| Issue | Impact |
|-------|--------|
| Process-spawn overhead | Each parse_* call forks a new process, loads the Python interpreter, and re-initializes all Docling models from scratch. |
| Disk I/O round-trip | Docling writes JSON + Markdown to disk; the parser then reads them back. This is unnecessary when the data is immediately consumed in-memory. |
| No model reuse | Docling's deep-learning models (table structure, OCR, layout) are loaded fresh on every invocation — the most expensive part of the pipeline. |
| Fragile error handling | Errors surface as subprocess.CalledProcessError with stderr strings rather than typed Python exceptions with full stack traces. |
| Platform quirks | Windows requires CREATE_NO_WINDOW flags; PATH must include the docling entry-point. These are unnecessary when calling Python directly. |
| Limited pipeline control | CLI flags expose only a subset of Docling's configuration surface. The Python API offers fine-grained control over pipeline options, format options, and OCR settings. |
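
To make the "Fragile error handling" row concrete, here is a minimal sketch of what a parent process sees when a child process fails. The child is a throwaway Python one-liner standing in for the docling CLI (an assumption for illustration; no Docling code is involved), and run_failing_child is a hypothetical helper name.

```python
import subprocess
import sys


def run_failing_child() -> subprocess.CalledProcessError:
    """Run a child process that fails and return the exception the parent sees.

    The child is a stand-in for the docling CLI, not the real tool.
    """
    child_code = "import sys; sys.stderr.write('model load failed'); sys.exit(3)"
    try:
        subprocess.run(
            [sys.executable, "-c", child_code],
            check=True,           # raise CalledProcessError on non-zero exit
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        return exc
    raise AssertionError("child process was expected to fail")


err = run_failing_child()
# The parent only gets an exit code and a stderr string; the child's
# exception type and stack frames are not available as Python objects.
print(err.returncode, repr(err.stderr))
```

With an in-process call, the same failure would instead propagate as an ordinary Python exception that callers can catch by type.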
Proposed Solution
Core Idea
Use docling.document_converter.DocumentConverter directly in Python:

    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode

    # Build pipeline options from user kwargs
    pipeline_options = PdfPipelineOptions()
    pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    pipeline_options.do_ocr = True

    # Create converter (reused across calls)
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        }
    )

    # Parse document in-process: no subprocess, no intermediate files
    result = converter.convert(str(input_path))
    doc_dict = result.document.export_to_dict()

    # Convert to MinerU-compatible content-list format
    content_list = self.read_from_block_recursive(
        doc_dict["body"], "body", img_output_dir, 0, "0", doc_dict
    )

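
The "reused across calls" comment above implies some form of caching. One possible shape, sketched under assumptions: keep one converter per distinct option set and build it only on first use. ConverterCache and fake_converter_factory are hypothetical names; the factory is a stand-in that counts how often the expensive construction step would run, so the sketch executes without docling installed. It is not thread-safe as written.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ConverterCache:
    """Build one converter per distinct option set and reuse it afterwards."""

    def __init__(self, factory: Callable[..., Any]) -> None:
        self._factory = factory
        self._cache: Dict[Tuple[Tuple[str, Hashable], ...], Any] = {}

    def get(self, **options: Hashable) -> Any:
        # Sort by option name so keyword order does not affect the cache key.
        key = tuple(sorted(options.items()))
        if key not in self._cache:
            self._cache[key] = self._factory(**options)
        return self._cache[key]


build_count = 0


def fake_converter_factory(**options: Hashable) -> dict:
    """Stand-in for constructing a DocumentConverter (the expensive step)."""
    global build_count
    build_count += 1
    return {"options": options}


cache = ConverterCache(fake_converter_factory)
first = cache.get(table_mode="accurate", allow_ocr=True)
second = cache.get(allow_ocr=True, table_mode="accurate")  # same options, reused
third = cache.get(table_mode="fast", allow_ocr=True)       # new options, rebuilt
print(build_count)
```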
---
Backward Compatibility
- No breaking changes to the public parse_pdf(), parse_document(), parse_office_doc(), parse_html() signatures.
- Callers passing env={"KEY": "VAL"} (previously used for the subprocess environment) will have the type validated but the value silently ignored.
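
As a rough illustration of "type validated but value ignored", the handling of env might look like the sketch below. validate_ignored_env is a hypothetical helper, not code from the project, and raising TypeError on a bad type is an assumption; the proposal does not say how invalid input is reported.

```python
from collections.abc import Mapping
from typing import Optional


def validate_ignored_env(env: Optional[Mapping]) -> None:
    """Check the type of a legacy ``env`` argument, then discard it.

    Hypothetical helper that only mirrors the behaviour described above.
    """
    if env is None:
        return
    if not isinstance(env, Mapping):
        raise TypeError("env must be a mapping of str to str")
    for key, value in env.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("env keys and values must be strings")
    # Nothing else happens: no subprocess exists to receive the variables.


validate_ignored_env({"DOCLING_CACHE": "/tmp"})  # accepted, no effect

try:
    validate_ignored_env(["DOCLING_CACHE=/tmp"])  # wrong type
except TypeError as exc:
    rejected = str(exc)
```

Because the value is dropped silently, callers who relied on env to configure Docling would need to set those variables in the parent process instead.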
Output Format
- The content-list output format is identical to the current implementation. The same read_from_block_recursive() / read_from_block() methods are used to transform the Docling document dict into the MinerU-compatible structure.
Dependencies
- No new required dependencies. docling remains an optional package.
- The check_installation() method gracefully reports whether the package is available.
- pip install docling is the only setup step for users who want to use the Docling parser backend.
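
One way an availability check for an optional package can work without importing it (and so without loading any models) is importlib.util.find_spec. The sketch below is an assumption about how such a check could be written, not the project's check_installation() implementation; is_package_available is a hypothetical name.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


print("json available:", is_package_available("json"))
print("docling available:", is_package_available("docling"))
```

The second line prints True or False depending on whether docling is installed in the current environment.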
Migration Guide
For End Users
No action required. The public API is unchanged:

    from raganything.parser import DoclingParser

    parser = DoclingParser()
    content = parser.parse_document("report.pdf", output_dir="./output")

For Callers Passing env
The env kwarg is still accepted but no longer has any effect:

    # Before (CLI subprocess):
    parser.parse_pdf("doc.pdf", env={"DOCLING_CACHE": "/tmp"})

    # After (Python API): accepted without error, but env is ignored.

For Advanced Configuration
The Python API exposes more options than the CLI:

    parser.parse_pdf(
        "doc.pdf",
        table_mode="accurate",     # TableFormerMode.ACCURATE
        tables=True,               # do_table_structure = True
        allow_ocr=True,            # do_ocr = True
        artifacts_path="/models",  # custom model artifacts directory
    )

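
The comments in the call above describe a mapping from parser kwargs to pipeline option fields. A minimal sketch of that translation follows. To keep it runnable without docling, TableMode, TableOptions, and PipelineOptions are small stand-ins for the Docling classes (their default values are arbitrary, not Docling's), and build_pipeline_options is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class TableMode(Enum):
    """Stand-in for docling's TableFormerMode."""
    FAST = "fast"
    ACCURATE = "accurate"


@dataclass
class TableOptions:
    mode: TableMode = TableMode.FAST


@dataclass
class PipelineOptions:
    """Stand-in for docling's PdfPipelineOptions (only the fields used here)."""
    do_ocr: bool = True
    do_table_structure: bool = True
    artifacts_path: Optional[str] = None
    table_structure_options: TableOptions = field(default_factory=TableOptions)


def build_pipeline_options(**kwargs: Any) -> PipelineOptions:
    """Translate parser kwargs into pipeline options; unset kwargs keep defaults."""
    options = PipelineOptions()
    if "table_mode" in kwargs:
        # Raises ValueError for an unknown mode string.
        options.table_structure_options.mode = TableMode(kwargs["table_mode"])
    if "tables" in kwargs:
        options.do_table_structure = bool(kwargs["tables"])
    if "allow_ocr" in kwargs:
        options.do_ocr = bool(kwargs["allow_ocr"])
    if "artifacts_path" in kwargs:
        options.artifacts_path = kwargs["artifacts_path"]
    return options


opts = build_pipeline_options(
    table_mode="accurate", tables=True, allow_ocr=False, artifacts_path="/models"
)
print(opts)
```

Only kwargs that are explicitly passed override a field, so callers who pass nothing get whatever defaults the underlying options object defines.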
Checklist
References
- DocumentConverter usage: https://ds4sd.github.io/docling/
Additional Context
No response