Do you need to file an issue?
Describe the bug
When using MinerU's hybrid-auto-engine backend, RAGAnything fails to read the parsed output files because the directory lookup logic doesn't account for hybrid backend naming conventions.
Steps to reproduce
```python
import asyncio

from raganything import RAGAnything, RAGAnythingConfig


async def main():
    rag = RAGAnything(
        config=RAGAnythingConfig(parser="mineru", parse_method="auto"),
        # ... other params
    )

    # This fails to read the output files
    content_list, doc_id = await rag.parse_document(
        file_path="document.pdf",
        output_dir="./output",
        parse_method="auto",
        backend="hybrid-auto-engine",  # MinerU 2.7+ default
    )


asyncio.run(main())
```
Expected Behavior
RAGAnything should correctly locate and read content_list.json from the hybrid_auto/ directory.
Actual Behavior
RAGAnything looks for the files in the auto/ directory, but MinerU writes them to hybrid_auto/ when the hybrid-auto-engine backend is used.
Root Cause
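In parser.py, the parse_pdf() method only maps the VLM backend to its own output directory (vlm/). There is no mapping for hybrid-auto-engine, so its output is looked up in auto/ instead of hybrid_auto/.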
LightRAG Config Used
Logs and screenshots
MinerU directory naming convention:
Backend | CLI params | Output directory
-- | -- | --
pipeline | -m auto -b pipeline | {file_stem}/auto/
hybrid-auto-engine | -m auto -b hybrid-auto-engine | {file_stem}/hybrid_auto/
vlm-auto-engine | -m auto -b vlm-auto-engine | {file_stem}/vlm/
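The table above can be expressed as a simple lookup from backend name to output subdirectory. This is a minimal sketch, not RAGAnything's actual parser.py code: BACKEND_SUBDIR and expected_output_dir are hypothetical names, and the sketch only covers the -m auto case listed in the table.

```python
from pathlib import Path

# Backend -> output subdirectory, transcribed from the MinerU naming table.
# Assumption: only the "-m auto" parse method is covered here.
BACKEND_SUBDIR = {
    "pipeline": "auto",
    "hybrid-auto-engine": "hybrid_auto",
    "vlm-auto-engine": "vlm",
}


def expected_output_dir(output_dir: str, file_stem: str, backend: str) -> Path:
    """Hypothetical helper: return the directory where MinerU is expected
    to write its output for `backend` when run with `-m auto`."""
    try:
        subdir = BACKEND_SUBDIR[backend]
    except KeyError:
        # Fail loudly instead of silently defaulting to auto/, which is
        # exactly the behavior that hides the hybrid_auto/ directory.
        raise ValueError(f"unknown MinerU backend: {backend!r}") from None
    return Path(output_dir) / file_stem / subdir
```

For example, expected_output_dir("./output", "document", "hybrid-auto-engine") points at output/document/hybrid_auto.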
Suggested Fix
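Add hybrid backend handling to parse_pdf() in parser.py so that output from the hybrid-auto-engine backend is read from {file_stem}/hybrid_auto/.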
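One possible approach, sketched here as an assumption rather than RAGAnything's actual code, is to probe each known backend subdirectory instead of deriving a single path. find_content_list and CANDIDATE_SUBDIRS are hypothetical names, and the *content_list.json glob is an assumption intended to match either content_list.json or a file-stem-prefixed variant.

```python
from pathlib import Path
from typing import Optional

# Subdirectory names MinerU uses with "-m auto", per the naming table above.
# Assumption: probed in this order, and the first match wins.
CANDIDATE_SUBDIRS = ("hybrid_auto", "auto", "vlm")


def find_content_list(output_dir: str, file_stem: str) -> Optional[Path]:
    """Hypothetical fallback: return the first *content_list.json found in a
    known backend subdirectory of output_dir/file_stem, or None."""
    base = Path(output_dir) / file_stem
    for subdir in CANDIDATE_SUBDIRS:
        candidate_dir = base / subdir
        if not candidate_dir.is_dir():
            continue
        # Glob on the suffix so both "content_list.json" and
        # "<stem>_content_list.json" naming schemes would match.
        matches = sorted(candidate_dir.glob("*content_list.json"))
        if matches:
            return matches[0]
    return None
```

Probing in a fixed order is convenient but can pick up stale output: if an earlier pipeline run left an auto/ directory next to a newer hybrid_auto/ one, the order of CANDIDATE_SUBDIRS decides which file wins. Mapping the requested backend to its directory explicitly is the safer primary fix, with probing as a fallback at most.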
Additional Information
- LightRAG Version: 1.4.9.8
- Operating System: macOS 26.2
- Python Version: 3.12
- Related Issues: