[Bug]: MinerU 2.7.0+: parse_document() fails to locate output files when using hybrid-auto-engine backend #186

@hanlianlu

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

When using MinerU's hybrid-auto-engine backend, RAGAnything fails to read the parsed output files because the directory lookup logic doesn't account for hybrid backend naming conventions.

Steps to reproduce

```python
import asyncio

from raganything import RAGAnything, RAGAnythingConfig


async def main():
    rag = RAGAnything(
        config=RAGAnythingConfig(parser="mineru", parse_method="auto"),
        # ... other params
    )

    # This fails to read the output files
    content_list, doc_id = await rag.parse_document(
        file_path="document.pdf",
        output_dir="./output",
        parse_method="auto",
        backend="hybrid-auto-engine",  # MinerU 2.7+ default
    )


asyncio.run(main())
```

Expected Behavior

RAGAnything should correctly locate and read content_list.json from the hybrid_auto/ directory.

Actual Behavior
RAGAnything looks for files in the auto/ directory, but MinerU creates them in hybrid_auto/ when using the hybrid-auto-engine backend.
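
To confirm which directory MinerU actually wrote to, a quick check like the one below can help. This is only a diagnostic sketch: `find_content_lists` is a hypothetical helper (not part of RAGAnything or MinerU), and it assumes the content list file name ends with `content_list.json`.

```python
from pathlib import Path


def find_content_lists(output_dir: str) -> list[Path]:
    """Return every file ending in content_list.json under output_dir.

    Hypothetical diagnostic helper; the file-name suffix is an assumption.
    Results are sorted so repeated runs print in the same order.
    """
    return sorted(Path(output_dir).rglob("*content_list.json"))


def report(output_dir: str) -> None:
    """Print the parent directory name (e.g. auto, hybrid_auto) for each match."""
    for path in find_content_lists(output_dir):
        print(path.parent.name, path)
```

Calling `report("./output")` after a failed `parse_document()` run should show whether the file landed under `hybrid_auto/` rather than `auto/`.
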

Root Cause
In parser.py, the parse_pdf() method only handles the VLM backend directory mapping, so output from the hybrid backend is looked up under the wrong subdirectory.

Logs and screenshots

MinerU directory naming convention:

| Backend | CLI params | Output directory |
| -- | -- | -- |
| pipeline | -m auto -b pipeline | {file_stem}/auto/ |
| hybrid-auto-engine | -m auto -b hybrid-auto-engine | {file_stem}/hybrid_auto/ |
| vlm-auto-engine | -m auto -b vlm-auto-engine | {file_stem}/vlm/ |

Suggested Fix

Add hybrid backend handling in parser.py
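
One possible shape for that handling, as a rough sketch rather than the actual parser.py code: `mineru_output_subdir` and `expected_output_dir` are hypothetical names, and the prefix matching plus the `hybrid_{method}` pattern are assumptions generalized from the three `-m auto` cases in the naming convention above.

```python
from pathlib import Path


def mineru_output_subdir(backend: str, method: str = "auto") -> str:
    """Map a MinerU backend name to the subdirectory it writes under {file_stem}/.

    Hypothetical helper. Only the three "-m auto" cases come from the issue;
    matching on the "hybrid-" / "vlm-" prefixes is an assumption.
    """
    if backend.startswith("hybrid-"):
        return f"hybrid_{method}"  # hybrid-auto-engine with -m auto -> hybrid_auto
    if backend.startswith("vlm-"):
        return "vlm"               # vlm-auto-engine -> vlm
    return method                  # pipeline with -m auto -> auto


def expected_output_dir(
    output_dir: str, file_stem: str, backend: str, method: str = "auto"
) -> Path:
    """Build the directory where the parsed output files are expected."""
    return Path(output_dir) / file_stem / mineru_output_subdir(backend, method)
```

With this, `expected_output_dir("./output", "document", "hybrid-auto-engine")` resolves to `output/document/hybrid_auto`, which is where the lookup would need to read from.
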

Additional Information

  • LightRAG Version: 1.4.9.8
  • Operating System: macOS 26.2
  • Python Version: 3.12
  • Related Issues:
