[Bug]: Files with the same name are stored in the same output location #51

@jesse-merhi

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

If you process two files with the same name (for example, file.pdf), even when they are in different directories, RAG-Anything overwrites the output from the first file with the output from the second.

I believe this could be fixed if the output location were derived from the file's full path (or its hash), but neither appears to be the case.
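
A minimal sketch of that idea, assuming the output directory is currently derived from the file name alone (I have not confirmed this in the RAG-Anything source). The helper name `unique_output_dir` and its parameters are hypothetical, not part of any existing API. It appends a short SHA-256 digest of the resolved absolute path, so same-named files in different directories map to different locations.

```python
import hashlib
from pathlib import Path


def unique_output_dir(file_path: str, output_root: str, digest_len: int = 8) -> Path:
    """Hypothetical helper: build an output directory that is unique per source path."""
    source = Path(file_path).expanduser().resolve()
    # Hash the full absolute path, not the file contents, so identical copies
    # stored in different directories still get separate output locations.
    digest = hashlib.sha256(str(source).encode("utf-8")).hexdigest()[:digest_len]
    # Keep the stem in the name so the directory stays human-readable.
    return Path(output_root) / f"{source.stem}_{digest}"
```

Hashing the path keeps the mapping stable across runs for the same file. Hashing the contents instead would merge byte-identical files into one output, which may or may not be what is wanted here.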

Steps to reproduce

Create two files with the same name in different directories, then index the top-level directory that contains both.

❯ ls ~/files
subdir paper.pdf
❯ ls ~/files/subdir
paper.pdf
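
The layout above can be recreated with a short script. This sketch uses only the standard library and writes placeholder bytes rather than real PDFs, so it illustrates the name collision without calling RAG-Anything. The function names are hypothetical, and grouping by stem assumes that outputs are keyed on the file name without its directory.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def make_repro_tree(root: Path) -> None:
    """Create root/paper.pdf and root/subdir/paper.pdf with placeholder contents."""
    (root / "subdir").mkdir(parents=True, exist_ok=True)
    (root / "paper.pdf").write_bytes(b"placeholder 1")
    (root / "subdir" / "paper.pdf").write_bytes(b"placeholder 2")


def find_stem_collisions(root: Path) -> dict[str, list[Path]]:
    """Group files under root by stem and return the stems that occur more than once."""
    by_stem = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_stem[path.stem].append(path)
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "files"
        make_repro_tree(root)
        # Any stem listed here would share one output location if outputs
        # are named after the file stem only.
        for stem, paths in find_stem_collisions(root).items():
            print(stem, [str(p.relative_to(root)) for p in paths])
```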

Expected Behavior

Outputs and indexes for these files should be kept separate so that neither overwrites the other.

LightRAG Config Used

lr = LightRAG(
    working_dir=SHARED_WORKDIR,
    llm_model_func=_llm(_api_key),
    embedding_func=_embed(_api_key),
)
await lr.initialize_storages()
await initialize_pipeline_status()
logging.info(f"Creating new rag on {SHARED_WORKDIR}")
_global_rag = RAGAnything(
    lightrag=lr,
    llm_model_func=lr.llm_model_func,
    vision_model_func=_vision(_api_key),
    embedding_func=lr.embedding_func,
    config=RAGAnythingConfig(
        working_dir=SHARED_WORKDIR,
        mineru_parse_method="auto",
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    ),
)

Logs and screenshots

The output directory ends up in an inconsistent state: the images directory contains images from both PDFs, but the rest of the data comes from only one of them.
[Image: screenshot of the output directory]

Additional Information

  • LightRAG Version: latest
  • Operating System: Ubuntu 24.04.2 LTS
  • Python Version: >=3.12
  • Related Issues: None.
