Do you need to file an issue?
Describe the bug
During multimodal content ingestion, several `merge_nodes_and_edges` call sites in `processor.py` and `modalprocessors.py` do not pass `entity_chunks_storage`, `relation_chunks_storage` (and in one case, `full_entities_storage` / `full_relations_storage`) to `lightrag.operate.merge_nodes_and_edges`.
Since these parameters default to None, the calls succeed silently — but entity-to-chunk and relation-to-chunk mappings are never persisted. This means multimodal entities are created in the knowledge graph and vector DB, but their chunk association metadata is lost, degrading retrieval quality for multimodal content.
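A minimal, hypothetical simplification of the merge function (not the real `lightrag.operate.merge_nodes_and_edges` implementation) illustrating why the omission is silent: the storage parameters default to `None`, and the persistence step is simply skipped rather than raising an error.

```python
def merge_nodes_and_edges_sketch(chunk_results,
                                 entity_chunks_storage=None,
                                 relation_chunks_storage=None):
    """Hypothetical stand-in for merge_nodes_and_edges (names illustrative)."""
    persisted = 0
    for entity_name, chunk_id in chunk_results:
        # Graph and vector-DB writes would happen here unconditionally...
        if entity_chunks_storage is not None:
            # ...but the entity-to-chunk mapping is only persisted when
            # the storage instance was actually passed in.
            entity_chunks_storage.setdefault(entity_name, []).append(chunk_id)
            persisted += 1
    return persisted

results = [("image_entity", "chunk-0001")]

# Multimodal call site (storage omitted): succeeds, persists no mappings.
assert merge_nodes_and_edges_sketch(results) == 0

# Text-ingestion call site (storage passed): mapping is persisted.
store = {}
assert merge_nodes_and_edges_sketch(results, entity_chunks_storage=store) == 1
assert store == {"image_entity": ["chunk-0001"]}
```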
Affected Files (v1.2.10)

| File | Method | Missing Parameters |
| --- | --- | --- |
| processor.py | `_process_multimodal_content_individual` (line ~756) | `entity_chunks_storage`, `relation_chunks_storage` |
| processor.py | `_batch_merge_lightrag_style_type_aware` (line ~1360) | `entity_chunks_storage`, `relation_chunks_storage` |
| modalprocessors.py | `BaseModalProcessor._process_chunk_for_extraction` (line ~780) | `full_entities_storage`, `full_relations_storage`, `entity_chunks_storage`, `relation_chunks_storage` |
Steps to reproduce
- Ingest a PDF containing images/tables via `process_document_complete()`
- Inspect `kv_store_entity_chunks.json` and `kv_store_relation_chunks.json` — multimodal entity/relation-to-chunk mappings are absent
- Compare with text-only ingestion, where the mappings are correctly populated by LightRAG's internal pipeline
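A small inspection helper for step 2. The file names come from the report above; the top-level JSON layout (a dict keyed by entity/relation name) is an assumption for illustration, so treat this as a sketch rather than a guaranteed check.

```python
import json
import os
import tempfile

def count_chunk_mappings(path):
    """Count persisted mappings in a LightRAG KV-store JSON file.

    Assumes the file is a JSON object keyed by entity/relation name
    (illustrative; verify against your actual store layout).
    """
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return len(json.load(f))

# Demo with a throwaway directory standing in for the working dir.
with tempfile.TemporaryDirectory() as workdir:
    entity_chunks = os.path.join(workdir, "kv_store_entity_chunks.json")

    # After a multimodal-only ingestion the mappings are absent:
    assert count_chunk_mappings(entity_chunks) == 0

    # After text ingestion, LightRAG's pipeline populates them:
    with open(entity_chunks, "w") as f:
        json.dump({"SOME ENTITY": ["chunk-abc123"]}, f)
    assert count_chunk_mappings(entity_chunks) == 1
```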
Expected Behavior
All `merge_nodes_and_edges` calls should pass the full set of storage instances, consistent with how lightrag itself calls the function internally during text ingestion:

```python
await merge_nodes_and_edges(
    chunk_results=...,
    knowledge_graph_inst=self.lightrag.chunk_entity_relation_graph,
    entity_vdb=self.lightrag.entities_vdb,
    relationships_vdb=self.lightrag.relationships_vdb,
    global_config=self.lightrag.__dict__,
    full_entities_storage=self.lightrag.full_entities,      # missing in modalprocessors.py
    full_relations_storage=self.lightrag.full_relations,    # missing in modalprocessors.py
    pipeline_status=pipeline_status,
    pipeline_status_lock=pipeline_status_lock,
    llm_response_cache=self.lightrag.llm_response_cache,
    entity_chunks_storage=self.lightrag.entity_chunks,      # missing in all 3 call sites
    relation_chunks_storage=self.lightrag.relation_chunks,  # missing in all 3 call sites
    current_file_number=1,
    total_files=1,
    file_path=file_path,
)
```
Suggested Fix
processor.py — add two lines to both `_process_multimodal_content_individual` and `_batch_merge_lightrag_style_type_aware`:

```diff
  llm_response_cache=self.lightrag.llm_response_cache,
+ entity_chunks_storage=self.lightrag.entity_chunks,
+ relation_chunks_storage=self.lightrag.relation_chunks,
  current_file_number=1,
```
modalprocessors.py — add four lines to `BaseModalProcessor._process_chunk_for_extraction`:

```diff
  global_config=self.global_config,
+ full_entities_storage=self.lightrag.full_entities,
+ full_relations_storage=self.lightrag.full_relations,
  pipeline_status=pipeline_status,
  pipeline_status_lock=pipeline_status_lock,
  llm_response_cache=self.hashing_kv,
+ entity_chunks_storage=self.lightrag.entity_chunks,
+ relation_chunks_storage=self.lightrag.relation_chunks,
  current_file_number=1,
```
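A quick regression check that the storage kwargs are actually forwarded once the fix is applied. The wrapper below is purely illustrative (it stands in for the patched call sites; the `entity_chunks` / `relation_chunks` attribute names follow the snippets above), and the merge function is replaced by an `AsyncMock` so no real pipeline is needed.

```python
import asyncio
from unittest.mock import AsyncMock

async def merge_with_fix(merge_fn, lightrag):
    # Hypothetical patched call site: forwards the storage instances
    # instead of letting them default to None.
    await merge_fn(
        chunk_results=[],
        entity_chunks_storage=lightrag["entity_chunks"],
        relation_chunks_storage=lightrag["relation_chunks"],
    )

merge_mock = AsyncMock()
fake_lightrag = {"entity_chunks": "EC_STORE", "relation_chunks": "RC_STORE"}
asyncio.run(merge_with_fix(merge_mock, fake_lightrag))

# Verify the kwargs reached the merge function.
kwargs = merge_mock.call_args.kwargs
assert kwargs["entity_chunks_storage"] == "EC_STORE"
assert kwargs["relation_chunks_storage"] == "RC_STORE"
```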
LightRAG Config Used
Paste your config here
Logs and screenshots
No response
Additional Information
- LightRAG Version: 1.4.13
- Operating System: 26.3.1
- Python Version: 3.14
- Related Issues: