fix: prevent None file_path from propagating as unknown_source #2796

Merged
danielaskdd merged 1 commit into HKUDS:main from he-yufeng:fix/null-file-path-propagation
Mar 19, 2026
@he-yufeng
Contributor

Summary

Fixes the root cause of #2764 where some documents show "unknown_source" instead of their actual filenames.

Root Cause

getattr(status_doc, "file_path", "unknown_source") only returns the fallback when the attribute doesn't exist. When file_path is present but None, it returns None — which then permanently overwrites valid filenames during reprocessing or consistency validation.
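
A minimal sketch of the behaviour described above, for illustration only. `StatusDoc`, `NoPathDoc`, `path_before`, and `path_after` are hypothetical stand-ins, not names from the LightRAG codebase; only the two `getattr` expressions mirror the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusDoc:
    """Hypothetical stand-in for a status object whose file_path may be None."""
    file_path: Optional[str] = None


class NoPathDoc:
    """Hypothetical object with no file_path attribute at all."""


def path_before(doc) -> Optional[str]:
    # The default is used only when the attribute is missing entirely.
    return getattr(doc, "file_path", "unknown_source")


def path_after(doc) -> str:
    # `or` also replaces falsy values (None, empty string) with the fallback.
    return getattr(doc, "file_path", None) or "unknown_source"


print(path_before(NoPathDoc()))        # attribute missing: fallback applies
print(path_before(StatusDoc()))        # attribute present but None: None leaks through
print(path_after(StatusDoc()))         # fallback applies
print(path_after(StatusDoc("a.pdf")))  # real filename is preserved
```

Note that the `or` form also treats an empty string as missing, which appears to be the intent here but is a slightly wider condition than "is None".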

Same pattern in JsonDocStatusStorage: the guard "file_path" not in data is False when the key exists with a null value, so the default is never applied and DocProcessingStatus(file_path=None) is created.

Changes

lightrag/lightrag.py (4 sites):

# Before — returns None when file_path attribute is None
file_path = getattr(status_doc, "file_path", "unknown_source")

# After — catches both missing and None
file_path = getattr(status_doc, "file_path", None) or "unknown_source"

lightrag/kg/json_doc_status_impl.py (3 sites):

# Before — only catches missing key, not null value
if "file_path" not in data:
    data["file_path"] = "no-file-path"

# After — catches missing, None, and empty string
if not data.get("file_path"):
    data["file_path"] = "no-file-path"
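
The difference between the two guards can be checked on plain dictionaries. This is a sketch under assumptions: `normalize_before` and `normalize_after` are hypothetical helper names, and the real storage code applies the check inline rather than through a function.

```python
def normalize_before(data: dict) -> dict:
    """Old guard: fills the default only when the key is absent."""
    data = dict(data)  # copy so the caller's dict is not mutated
    if "file_path" not in data:
        data["file_path"] = "no-file-path"
    return data


def normalize_after(data: dict) -> dict:
    """New guard: fills the default when the key is absent, None, or empty."""
    data = dict(data)
    if not data.get("file_path"):
        data["file_path"] = "no-file-path"
    return data


record = {"status": "processed", "file_path": None}  # key exists, value is null
print(normalize_before(record)["file_path"])  # None survives the old guard
print(normalize_after(record)["file_path"])   # replaced by "no-file-path"
```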

Complements #2793 which fixed the API input layer — this PR fixes the processing pipeline and storage layer.

Test plan

  • 275 tests pass, 33 skipped (pre-existing skip/xfail)
  • 1 pre-existing failure in test_interactive_setup_outputs.py unrelated to this change

`getattr(status_doc, "file_path", "unknown_source")` returns
the fallback only when the attribute doesn't exist. When
`file_path` is present but `None`, it returns `None` — which
then overwrites valid filenames during reprocessing or
consistency validation.

Same issue in JsonDocStatusStorage: `"file_path" not in data`
is False when the key exists with a null value, so the default
is skipped and `DocProcessingStatus(file_path=None)` is created.

Fix both patterns:
- `getattr(..., None) or "unknown_source"` in lightrag.py (4 sites)
- `not data.get("file_path")` in json_doc_status_impl.py (3 sites)

Complements HKUDS#2793 which fixed the API input layer.
Fixes HKUDS#2764.
@danielaskdd
Collaborator

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

@danielaskdd danielaskdd merged commit 7baf562 into HKUDS:main Mar 19, 2026
2 of 3 checks passed