
Tolerate bad entity extraction. #1078

Merged: eric-anderson merged 2 commits into main from eric-upstream-docset on Dec 17, 2024

Conversation

@eric-anderson
Collaborator

  • Add tests for tolerating bad extraction.
  • Substantially refactor `llm_filter` so that `docset.py` no longer contains one giant function.
  • Also fix a minor bug where the tokenized filter re-looked up the text in the element
    instead of using the `txt` variable.
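The tolerance and `txt` fix described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code from this PR: `parse_score`, `element_passes`, `ask_llm`, `threshold`, and `keep_none` are assumed names, and the LLM is stood in for by a plain callable that returns the model's raw text (or `None`). The idea is that an unparseable rating is treated as "no score" instead of raising, and the filter scores the `txt` it already has rather than reading the element's text a second time.

```python
from typing import Callable, Optional

# Hypothetical sketch; not the implementation from this PR.


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Parse an LLM's integer rating, returning None on bad output instead of raising."""
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        # Bad extraction (e.g. "not sure" or ""): tolerate it rather than crash.
        return None


def element_passes(
    txt: str,
    ask_llm: Callable[[str], Optional[str]],
    threshold: int,
    keep_none: bool = False,
) -> bool:
    # Score the `txt` already in hand rather than re-reading the element's text.
    score = parse_score(ask_llm(txt))
    if score is None:
        # Unparseable or missing rating: keep or drop based on the flag.
        return keep_none
    return score >= threshold


if __name__ == "__main__":
    # A dict lookup stands in for the LLM call.
    answers = {"relevant text": "5", "other text": "not sure"}
    print(element_passes("relevant text", answers.get, threshold=3))
    print(element_passes("other text", answers.get, threshold=3))
```

In this sketch, passing `keep_none=True` keeps elements whose rating could not be parsed, while the default drops them; whether the real `llm_filter` exposes the same choice is an assumption.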

@eric-anderson eric-anderson merged commit e4c213e into main Dec 17, 2024
@eric-anderson eric-anderson deleted the eric-upstream-docset branch December 17, 2024 21:28



2 participants