Fix SummarizeData so that downstream .materialize operations will work. #1030
Merged
eric-anderson
approved these changes
Nov 20, 2024
    # so that the materialized data is complete, even if they are not all included
    # in the input prompt to the LLM.
    for di, doc in enumerate(result.take_all()):
        if isinstance(doc, MetadataDocument):
Collaborator
You shouldn't need the MetadataDocument check; take_all removes those.
    # For query result caching in the executor, we need to consume the documents
    # so that the materialized data is complete, even if they are not all included
    # in the input prompt to the LLM.
    for di, doc in enumerate(result.take_all()):
Collaborator
This works at small scale, but it will blow up memory at large scale, since take_all() collects every document into a list.
Approving since this is NTSB-only, but I suggest adding a TODO or something similar.
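One way to address the memory concern, sketched in plain Python: pull documents from an iterator one at a time, keep only those that fit in the prompt's token budget, and drain the rest without retaining them. Everything here is hypothetical: `Doc`, `count_tokens`, `build_prompt`, and `max_tokens` are illustrative stand-ins rather than Sycamore APIs, the whitespace word count is a crude proxy for a real tokenizer, and whether a DocSet exposes a suitable streaming iterator is an assumption to verify.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Doc:
    # Hypothetical stand-in for a Sycamore Document.
    text: str


def count_tokens(text: str) -> int:
    # Crude approximation: whitespace-separated words. A real implementation
    # would use the LLM's tokenizer.
    return len(text.split())


def build_prompt(docs: Iterable[Doc], max_tokens: int) -> tuple[str, int]:
    """Return (prompt, docs_consumed).

    Every document is pulled from `docs`, so any upstream side effects (such as
    writing materialized output) run for all of them, but only the documents
    that fit within `max_tokens` are kept in memory for the prompt.
    """
    parts: list[str] = []
    used = 0
    consumed = 0
    for doc in docs:
        consumed += 1
        cost = count_tokens(doc.text)
        if used + cost <= max_tokens:
            parts.append(doc.text)
            used += cost
        # Documents past the budget are consumed but not retained.
    return "\n".join(parts), consumed
```

Passing a generator such as `(Doc("word word") for _ in range(100_000))` with `max_tokens=10` consumes all 100,000 documents while holding only five of them, so peak memory depends on the budget rather than on the size of the input.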
    )

    # First run should populate cache.
    executor = SycamoreExecutor(context, cache_dir=temp_dir)
Collaborator
Is this test fast? If so, it's fine to leave it in the unit tests, but if it takes more than 5-10 seconds, I'd like to move it into the integration tests. I initially thought this would use Ray (which basically guarantees it's slow), but I'm no longer sure.
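If Ray startup is what makes the test slow, one option is to exercise the populate-then-reuse caching pattern against a lightweight stand-in that runs in milliseconds. The sketch below is hypothetical: `CachingExecutor`, its `run` method, and the JSON-file cache layout are illustrative assumptions, not how SycamoreExecutor is implemented.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class CachingExecutor:
    """Hypothetical toy executor that caches query results as JSON files.

    It only illustrates the populate-then-reuse pattern; it is not
    SycamoreExecutor's implementation.
    """

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = Path(cache_dir)
        self.compute_calls = 0

    def run(self, key: str, compute: Callable[[], list]) -> list:
        path = self.cache_dir / f"{key}.json"
        if path.exists():
            # Cache hit: reuse the stored result without recomputing.
            return json.loads(path.read_text())
        self.compute_calls += 1
        result = compute()
        path.write_text(json.dumps(result))
        return result


def test_cache_roundtrip() -> None:
    with tempfile.TemporaryDirectory() as temp_dir:
        # First run should populate cache.
        first = CachingExecutor(temp_dir)
        assert first.run("q1", lambda: [1, 2, 3]) == [1, 2, 3]
        assert first.compute_calls == 1

        # A fresh executor on the same directory should read from the cache,
        # ignoring the (different) compute function entirely.
        second = CachingExecutor(temp_dir)
        assert second.run("q1", lambda: [9, 9, 9]) == [1, 2, 3]
        assert second.compute_calls == 0
```

A test like this needs no cluster, so it fits comfortably in the unit-test suite; anything that genuinely requires Ray would still belong in the integration tests.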
This should fix the issue where SummarizeData leaves an incomplete materialize directory when the total document size exceeds the LLM's token limit: documents that did not fit in the prompt were never consumed, so they were never written out.
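The failure mode can be modeled with plain Python generators: if a lazy pipeline records each document as it is pulled, a consumer that stops once its prompt is full never pulls the remainder, so the recorded output is incomplete. This is a simplified sketch assuming materialization happens as documents stream through; `materialize`, `summarize_stop_early`, and `summarize_drain` are hypothetical names, and a document count stands in for the token limit.

```python
from typing import Iterable, Iterator


def materialize(docs: Iterable[str], sink: list[str]) -> Iterator[str]:
    """Hypothetical lazy materialize stage: records each document in `sink`
    as it flows through, then passes it downstream."""
    for doc in docs:
        sink.append(doc)
        yield doc


def summarize_stop_early(docs: Iterable[str], limit: int) -> list[str]:
    # Before the fix: stop pulling documents once the prompt is full.
    prompt: list[str] = []
    for doc in docs:
        if len(prompt) >= limit:
            break
        prompt.append(doc)
    return prompt


def summarize_drain(docs: Iterable[str], limit: int) -> list[str]:
    # After the fix: pull every document, but keep only `limit` of them.
    prompt: list[str] = []
    for doc in docs:
        if len(prompt) < limit:
            prompt.append(doc)
    return prompt
```

With ten documents and a limit of three, the early-stopping consumer leaves only four documents in the sink (the three in the prompt plus the one that triggered the `break`), while the draining consumer records all ten and still returns the same three-document prompt.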