LLM Batch inference #1202
Merged
Conversation
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
karanataryn approved these changes on Feb 28, 2025
```diff
      return res
  elif llm_mode == LLMMode.BATCH:
-     raise NotImplementedError("Haven't done batch yet")
+     return llm.generate_batch(prompts=prompts)
```
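The dispatch in the diff can be sketched as follows. `LLMMode` and `generate_batch(prompts=...)` follow the names in the diff; the `StubLLM` class and the `generate_all` wrapper are hypothetical stand-ins, since the surrounding code is not shown in this PR view.

```python
from enum import Enum


class LLMMode(Enum):
    SYNC = "sync"
    BATCH = "batch"


class StubLLM:
    """Hypothetical stand-in for an LLM client exposing per-mode entry points."""

    def generate(self, *, prompt: str) -> str:
        # One request per prompt.
        return f"sync:{prompt}"

    def generate_batch(self, *, prompts: list[str]) -> list[str]:
        # A real client would submit a single batch job covering all prompts
        # and poll for its results; here we just echo.
        return [f"batch:{p}" for p in prompts]


def generate_all(llm, prompts: list[str], llm_mode: LLMMode) -> list[str]:
    """Dispatch a list of prompts to the client entry point for the given mode."""
    if llm_mode == LLMMode.SYNC:
        return [llm.generate(prompt=p) for p in prompts]
    elif llm_mode == LLMMode.BATCH:
        return llm.generate_batch(prompts=prompts)
    raise ValueError(f"unsupported mode: {llm_mode}")
```

With this shape, adding a new mode means one more branch in the dispatcher plus one entry point on the client, which is essentially what the diff does for `BATCH`.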
Adds batch inference modes for the OpenAI and Anthropic clients.
I didn't do Bedrock or Gemini because those involve dealing with S3 and GCS/BigQuery.
OpenAI batch is pretty slow; to be able to test it I ended up using GPT-3.5 Turbo, since it sees far less demand and batch requests are low priority (they only expire after 24 hours). Anthropic's Claude 3 Haiku was decently fast (competitive with async!), although that may be the same effect. I did not test with a more modern/powerful Claude.
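For context on the OpenAI side, the Batch API takes a JSONL file where each line wraps one chat-completion request in an envelope with a `custom_id` (results come back unordered, so this is how you match responses to prompts). A minimal sketch of building those lines is below; the function name is hypothetical, and the file upload plus `batches.create(..., completion_window="24h")` step (the source of the 24-hour expiry mentioned above) is omitted.

```python
import json


def build_openai_batch_lines(prompts: list[str], model: str = "gpt-3.5-turbo") -> str:
    """Build the JSONL payload for OpenAI's Batch API.

    Each line is one request envelope targeting /v1/chat/completions;
    custom_id ties each result back to its prompt, since batch output
    order is not guaranteed.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        envelope = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(envelope))
    return "\n".join(lines)
```

Anthropic's Message Batches API is similar in spirit but takes the request list inline (no file upload), which makes the client-side plumbing a bit simpler.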