
LLM Batch inference #1202

Merged: HenryL27 merged 4 commits into main from hml-llm-batch on Feb 28, 2025


Conversation

@HenryL27 (Collaborator)

Adds batch inference modes for OpenAI and Anthropic.
I didn't do Bedrock or Gemini because those involve dealing with S3 and GCS/BigQuery.

OpenAI batch is pretty slow; to be able to test it I ended up using GPT-3.5 Turbo, since it has far less demand and batch inference is low priority (a batch only expires after 24h). Anthropic's Claude 3 Haiku was decently fast (competitive with async!), although that may be the same effect. I did not test it with a more modern / powerful Claude.
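For anyone skimming: a minimal sketch of the provider-side flow the Anthropic batch mode wraps, written against Anthropic's Message Batches API directly. The model, prompts, and polling interval are illustrative; this is not the sycamore implementation itself.

```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Submit all prompts as a single batch job; each request carries a custom_id
# so results (which can arrive out of order) can be matched back to prompts.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"prompt-{i}",
            "params": {
                "model": "claude-3-haiku-20240307",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for i, text in enumerate(["Summarize document A", "Summarize document B"])
    ]
)

# Batch jobs run asynchronously; poll until processing ends, then read results.
while batch.processing_status != "ended":
    time.sleep(30)
    batch = client.messages.batches.retrieve(batch.id)

for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)
```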

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

@karanataryn (Contributor) left a comment


LGTM with a few suggestions. Can we add a unit test of some sort here? Integration tests would obviously not be practical.

Comment thread on lib/sycamore/sycamore/llms/anthropic.py (outdated)
Comment thread on lib/sycamore/sycamore/llms/anthropic.py (outdated)
Comment thread on lib/sycamore/sycamore/llms/openai.py:
```diff
     return res
 elif llm_mode == LLMMode.BATCH:
-    raise NotImplementedError("Haven't done batch yet")
+    return llm.generate_batch(prompts=prompts)
```

Yay!
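For reference, a rough sketch of the call site this branch enables. Only LLMMode.BATCH and generate_batch come from the diff above; the import path, helper name, and the non-batch fallback are assumptions for illustration.

```python
# Import path and helper name assumed; LLMMode.BATCH and generate_batch
# come from the diff in this thread.
from sycamore.llms.llms import LLM, LLMMode


def infer(llm: LLM, prompts: list, llm_mode: LLMMode):
    if llm_mode == LLMMode.BATCH:
        # New path: one provider-side batch job covering every prompt,
        # instead of a separate API call per prompt.
        return llm.generate_batch(prompts=prompts)
    # Hypothetical fallback: per-prompt calls for non-batch modes.
    return [llm.generate(prompt=p) for p in prompts]
```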

@HenryL27 merged commit 889844a into main on Feb 28, 2025
@HenryL27 deleted the hml-llm-batch branch on February 28, 2025 at 01:29