fix: Best-effort use of chat completion #1789
Conversation
Force-pushed from 13e2ecb to 06eba73
@RobinPicard Since this PR turned out a bit larger, I’ve put together a detailed change log to help streamline your review. 🤗

1. Best-Effort Use of Chat Completion

For the four local model backends (vLLM offline, Transformers, MLX-LM, and LlamaCpp), plain string inputs are now formatted as chat messages whenever the model's tokenizer provides a chat template; otherwise the string is passed through as-is.
Key changes:

- String inputs are wrapped as user messages when a chat template is available, and fall back to plain-text formatting when it is not (see the sketch below).
- Template detection is centralized in a new `_check_hf_chat_template()` helper in `outlines/models/tokenizer.py`.
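A minimal sketch of the idea, with illustrative names only (each adapter implements the same logic per backend under its own method names):

```python
def format_str_input(text: str, has_chat_template: bool):
    # Best-effort chat completion: if the tokenizer ships a chat
    # template, wrap the plain string as a single user message so
    # the template can be applied.
    if has_chat_template:
        return [{"role": "user", "content": text}]
    # No template available: pass the raw string through unchanged.
    return text
```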
2. Special Case: vLLM Offline

For batch generation, `Chat` inputs are rejected explicitly:

```python
if any(isinstance(item, Chat) for item in model_input):
    raise TypeError(
        "Batch generation is not available for the `Chat` input type."
    )
```
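For context, here is the kind of call that now fails fast (a hedged sketch: `Chat`'s import path and the `batch()` method shape are assumptions based on this thread, not taken from the PR):

```python
from outlines.inputs import Chat  # import path assumed

chat = Chat(messages=[{"role": "user", "content": "Hello!"}])

# Mixing a Chat input into a batch with the vLLM offline model now
# raises immediately instead of failing later during generation:
# model.batch([chat, "a plain string prompt"])
# -> TypeError: Batch generation is not available for the `Chat` input type.
```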
3. Special Case: LlamaCpp

LlamaCpp gets an explicit `chat_mode` parameter so users can turn off chat-style formatting of string inputs (a hedged usage sketch follows).
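A minimal sketch of what opting out might look like (assumptions: the model path is a placeholder, and whether `chat_mode` is accepted by `outlines.from_llamacpp` or by the `LlamaCpp` constructor directly is not confirmed in this thread):

```python
import llama_cpp
import outlines

# Load a GGUF model with llama-cpp-python (path is a placeholder).
llm = llama_cpp.Llama("path/to/model.gguf")

# chat_mode=False disables chat-style formatting: string prompts are
# passed to the model verbatim instead of being wrapped as a user
# message via the chat template.
model = outlines.from_llamacpp(llm, chat_mode=False)

result = model("Q: What does `chat_mode` control?\nA:")
```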
4. Other Changes

The remaining documentation and test updates are covered in the per-file summary in the review below.
Copilot left a comment

Pull request overview
This PR implements best-effort chat completion support across multiple model adapters (vLLM, Transformers, MLX-LM, and LlamaCpp) by automatically detecting whether a model's tokenizer has a chat template and conditionally formatting string inputs as chat messages.
Key changes:
- Added automatic chat template detection that wraps plain string inputs as user messages when a chat template is available
- Introduced a `chat_mode` parameter for LlamaCpp to allow users to explicitly disable chat-style formatting
- Implemented a `_check_hf_chat_template()` helper function to check for HuggingFace chat template availability (a hedged sketch follows this list)
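Since the helper name appears throughout the changed files, here is a sketch of what such a check can look like (the PR's actual implementation may differ in details):

```python
def _check_hf_chat_template(tokenizer) -> bool:
    # HuggingFace tokenizers expose the template through the optional
    # `chat_template` attribute, which is None when the model repo
    # does not ship one.
    return getattr(tokenizer, "chat_template", None) is not None
```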
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| `outlines/models/tokenizer.py` | Added `_check_hf_chat_template()` helper function to detect chat template availability |
| `outlines/models/vllm_offline.py` | Updated `VLLMOfflineTypeAdapter` to conditionally format string inputs as chat messages based on template availability |
| `outlines/models/transformers.py` | Modified `TransformersTypeAdapter` to support chat template detection and conditional formatting |
| `outlines/models/mlxlm.py` | Updated `MLXLMTypeAdapter` with chat template support and conditional string input formatting |
| `outlines/models/llamacpp.py` | Added `chat_mode` parameter to the `LlamaCpp` model to allow explicit control over chat-style input formatting |
| `docs/features/models/llamacpp.md` | Updated documentation to describe the new `chat_mode` parameter and its usage |
| `tests/models/test_tokenizer.py` | Added tests for the new chat template detection function (sketched after this table) |
| `tests/models/test_vllm_offline_type_adapter.py` | Added tests for string input formatting with and without chat templates |
| `tests/models/test_transformers_type_adapter.py` | Updated tests to cover chat template conditional behavior |
| `tests/models/test_mlxlm_type_adapter.py` | Added tests for chat template support using mocks |
| `tests/models/test_llamacpp_type_adapter.py` | Added tests for chat template conditional formatting |
| `tests/models/test_llamacpp.py` | Added a test fixture and tests for non-chat mode; updated streaming tests to handle empty tokens |
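To illustrate the detection tests listed above, a mock-based test could look like this (hedged sketch; the actual test contents are not shown in this thread, and the import path follows the file table):

```python
from unittest.mock import MagicMock

from outlines.models.tokenizer import _check_hf_chat_template


def test_check_hf_chat_template():
    # Tokenizer stubs: one that ships a chat template, one that doesn't.
    tok_with = MagicMock(chat_template="{{ messages }}")
    tok_without = MagicMock(chat_template=None)

    assert _check_hf_chat_template(tok_with)
    assert not _check_hf_chat_template(tok_without)
```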
Thanks a lot for the great description! I'll review it in the coming days.

No worries at all, Robin, no rush! Really happy to be working with you. 🥳
RobinPicard left a comment
This is excellent, thanks a lot!
…ream test assertions
…ne.py so that no runtime errors occur
Force-pushed from bce315d to 1744cb0
@RobinPicard You're welcome, Robin. There was a CI error related to disk space running out (see below), but after I pushed an empty commit to restart it, everything now appears to be working. 🤗

```
System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.Tracing.Dispose(Boolean disposing)
   at GitHub.Runner.Common.Tracing.Dispose()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)
```
Hi @RobinPicard, just a gentle reminder: are there any other points I should consider before moving forward? 😀

No, it's perfect! Thanks for the reminder, I was focused on something else and forgot.

Got it. Thank you, Robin!! ❤️
Fixes #1784