fix: Best-effort use of chat completion #1789
Conversation
Force-pushed from 13e2ecb to 06eba73
@RobinPicard Since this PR turned out a bit larger, I’ve put together a detailed change log to help streamline your review. 🤗

1. Best-Effort Use of Chat Completion

For the four local model backends (vLLM offline, Transformers, MLX-LM, and LlamaCpp), plain string inputs are now formatted as chat messages whenever the model's tokenizer provides a chat template; otherwise the string is passed through as-is.
Key changes:

- String inputs are wrapped as user messages when a chat template is available, and fall back to plain-text formatting when it is not (see the sketch below).
- Template detection is centralized in a new `_check_hf_chat_template()` helper in `outlines/models/tokenizer.py`.
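A minimal sketch of the idea, with illustrative names only (each adapter implements the same logic per backend under its own method names):

```python
def format_str_input(text: str, has_chat_template: bool):
    # Best-effort chat completion: if the tokenizer ships a chat
    # template, wrap the plain string as a single user message so
    # the template can be applied.
    if has_chat_template:
        return [{"role": "user", "content": text}]
    # No template available: pass the raw string through unchanged.
    return text
```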
2. Special Case: vLLM Offline

For batch generation, `Chat` inputs are rejected explicitly:

```python
if any(isinstance(item, Chat) for item in model_input):
    raise TypeError(
        "Batch generation is not available for the `Chat` input type."
    )
```
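For context, here is the kind of call that now fails fast (a hedged sketch: `Chat`'s import path and the `batch()` method shape are assumptions based on this thread, not taken from the PR):

```python
from outlines.inputs import Chat  # import path assumed

chat = Chat(messages=[{"role": "user", "content": "Hello!"}])

# Mixing a Chat input into a batch with the vLLM offline model now
# raises immediately instead of failing later during generation:
# model.batch([chat, "a plain string prompt"])
# -> TypeError: Batch generation is not available for the `Chat` input type.
```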
3. Special Case: LlamaCpp

LlamaCpp gets an explicit `chat_mode` parameter so users can turn off chat-style formatting of string inputs (a hedged usage sketch follows).
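A minimal sketch of what opting out might look like (assumptions: the model path is a placeholder, and whether `chat_mode` is accepted by `outlines.from_llamacpp` or by the `LlamaCpp` constructor directly is not confirmed in this thread):

```python
import llama_cpp
import outlines

# Load a GGUF model with llama-cpp-python (path is a placeholder).
llm = llama_cpp.Llama("path/to/model.gguf")

# chat_mode=False disables chat-style formatting: string prompts are
# passed to the model verbatim instead of being wrapped as a user
# message via the chat template.
model = outlines.from_llamacpp(llm, chat_mode=False)

result = model("Q: What does `chat_mode` control?\nA:")
```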
4. Other Changes

The remaining documentation and test updates are covered in the per-file summary in the review below.
Copilot left a comment

Pull request overview
This PR implements best-effort chat completion support across multiple model adapters (vLLM, Transformers, MLX-LM, and LlamaCpp) by automatically detecting whether a model's tokenizer has a chat template and conditionally formatting string inputs as chat messages.
Key changes:
- Added automatic chat template detection that wraps plain string inputs as user messages when a chat template is available
- Introduced a `chat_mode` parameter for LlamaCpp to allow users to explicitly disable chat-style formatting
- Implemented a `_check_hf_chat_template()` helper function to check for HuggingFace chat template availability (a hedged sketch follows this list)
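Since the helper name appears throughout the changed files, here is a sketch of what such a check can look like (the PR's actual implementation may differ in details):

```python
def _check_hf_chat_template(tokenizer) -> bool:
    # HuggingFace tokenizers expose the template through the optional
    # `chat_template` attribute, which is None when the model repo
    # does not ship one.
    return getattr(tokenizer, "chat_template", None) is not None
```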
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| `outlines/models/tokenizer.py` | Added `_check_hf_chat_template()` helper function to detect chat template availability |
| `outlines/models/vllm_offline.py` | Updated `VLLMOfflineTypeAdapter` to conditionally format string inputs as chat messages based on template availability |
| `outlines/models/transformers.py` | Modified `TransformersTypeAdapter` to support chat template detection and conditional formatting |
| `outlines/models/mlxlm.py` | Updated `MLXLMTypeAdapter` with chat template support and conditional string input formatting |
| `outlines/models/llamacpp.py` | Added `chat_mode` parameter to the `LlamaCpp` model to allow explicit control over chat-style input formatting |
| `docs/features/models/llamacpp.md` | Updated documentation to describe the new `chat_mode` parameter and its usage |
| `tests/models/test_tokenizer.py` | Added tests for the new chat template detection function (sketched after this table) |
| `tests/models/test_vllm_offline_type_adapter.py` | Added tests for string input formatting with and without chat templates |
| `tests/models/test_transformers_type_adapter.py` | Updated tests to cover chat template conditional behavior |
| `tests/models/test_mlxlm_type_adapter.py` | Added tests for chat template support using mocks |
| `tests/models/test_llamacpp_type_adapter.py` | Added tests for chat template conditional formatting |
| `tests/models/test_llamacpp.py` | Added a test fixture and tests for non-chat mode; updated streaming tests to handle empty tokens |
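To illustrate the detection tests listed above, a mock-based test could look like this (hedged sketch; the actual test contents are not shown in this thread, and the import path follows the file table):

```python
from unittest.mock import MagicMock

from outlines.models.tokenizer import _check_hf_chat_template


def test_check_hf_chat_template():
    # Tokenizer stubs: one that ships a chat template, one that doesn't.
    tok_with = MagicMock(chat_template="{{ messages }}")
    tok_without = MagicMock(chat_template=None)

    assert _check_hf_chat_template(tok_with)
    assert not _check_hf_chat_template(tok_without)
```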
Thanks a lot for the great description! I'll review it in the coming days.

No worries at all, Robin, no rush! Really happy to be working with you. 🥳
RobinPicard left a comment
This is excellent, thanks a lot!
…ream test assertions
…ne.py so that no runtime errors occur
Force-pushed from bce315d to 1744cb0
@RobinPicard You're welcome, Robin. There was a CI error related to disk space running out (see below), but after I pushed an empty commit to restart it, everything now appears to be working. 🤗

```
System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20251208-090232-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.Tracing.Dispose(Boolean disposing)
   at GitHub.Runner.Common.Tracing.Dispose()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)
```
Hi @RobinPicard, just a gentle reminder: are there any other points I should consider before moving forward? 😀

No, it's perfect! Thanks for the reminder, I was focused on something else and forgot.

Got it. Thank you, Robin!! ❤️
Fixes #1784