[Data][llm] Add chat_template_kwargs as option when building processor #56490

kouroshHakha merged 8 commits into ray-project:master
Conversation
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
…_template_kwargs
/gemini review
Code Review
This pull request correctly adds the chat_template_kwargs option to build_llm_processor, allowing for more flexible chat template application. The changes are propagated correctly through the different layers, from the public API down to the ChatTemplateStage. The new functionality is also covered by a new unit test. My main feedback is to refactor the new test to reduce code duplication and improve maintainability by using pytest.mark.parametrize.
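The forwarding behavior under review can be sketched with a minimal stand-in (the class here is a simplified stand-in, not the actual Ray code, and `render` is a hypothetical substitute for the tokenizer's `apply_chat_template`):

```python
# Minimal sketch of how a per-processor chat_template_kwargs dict can be
# stored once and splatted into every template call. Stand-in code only.
class ChatTemplateStage:
    def __init__(self, chat_template_kwargs=None):
        # Normalize None to {} so the stage can always splat the kwargs.
        self.chat_template_kwargs = chat_template_kwargs or {}

    def apply(self, messages):
        # Forward the stored kwargs to the (stand-in) template renderer.
        return render(messages, **self.chat_template_kwargs)


def render(conversation, enable_thinking=True, **kwargs):
    # Stand-in for tokenizer.apply_chat_template.
    return "<think>...</think>" if enable_thinking else "no thinking"


stage = ChatTemplateStage({"enable_thinking": False})
print(stage.apply([{"role": "user", "content": "hi"}]))  # no thinking
```

With an empty or omitted dict the renderer falls back to its own defaults, which is the behavior the third test case below checks.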
```python
@pytest.mark.asyncio
async def test_chat_template_udf_chat_template_kwargs(mock_tokenizer_setup):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    # Test with enable_thinking=False
    udf_no_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": False},
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_no_thinking(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "Answer without thinking"

    # Test with enable_thinking=True (explicit)
    udf_with_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": True},
    )

    batch_2 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_with_thinking(batch_2):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"

    # Test with no enable_thinking parameter (default should be True)
    udf_default = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={},
    )

    batch_3 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_default(batch_3):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"
```
|
This test is well-structured and covers the new functionality correctly. However, there's a significant amount of code duplication across the three test cases (enable_thinking=False, enable_thinking=True, and default). You can make this test more concise and maintainable by using pytest.mark.parametrize to run the same test logic with different inputs.
```python
@pytest.mark.asyncio
@pytest.mark.parametrize(
    "chat_template_kwargs, expected_prompt",
    [
        ({"enable_thinking": False}, "Answer without thinking"),
        ({"enable_thinking": True}, "<think>thinking</think>"),
        ({}, "<think>thinking</think>"),
    ],
)
async def test_chat_template_udf_chat_template_kwargs(
    mock_tokenizer_setup, chat_template_kwargs, expected_prompt
):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    udf = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs=chat_template_kwargs,
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == expected_prompt
```
nrghosh left a comment
I think it could be worth restructuring the build processor flow to include both an engine config like vLLMEngineProcessorConfig and a processor config. Currently, vLLMEngineProcessorConfig includes fields such as apply_chat_template, which should be engine-agnostic.

- Could make sense to decouple, but probably best suited for a separate follow-up PR.
- I agree with Gemini that the test could be parameterized for maintainability.
- Also re: testing, a few ideas:
  - could add some asserts to confirm kwargs (and core flags) were threaded/passed through.
  - could parametrize None vs {} so we catch both "unset" and "empty dict".
  - could add a negative test: when apply_chat_template=False, kwargs are ignored and the tokenizer isn't called.
  - could add a quick test that unknown (i.e. not enable_thinking) kwargs are passed through unchanged.
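The suggested negative test can be sketched with a plain mock. `FakeChatTemplateStage` below is a hypothetical stand-in for the real stage wiring, used only to illustrate the assertion shape:

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for a stage that skips templating entirely when
# apply_chat_template=False; illustrates the suggested negative test.
class FakeChatTemplateStage:
    def __init__(self, tokenizer, apply_chat_template=True, chat_template_kwargs=None):
        self.tokenizer = tokenizer
        self.apply_chat_template = apply_chat_template
        self.chat_template_kwargs = chat_template_kwargs or {}

    def run(self, messages):
        if not self.apply_chat_template:
            # kwargs are ignored and the tokenizer is never called.
            return messages
        return self.tokenizer.apply_chat_template(
            messages, **self.chat_template_kwargs
        )


tokenizer = MagicMock()
stage = FakeChatTemplateStage(
    tokenizer,
    apply_chat_template=False,
    chat_template_kwargs={"enable_thinking": False},
)
out = stage.run([{"role": "user", "content": "hi"}])

# The negative assertion: the tokenizer was never invoked.
tokenizer.apply_chat_template.assert_not_called()
```

The same mock pattern covers the "unknown kwargs pass through unchanged" idea: call with an arbitrary key and assert on `tokenizer.apply_chat_template.call_args.kwargs`.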
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
…_template_kwargs
…nto chat_template_kwargs
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
```python
            "PreTrainedTokenizerBase", "ProcessorMixin"
        ] = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
        self.chat_template = chat_template
        self.chat_template_kwargs = chat_template_kwargs
```
Do the `chat_template_kwargs or {}` here.
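The point of defaulting at assignment time is that downstream code can splat the kwargs unconditionally; a minimal sketch with a simplified stand-in class (not the real Ray class):

```python
# Stand-in class showing the `or {}` normalization pattern the review asks for.
class Stage:
    def __init__(self, chat_template_kwargs=None):
        # Normalize None to {} once, so callers can always write
        # apply_chat_template(..., **self.chat_template_kwargs)
        # without a None check at every call site.
        self.chat_template_kwargs = chat_template_kwargs or {}


print(Stage().chat_template_kwargs)  # {}
print(Stage({"enable_thinking": False}).chat_template_kwargs)
```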
I think this PR is unrelated to the release test failure. If it starts failing on master, we need to investigate it separately. This PR only touches the Data LLM files; the release test is on Serve LLM.
The lmcache one is disabled on master as of the vLLM bump.
Why are these changes needed?
Certain models require chat_template_kwargs to modify functionality, like disabling thinking for Qwen 8b.
Related issue number
Closes #56384
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I have added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.