[Data][llm] Add chat_template_kwargs as option when building processor #56490

kouroshHakha merged 8 commits into ray-project:master
Conversation
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
…_template_kwargs
/gemini review
Code Review
This pull request correctly adds the chat_template_kwargs option to build_llm_processor, allowing for more flexible chat template application. The changes are propagated correctly through the different layers, from the public API down to the ChatTemplateStage. The new functionality is also covered by a new unit test. My main feedback is to refactor the new test to reduce code duplication and improve maintainability by using pytest.mark.parametrize.
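The forwarding behavior under review can be sketched with a minimal stand-in (the class here is a simplified stand-in, not the actual Ray code, and `render` is a hypothetical substitute for the tokenizer's `apply_chat_template`):

```python
# Minimal sketch of how a per-processor chat_template_kwargs dict can be
# stored once and splatted into every template call. Stand-in code only.
class ChatTemplateStage:
    def __init__(self, chat_template_kwargs=None):
        # Normalize None to {} so the stage can always splat the kwargs.
        self.chat_template_kwargs = chat_template_kwargs or {}

    def apply(self, messages):
        # Forward the stored kwargs to the (stand-in) template renderer.
        return render(messages, **self.chat_template_kwargs)


def render(conversation, enable_thinking=True, **kwargs):
    # Stand-in for tokenizer.apply_chat_template.
    return "<think>...</think>" if enable_thinking else "no thinking"


stage = ChatTemplateStage({"enable_thinking": False})
print(stage.apply([{"role": "user", "content": "hi"}]))  # no thinking
```

With an empty or omitted dict the renderer falls back to its own defaults, which is the behavior the third test case below checks.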
```python
@pytest.mark.asyncio
async def test_chat_template_udf_chat_template_kwargs(mock_tokenizer_setup):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    # Test with enable_thinking=False
    udf_no_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": False},
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_no_thinking(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "Answer without thinking"

    # Test with enable_thinking=True (explicit)
    udf_with_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": True},
    )

    batch_2 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_with_thinking(batch_2):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"

    # Test with no enable_thinking parameter (default should be True)
    udf_default = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={},
    )

    batch_3 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_default(batch_3):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"
```
|
This test is well-structured and covers the new functionality correctly. However, there's a significant amount of code duplication across the three test cases (enable_thinking=False, enable_thinking=True, and default). You can make this test more concise and maintainable by using pytest.mark.parametrize to run the same test logic with different inputs.
```python
@pytest.mark.asyncio
@pytest.mark.parametrize(
    "chat_template_kwargs, expected_prompt",
    [
        ({"enable_thinking": False}, "Answer without thinking"),
        ({"enable_thinking": True}, "<think>thinking</think>"),
        ({}, "<think>thinking</think>"),
    ],
)
async def test_chat_template_udf_chat_template_kwargs(
    mock_tokenizer_setup, chat_template_kwargs, expected_prompt
):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    udf = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs=chat_template_kwargs,
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == expected_prompt
```
nrghosh left a comment
I think it could be worth restructuring the build processor flow to include both an engine config like vLLMEngineProcessorConfig and a processor config. Currently, vLLMEngineProcessorConfig includes fields such as apply_chat_template, which should be engine-agnostic.

- Could make sense to decouple, but probably best suited for a separate follow-up PR.
- I agree with Gemini that the test could be parameterized for maintainability.
- Also re: testing, a few ideas:
  - could add some asserts to confirm kwargs (and core flags) were threaded/passed through.
  - could parametrize None vs {} so we catch both "unset" and "empty dict".
  - could add a negative test: when apply_chat_template=False, kwargs are ignored and the tokenizer isn't called.
  - could add a quick test that unknown (i.e. not enable_thinking) kwargs are passed through unchanged.
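The suggested negative test can be sketched with a plain mock. `FakeChatTemplateStage` below is a hypothetical stand-in for the real stage wiring, used only to illustrate the assertion shape:

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for a stage that skips templating entirely when
# apply_chat_template=False; illustrates the suggested negative test.
class FakeChatTemplateStage:
    def __init__(self, tokenizer, apply_chat_template=True, chat_template_kwargs=None):
        self.tokenizer = tokenizer
        self.apply_chat_template = apply_chat_template
        self.chat_template_kwargs = chat_template_kwargs or {}

    def run(self, messages):
        if not self.apply_chat_template:
            # kwargs are ignored and the tokenizer is never called.
            return messages
        return self.tokenizer.apply_chat_template(
            messages, **self.chat_template_kwargs
        )


tokenizer = MagicMock()
stage = FakeChatTemplateStage(
    tokenizer,
    apply_chat_template=False,
    chat_template_kwargs={"enable_thinking": False},
)
out = stage.run([{"role": "user", "content": "hi"}])

# The negative assertion: the tokenizer was never invoked.
tokenizer.apply_chat_template.assert_not_called()
```

The same mock pattern covers the "unknown kwargs pass through unchanged" idea: call with an arbitrary key and assert on `tokenizer.apply_chat_template.call_args.kwargs`.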
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
…_template_kwargs
…nto chat_template_kwargs
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
```python
            "PreTrainedTokenizerBase", "ProcessorMixin"
        ] = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
        self.chat_template = chat_template
        self.chat_template_kwargs = chat_template_kwargs
```
Do the `chat_template_kwargs or {}` here.
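The point of defaulting at assignment time is that downstream code can splat the kwargs unconditionally; a minimal sketch with a simplified stand-in class (not the real Ray class):

```python
# Stand-in class showing the `or {}` normalization pattern the review asks for.
class Stage:
    def __init__(self, chat_template_kwargs=None):
        # Normalize None to {} once, so callers can always write
        # apply_chat_template(..., **self.chat_template_kwargs)
        # without a None check at every call site.
        self.chat_template_kwargs = chat_template_kwargs or {}


print(Stage().chat_template_kwargs)  # {}
print(Stage({"enable_thinking": False}).chat_template_kwargs)
```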
I think this PR is unrelated to the release test failure. If it starts failing on master, we need to investigate it separately. This PR only touches the Data LLM files; the release test is on Serve LLM.
The lmcache one is disabled on master as of the vLLM bump.
Why are these changes needed?
Certain models require chat_template_kwargs to modify functionality, like disabling thinking for Qwen 8b.
Related issue number
Closes #56384
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I have added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.