
[Data][llm] Add chat_template_kwargs as option when building processor#56490

Merged
kouroshHakha merged 8 commits intoray-project:masterfrom
hao-aaron:chat_template_kwargs
Sep 16, 2025
Conversation

@hao-aaron
Contributor

Why are these changes needed?

Certain models require chat_template_kwargs to modify functionality, such as disabling thinking mode for Qwen 8B.
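After this change, the user-facing usage might look like the following sketch (untested here; the model id and other field values are illustrative, and only chat_template_kwargs is the new option added by this PR):

```python
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Illustrative config: disable thinking mode via the new chat_template_kwargs.
config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen3-8B",          # example model id
    apply_chat_template=True,
    chat_template_kwargs={"enable_thinking": False},  # new in this PR
)
processor = build_llm_processor(config)
```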

Related issue number

Closes #56384

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@hao-aaron
Contributor Author

I think it could be worth restructuring the build processor flow to include both an engine config like vLLMEngineProcessorConfig and a processor config. Currently, vLLMEngineProcessorConfig includes fields such as apply_chat_template, which should be engine-agnostic.
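The proposed decoupling could look roughly like this (a hypothetical sketch only; these class and field names are not Ray's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class ProcessorConfig:
    # Engine-agnostic options, shared by all engines.
    apply_chat_template: bool = True
    chat_template_kwargs: dict = field(default_factory=dict)

@dataclass
class VLLMEngineConfig:
    # Engine-specific options, only meaningful for vLLM.
    model_source: str = ""
    engine_kwargs: dict = field(default_factory=dict)
```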

@hao-aaron hao-aaron added the go add ONLY when ready to merge, run all tests label Sep 12, 2025
@richardliaw richardliaw added the data Ray Data-related issues label Sep 12, 2025
@nrghosh
Contributor

nrghosh commented Sep 12, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly adds the chat_template_kwargs option to build_llm_processor, allowing for more flexible chat template application. The changes are propagated correctly through the different layers, from the public API down to the ChatTemplateStage. The new functionality is also covered by a new unit test. My main feedback is to refactor the new test to reduce code duplication and improve maintainability by using pytest.mark.parametrize.

Comment on lines +92 to +180
@pytest.mark.asyncio
async def test_chat_template_udf_chat_template_kwargs(mock_tokenizer_setup):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    # Test with enable_thinking=False
    udf_no_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": False},
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_no_thinking(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "Answer without thinking"

    # Test with enable_thinking=True (explicit)
    udf_with_thinking = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={"enable_thinking": True},
    )

    batch_2 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_with_thinking(batch_2):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"

    # Test with no enable_thinking parameter (default should be True)
    udf_default = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs={},
    )

    batch_3 = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf_default(batch_3):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == "<think>thinking</think>"


Contributor

medium

This test is well-structured and covers the new functionality correctly. However, there's a significant amount of code duplication across the three test cases (enable_thinking=False, enable_thinking=True, and default). You can make this test more concise and maintainable by using pytest.mark.parametrize to run the same test logic with different inputs.

@pytest.mark.asyncio
@pytest.mark.parametrize(
    "chat_template_kwargs, expected_prompt",
    [
        ({"enable_thinking": False}, "Answer without thinking"),
        ({"enable_thinking": True}, "<think>thinking</think>"),
        ({}, "<think>thinking</think>"),
    ],
)
async def test_chat_template_udf_chat_template_kwargs(
    mock_tokenizer_setup, chat_template_kwargs, expected_prompt
):
    mock_tokenizer = mock_tokenizer_setup

    def side_effect_func(conversation, **kwargs):
        enable_thinking = kwargs.get("enable_thinking", True)
        if enable_thinking is False:
            return "Answer without thinking"
        else:
            return "<think>thinking</think>"

    mock_tokenizer.apply_chat_template.side_effect = side_effect_func

    udf = ChatTemplateUDF(
        data_column="__data",
        expected_input_keys=["messages"],
        model="test-model",
        chat_template_kwargs=chat_template_kwargs,
    )

    batch = {
        "__data": [
            {
                "messages": MagicMock(
                    tolist=lambda: [{"role": "user", "content": "Hello AI"}]
                )
            }
        ]
    }

    results = []
    async for result in udf(batch):
        results.extend(result["__data"])

    assert len(results) == 1
    assert results[0]["prompt"] == expected_prompt

Contributor

@nrghosh nrghosh left a comment

I think it could be worth restructuring the build processor flow to include both an engine config like vLLMEngineProcessorConfig and a processor config. Currently, vLLMEngineProcessorConfig includes fields such as apply_chat_template which should be engine agnostic.

  1. Could make sense to decouple - but probably best suited for a separate follow-up PR.

  2. I agree with Gemini that the test could be parameterized for maintainability.

  3. Also re: testing, a few ideas

  • could add some asserts to confirm kwargs (and core flags) were threaded/passed through.
  • could parametrize None vs {} so we catch both “unset” and “empty dict”.
  • could add a negative test: when apply_chat_template=False, kwargs are ignored and the tokenizer isn’t called.
  • could add a quick test that unknown (i.e. not enable_thinking) kwargs are passed through unchanged
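The None-vs-{} case above can be sketched with a plain fake tokenizer, independent of the real ChatTemplateUDF (the names here are illustrative, not the PR's code):

```python
# Stand-in for tokenizer.apply_chat_template; not Ray's actual tokenizer.
def fake_apply_chat_template(conversation, **kwargs):
    if kwargs.get("enable_thinking", True) is False:
        return "Answer without thinking"
    return "<think>thinking</think>"

# The behavior under test: None and {} must both mean "tokenizer defaults".
def render_prompt(messages, chat_template_kwargs=None):
    return fake_apply_chat_template(messages, **(chat_template_kwargs or {}))

msgs = [{"role": "user", "content": "Hello AI"}]
cases = [
    (None, "<think>thinking</think>"),                    # unset
    ({}, "<think>thinking</think>"),                      # empty dict
    ({"enable_thinking": False}, "Answer without thinking"),
]
for kwargs, expected in cases:
    assert render_prompt(msgs, kwargs) == expected
```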

@hao-aaron hao-aaron marked this pull request as ready for review September 15, 2025 22:55
@hao-aaron hao-aaron requested a review from a team as a code owner September 15, 2025 22:55
Contributor

@kouroshHakha kouroshHakha left a comment

LG. just one nit:

"PreTrainedTokenizerBase", "ProcessorMixin"
] = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
self.chat_template = chat_template
self.chat_template_kwargs = chat_template_kwargs
Contributor

Do the chat_template_kwargs or {} here, so a None value falls back to an empty dict.
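The nit is the usual None-guard idiom; a minimal illustration (not the PR's actual code):

```python
# `x or {}` normalizes None (and other falsy values) to an empty dict,
# so later **-unpacking never raises on an unset argument.
def normalize_kwargs(chat_template_kwargs=None):
    kwargs = chat_template_kwargs or {}
    return {**kwargs}

assert normalize_kwargs(None) == {}
assert normalize_kwargs({"enable_thinking": True}) == {"enable_thinking": True}
```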

@kouroshHakha
Contributor

I think this PR is unrelated to the release test failure. If it starts failing on master, we need to investigate it separately. This PR only touches the Data LLM files; the release test is on Serve LLM.

@kouroshHakha kouroshHakha merged commit 19bfc16 into ray-project:master Sep 16, 2025
4 of 5 checks passed
jmajety-dev pushed a commit to jmajety-dev/ray that referenced this pull request Sep 16, 2025
@nrghosh
Contributor

nrghosh commented Sep 16, 2025

I think this PR is unrelated to the release test failure. If it starts failing on master, we need to investigate it separately. This PR only touches the Data LLM files; the release test is on Serve LLM.

The lmcache one is disabled on master as of the vLLM bump.

The root cause of the llm_serve_llama_3dot1_8B_quantized_tp1_2p6d_lmcache release test failure looks like ZMQError: Address already in use, due to port contention when lmcache tries to bind to localhost:55555. The flakiness could come from a previous process that used the same port being cleaned up unreliably.
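A quick way to confirm that failure mode is to probe the port before the service starts (a generic sketch, not part of the release test harness):

```python
import socket

# Returns True if something is already bound to host:port, which is the
# condition that surfaces as "ZMQError: Address already in use".
def port_in_use(port, host="127.0.0.1"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # e.g. EADDRINUSE
            return True
        return False
```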

ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
ray-project#56490)

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: zac <zac@anyscale.com>
marcostephan pushed a commit to marcostephan/ray that referenced this pull request Sep 24, 2025
ray-project#56490)

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Marco Stephan <marco@magic.dev>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
#56490)

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[data][llm] Cannot pass enable_thinking to chat_template

4 participants