
[data][llm] Add pooling parameter #59534

Merged
kouroshHakha merged 3 commits into ray-project:master from jeffreywang-anyscale:pooling-params
Dec 20, 2025

Conversation

@jeffreywang-anyscale
Contributor

Description

The preprocessor forwards only sampling_params to the engine today. For task_type="embed", however, we should also allow forwarding pooling_params, enabling features such as truncating the input prompt to a fixed token budget via truncate_prompt_tokens or normalizing the output embedding via normalize. See https://docs.vllm.ai/en/latest/api/vllm/#vllm.PoolingParams for a comprehensive list of supported attributes.
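To make the row shape concrete, here is a minimal sketch of how a per-row `pooling_params` column could be merged with stage-level defaults before building the engine request. The helper and default values below are illustrative, not Ray Data's actual internals:

```python
# Hedged sketch: merge a per-row "pooling_params" column with stage defaults.
# DEFAULT_POOLING_PARAMS and resolve_pooling_params are hypothetical names,
# not Ray's real implementation.

DEFAULT_POOLING_PARAMS = {"normalize": True}

def resolve_pooling_params(row: dict) -> dict:
    """Combine stage-level defaults with any per-row overrides."""
    merged = dict(DEFAULT_POOLING_PARAMS)
    # Missing or None column falls back to an empty dict, matching the
    # PR's "default to empty dict" behavior discussed below.
    merged.update(row.get("pooling_params") or {})
    return merged

row = {"prompt": "hello world",
       "pooling_params": {"truncate_prompt_tokens": 128}}
print(resolve_pooling_params(row))
# -> {'normalize': True, 'truncate_prompt_tokens': 128}
```

The merged dict would then be used to construct a vLLM `PoolingParams` object for the embed task.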

Related issues

Resolves #57805

Additional information

  • Similar to sampling parameters, we allow users to specify pooling parameters under the "pooling_params" column for embedding tasks.
  • Add tests in test_vllm_engine_stage to validate that the pooling parameters are received by the engine.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for pooling_params in vLLM embedding tasks, allowing users to specify parameters like token truncation and embedding normalization. The changes include updating the vLLM engine stage to handle these parameters and adding corresponding tests to validate the new functionality. My review includes a couple of suggestions for improving maintainability and aligning the implementation with the documented behavior.

@ray-gardener ray-gardener bot added the data (Ray Data-related issues) and llm labels Dec 18, 2025
@pytest.mark.parametrize(
    "pooling_params",
    [
        {"truncate_prompt_tokens": -1},
Contributor


also test None, and {} ?

Contributor Author


Added a test case for the empty dict. None is not a possible value, as we default to an empty dict if pooling_params is not provided.

for key, expected_value in pooling_params.items():
    assert hasattr(request.params, key)
    actual_value = getattr(request.params, key)
    assert actual_value == expected_value
Contributor


Critical: can we test some other property?

You basically want to test whether truncation is applied or whether normalization is applied. It is not sufficient to check that the request.params values match what was sent in.

Idea: we can test that, for an input x, the output differs between truncation=None and truncation=2; similarly, we can test normalize=False vs. normalize=True.
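The reviewer's point can be sketched with a toy stand-in for the engine's pooling step (`pool_and_maybe_normalize` is a hypothetical helper, not vLLM code): the actual outputs for normalize=False vs. normalize=True should differ, rather than only the recorded parameter values:

```python
import math

def pool_and_maybe_normalize(vec, normalize):
    # Toy stand-in for the engine's pooled embedding output; real embeddings
    # come from vLLM. With normalize=True the vector is scaled to unit length.
    if not normalize:
        return vec
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

raw = [3.0, 4.0]
# Behavioral check: the two settings must produce different outputs.
assert pool_and_maybe_normalize(raw, normalize=False) == [3.0, 4.0]
assert pool_and_maybe_normalize(raw, normalize=True) == [0.6, 0.8]
```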

Contributor Author

@jeffreywang-anyscale jeffreywang-anyscale Dec 19, 2025


Validating the difference of outputs is a good idea.

Contributor Author


sampling_params do not apply to encode. encode is deterministic.

Contributor Author


Added a couple more test cases:

  • Compare truncation=None vs. truncation=3
  • Compare normalize=False vs. normalize=True
  • Validate that truncation is effective on long prompts
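The truncation comparison in those test cases can be sketched with a toy helper (`truncate_tokens` is illustrative; the code below assumes vLLM-style left truncation, i.e. keeping the last N tokens):

```python
def truncate_tokens(tokens, truncate_prompt_tokens):
    # Illustrative helper, not vLLM code. Assumes left truncation:
    # keep only the last `truncate_prompt_tokens` tokens; None disables it.
    if truncate_prompt_tokens is None:
        return tokens
    return tokens[-truncate_prompt_tokens:]

tokens = [101, 7592, 2088, 999, 102]
# truncation=None vs. truncation=3 must yield different inputs to the model,
# which is what makes the output comparison in the tests meaningful.
assert truncate_tokens(tokens, None) == tokens
assert truncate_tokens(tokens, 3) == [2088, 999, 102]
```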

@jeffreywang-anyscale
Contributor Author

In the latest revision, I introduced a small fix:

PoolingParams.truncate_prompt_tokens is not respected by AsyncLLMEngine.encode(). I filed vllm-project/vllm#31012 in vLLM and opened vllm-project/vllm#31013 to fix it.

As a temporary workaround, prompt truncation is handled through the truncate_prompt_tokens argument passed to AsyncLLMEngine.encode. From Ray users' perspective, any value provided via pooling_params will be honored as expected.
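The workaround described above can be sketched as splitting truncation off from the rest of the pooling params before calling the engine (`split_truncation` is a hypothetical helper; the commented call shows the intended shape, not Ray's exact code):

```python
# Hedged sketch of the temporary workaround: pull truncate_prompt_tokens out
# of the user-supplied pooling params and pass it to encode() separately,
# since the engine currently ignores it on PoolingParams.

def split_truncation(pooling_params: dict):
    """Return (remaining_params, truncate_prompt_tokens)."""
    params = dict(pooling_params)  # don't mutate the caller's dict
    truncate = params.pop("truncate_prompt_tokens", None)
    return params, truncate

params, truncate = split_truncation(
    {"normalize": True, "truncate_prompt_tokens": 64})
# Intended usage shape (illustrative):
# engine.encode(prompt,
#               pooling_params=PoolingParams(**params),
#               truncate_prompt_tokens=truncate)
```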

@jeffreywang-anyscale jeffreywang-anyscale added the go (add ONLY when ready to merge, run all tests) label Dec 19, 2025
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Contributor

@kouroshHakha kouroshHakha left a comment


lgtm

@kouroshHakha kouroshHakha merged commit 6d91f46 into ray-project:master Dec 20, 2025
6 checks passed
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
AYou0207 pushed a commit to AYou0207/ray that referenced this pull request Jan 13, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>
lee1258561 pushed a commit to pinterest/ray that referenced this pull request Feb 3, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>

Labels

  • data (Ray Data-related issues)
  • go (add ONLY when ready to merge, run all tests)
  • llm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[llm] Support PoolingParams (e.g. truncate_prompt_tokens) when using vLLMEngineProcessorConfig

2 participants