[Data][LLM] Add support for classification and scoring models in Ray Data LLM by kouroshHakha · Pull Request #59499 · ray-project/ray

kouroshHakha · 2025-12-17T04:55:47Z

Summary

This PR adds support for sequence classification and scoring models in Ray Data LLM batch inference. Previously, only generate and embed task types were supported. This change enables users to run batch inference with models like nvidia/nemocurator-fineweb-nemotron-4-edu-classifier for educational content classification.

Motivation

vLLM supports multiple task types for different model architectures:

generate - Text generation (LLMs)
embed - Embedding models
classify - Sequence classification models
score - Cross-encoder scoring models

Ray Data LLM previously supported only generate and embed. This PR adds classify and score to enable batch inference with sequence classification and cross-encoder scoring models.
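For illustration, the four task types can be modeled as a string-valued enum, which rejects an unsupported `task_type` string such as a typo in the config. This is a minimal sketch: the class `TaskType` and the helper `parse_task_type` are hypothetical names and are not Ray's `vLLMTaskType`.

```python
from enum import Enum


class TaskType(str, Enum):
    """Hypothetical mirror of the vLLM task types listed above."""

    GENERATE = "generate"  # text generation (LLMs)
    EMBED = "embed"        # embedding models
    CLASSIFY = "classify"  # sequence classification models
    SCORE = "score"        # cross-encoder scoring models


def parse_task_type(value: str) -> TaskType:
    """Convert a config string into a TaskType, rejecting unknown values."""
    try:
        return TaskType(value)
    except ValueError:
        valid = ", ".join(t.value for t in TaskType)
        raise ValueError(
            f"Unsupported task_type {value!r}; expected one of: {valid}"
        ) from None
```

Because the enum subclasses `str`, a member such as `TaskType.CLASSIFY` still compares equal to the plain string `"classify"`, so existing string-based configs keep working.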

Usage

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_processor

config = vLLMEngineProcessorConfig(
    model_source="nvidia/nemocurator-fineweb-nemotron-4-edu-classifier",
    task_type="classify",  # NEW: use 'classify' for classification models
    engine_kwargs={"max_model_len": 512},
    apply_chat_template=False,  # classifiers take raw text, not chat messages
    detokenize=False,  # no generated tokens to detokenize
)

processor = build_processor(
    config,
    preprocess=lambda row: dict(prompt=row["text"]),
    # Classification outputs are returned in the "embeddings" field.
    postprocess=lambda row: {"score": float(row["embeddings"][0])},
)

ds = ray.data.from_items([{"text": "Sample text to classify"}])
result = processor(ds)
# Ray Data executes lazily; materialize the results to trigger inference.
result.show()
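The `postprocess` lambda above assumes every row carries a non-empty `embeddings` field. A slightly more defensive variant, sketched below as a hypothetical helper named `to_score_row`, returns `None` instead of raising when that field is missing or empty. Like the example above, it assumes the classifier's output is a sequence whose first element is the score.

```python
from typing import Any, Dict, Optional


def to_score_row(row: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Hypothetical postprocess helper: extract the first model output as a float."""
    outputs = row.get("embeddings")
    # Guard against rows where the engine produced no output.
    if outputs is None or len(outputs) == 0:
        return {"score": None}
    return {"score": float(outputs[0])}
```

It could be passed as `postprocess=to_score_row` to `build_processor`. The explicit `len(outputs) == 0` check, rather than `not outputs`, also works if the field arrives as a NumPy array, where taking the truth value of a multi-element array raises an error.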

Testing

The new classification_example.py is picked up automatically by CI through the existing glob pattern in BUILD.bazel.
Tested locally with nvidia/nemocurator-fineweb-nemotron-4-edu-classifier

…ata LLM

Add CLASSIFY and SCORE task types to vLLMTaskType enum to enable batch inference with sequence classification models (e.g., educational content classifiers, sentiment analyzers) and cross-encoder scoring models.

- Add CLASSIFY and SCORE to vLLMTaskType enum
- Route classify/score tasks to engine.encode() with PoolingParams
- Add classification_example.py documentation example
- Update working-with-llms.rst with classification section
- Add classification config example to basic_llm_example.py

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
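The commit message says classify and score tasks are routed to `engine.encode()` with `PoolingParams`. The dispatch decision can be sketched as below; this is a simplified illustration, not the code in vllm_engine_stage.py. The function name `engine_method_for` is hypothetical, and grouping `embed` with the pooling tasks is an assumption. The sketch only returns the name of the engine method, so it runs without vLLM installed.

```python
# Task types served by a pooling (encode) call rather than text generation.
# Assumption: "embed" already used this path before classify/score were added.
POOLING_TASKS = frozenset({"embed", "classify", "score"})
GENERATION_TASKS = frozenset({"generate"})


def engine_method_for(task_type: str) -> str:
    """Hypothetical dispatcher: name the engine method a task type maps to."""
    if task_type in GENERATION_TASKS:
        return "generate"
    if task_type in POOLING_TASKS:
        return "encode"
    raise ValueError(f"Unsupported task_type: {task_type!r}")
```

Keeping all pooling task types in one set means that supporting another pooling-style task would only require extending that set.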

gemini-code-assist

Code Review

This pull request adds support for classification and scoring models to Ray Data's LLM batch inference capabilities, which previously only supported generation and embedding tasks. The changes are well-implemented, introducing classify and score task types and updating the vLLM engine stage to handle them. The documentation and examples have also been updated accordingly. I've provided a few suggestions to improve the robustness of the example code and enhance the maintainability of the core logic.

doc/source/data/doc_code/working-with-llms/basic_llm_example.py

doc/source/data/doc_code/working-with-llms/classification_example.py

python/ray/llm/_internal/batch/stages/vllm_engine_stage.py

jeffreywang-anyscale

Looks like you addressed the Gemini comments in the latest revision. LGTM.

doc/source/data/doc_code/working-with-llms/basic_llm_example.py

kouroshHakha requested a review from jeffreywang-anyscale December 17, 2025 04:57

gemini-code-assist bot reviewed Dec 17, 2025

doc/source/data/doc_code/working-with-llms/basic_llm_example.py

doc/source/data/doc_code/working-with-llms/classification_example.py

python/ray/llm/_internal/batch/stages/vllm_engine_stage.py (outdated)

kouroshHakha and others added 3 commits December 16, 2025 20:58

Apply suggestion from @gemini-code-assist[bot]

c9fab18

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com>

wip

ca54a9e

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Merge branch 'kh/add-score-data-llm' of https://github.com/kouroshHak…

92531e9

…ha/ray into kh/add-score-data-llm Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

kouroshHakha added the go label (add ONLY when ready to merge, run all tests) Dec 17, 2025

kouroshHakha marked this pull request as ready for review December 17, 2025 05:03

kouroshHakha requested review from a team as code owners December 17, 2025 05:03

wip

2b5c6c0

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

jeffreywang-anyscale approved these changes Dec 17, 2025

doc/source/data/doc_code/working-with-llms/basic_llm_example.py

cursor bot reviewed Dec 17, 2025

doc/source/data/doc_code/working-with-llms/basic_llm_example.py

ray-gardener bot added the data (Ray Data-related issues) and llm labels Dec 17, 2025

Wip

4629fc3

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

richardliaw approved these changes Dec 18, 2025

kouroshHakha merged commit d3a9d8c into ray-project:master Dec 18, 2025
6 checks passed

kouroshHakha merged 6 commits into ray-project:master from kouroshHakha:kh/add-score-data-llm
