[docs] Batch inference for embedding models in ray data by nrghosh · Pull Request #56027 · ray-project/ray

nrghosh · 2025-08-28T00:50:15Z

Why are these changes needed?

Adds documentation and an example for batch inference with embedding models in Ray Data.

Key configs: task_type="embed", apply_chat_template=False, detokenize=False

Includes a simple validation example (the reproduction script is included below).
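As a rough illustration of how the three key settings above travel together, the sketch below collects them in a plain dict before they would be handed to a processor config. The helper name `embedding_config_kwargs`, the `EMBEDDING_DEFAULTS` constant, and the rule that the three values may not be overridden are all assumptions made for this sketch; only the three keys and their values come from this PR.

```python
# Hypothetical helper, not part of Ray: keeps the embedding-specific settings
# named in this PR in one place so they are not forgotten or half-applied.
EMBEDDING_DEFAULTS = {
    "task_type": "embed",          # run the engine in embedding mode
    "apply_chat_template": False,  # raw text goes in as the prompt
    "detokenize": False,           # there is no generated text to decode
}


def embedding_config_kwargs(**overrides):
    """Merge caller options with the embedding settings.

    Assumption of this sketch: changing any of the three embedding settings
    is treated as a mistake and rejected.
    """
    for key, expected in EMBEDDING_DEFAULTS.items():
        if key in overrides and overrides[key] != expected:
            raise ValueError(
                f"{key} must be {expected!r} for embedding runs, got {overrides[key]!r}"
            )
    return {**overrides, **EMBEDDING_DEFAULTS}


kwargs = embedding_config_kwargs(
    model_source="sentence-transformers/all-MiniLM-L6-v2",
    batch_size=32,
)
print(kwargs)
```

The resulting dict could then be unpacked into the config, for example `vLLMEngineProcessorConfig(**kwargs)`, assuming the config accepts these fields as it does in the reproduction script.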

Related issue number

#55384

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Reproduction of exact test

#!/usr/bin/env python3
"""
Simple embedding test that closely follows the working test pattern from test_vllm_engine_proc.py.
This version uses the exact same pattern as the confirmed working test.
"""

import ray
from ray.llm._internal.batch.processor import ProcessorBuilder
from ray.llm._internal.batch.processor.vllm_engine_proc import vLLMEngineProcessorConfig


def test_simple_embedding():
    """
    Simple embedding test based on the working test_embedding_model pattern.
    Uses the same configuration pattern as the confirmed working test.
    """
    print("Testing simple embedding functionality...")
    
    # Initialize Ray
    if not ray.is_initialized():
        ray.init()
    
    try:
        # Configuration pattern from working test
        processor_config = vLLMEngineProcessorConfig(
            model_source="sentence-transformers/all-MiniLM-L6-v2",  # Small, accessible model
            task_type="embed",  # Critical: set to embed for embeddings
            engine_kwargs=dict(
                enable_prefix_caching=False,
                enable_chunked_prefill=False,
                max_model_len=256,  # Reduced to avoid model length warnings
                enforce_eager=True,  # Skip CUDA graph capturing
            ),
            batch_size=32,  # Increased to avoid batch size warnings
            concurrency=1,
            apply_chat_template=False,  # Simpler - no chat template
            detokenize=False,  # Embeddings don't need detokenization
        )

        # Build processor using the same pattern as working test
        processor = ProcessorBuilder.build(
            processor_config,
            preprocess=lambda row: dict(
                prompt=row["text"],  # Direct prompt input
            ),
            postprocess=lambda row: {
                "original_text": row["prompt"],
                "embedding": row["embeddings"],  # Extract embeddings
            },
        )

        # Create test dataset (same pattern as working test)
        test_data = [
            "Hello world",
            "This is a test sentence",
            "Embedding models convert text to vectors",
            "Ray Data enables scalable ML processing",
        ]
        ds = ray.data.from_items([{"text": text} for text in test_data])
        
        # Process through embedding pipeline
        print(f"Processing {len(test_data)} texts through embedding model...")
        ds = processor(ds)
        ds = ds.materialize()
        results = ds.take_all()

        # Validation (same pattern as working test)
        assert len(results) == len(test_data), f"Expected {len(test_data)} results, got {len(results)}"
        assert all("original_text" in result for result in results), "Missing original_text in results"
        assert all("embedding" in result for result in results), "Missing embedding in results"
        assert all(result["embedding"] is not None for result in results), "Found null embeddings"

        print(f"Successfully processed {len(results)} texts!")
        
        # Show results
        for i, result in enumerate(results):
            embedding = result["embedding"]
            print(f"Text {i+1}: {result['original_text']}")
            # The embedding may come back as a Python list or as an array-like
            # object, so check for a length instead of an exact type.
            if hasattr(embedding, "__len__"):
                print(f"  Embedding dimension: {len(embedding)}")
                print(f"  Sample values: {embedding[:3]}")
            else:
                print(f"  Embedding type: {type(embedding)}")
            print()

        print("Simple embedding test completed successfully!")
        return True
        
    except Exception as e:
        print(f"Error in simple embedding test: {e}")
        import traceback
        traceback.print_exc()
        return False
    
    finally:
        if ray.is_initialized():
            ray.shutdown()


if __name__ == "__main__":
    print("Simple Embedding Test (Based on Working Test Pattern)")
    print("=" * 55)
    
    success = test_simple_embedding()
    
    if success:
        print("\nSimple embedding test passed!")
    else:
        print("\nSimple embedding test failed!")
        raise SystemExit(1)  # Non-zero exit code without relying on the site-provided exit()
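The validation step in the script can be tried without a GPU or a running Ray cluster by applying the same checks to hand-made rows. The sketch below is only a stand-in under stated assumptions: `check_embedding_rows` is a hypothetical helper, the toy rows are invented values rather than model output, and the "all embeddings share one dimension" check is an extra that the script itself does not perform.

```python
def check_embedding_rows(rows, expected_count):
    """Mirror the script's assertions and return the shared embedding dimension."""
    if len(rows) != expected_count:
        raise ValueError(f"Expected {expected_count} results, got {len(rows)}")

    dimensions = set()
    for row in rows:
        if "original_text" not in row:
            raise ValueError("Missing original_text in results")
        if row.get("embedding") is None:
            raise ValueError("Missing or null embedding in results")
        dimensions.add(len(row["embedding"]))

    # Extra check (assumption of this sketch): one model should emit vectors
    # of a single, consistent length.
    if len(dimensions) != 1:
        raise ValueError(f"Expected one embedding dimension, got {sorted(dimensions)}")
    return dimensions.pop()


# Toy rows shaped like the script's postprocess output; the values are made up.
toy_rows = [
    {"original_text": "Hello world", "embedding": [0.1, 0.2, 0.3]},
    {"original_text": "This is a test sentence", "embedding": [0.4, 0.5, 0.6]},
]
dimension = check_embedding_rows(toy_rows, expected_count=2)
print(f"All rows valid, embedding dimension: {dimension}")
```

Raising `ValueError` instead of using bare `assert` statements keeps the checks active even when Python runs with `-O`, which is a design choice of this sketch rather than something the PR prescribes.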

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

kouroshHakha

TY.

[docs] Batch inference for embedding models in ray data

f9f1dae

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

nrghosh requested a review from angelinalg August 28, 2025 00:50

nrghosh self-assigned this Aug 28, 2025

nrghosh added the docs, data, and llm labels Aug 28, 2025

kouroshHakha approved these changes Aug 28, 2025

nrghosh marked this pull request as ready for review August 28, 2025 00:56

nrghosh requested a review from a team as a code owner August 28, 2025 00:56

nrghosh added the go label (add ONLY when ready to merge, run all tests) Aug 28, 2025

doc lint

6fccbca

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

richardliaw approved these changes Aug 28, 2025

richardliaw merged commit 2ff6909 into ray-project:master Aug 28, 2025
5 checks passed

tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

2d2e5cc

…56027) Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

53e39ec

…56027) Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

gangsf pushed a commit to gangsf/ray that referenced this pull request Sep 2, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

aa225a1

…56027) Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local>

sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

0f92c15

…56027) Signed-off-by: sampan <sampan@anyscale.com>

jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

c2d17d3

…56027) Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>

wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

11ba284

…56027) Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>

dstrodtman pushed a commit that referenced this pull request Oct 6, 2025

[docs] Batch inference for embedding models in ray data (#56027)

875f0b8

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>

landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025

[docs] Batch inference for embedding models in ray data (ray-project#…

0eaf9f6

…56027)

richardliaw merged 2 commits into ray-project:master from nrghosh:nrghosh/data-llm-embeddings
