
[docs] Batch inference for embedding models in ray data #56027

Merged
richardliaw merged 2 commits into ray-project:master from nrghosh:nrghosh/data-llm-embeddings on Aug 28, 2025


@nrghosh (Contributor) commented Aug 28, 2025

Why are these changes needed?

Adds documentation and an example for batch inference with embedding models in Ray Data.

Key configs: task_type="embed", apply_chat_template=False, detokenize=False
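The three key settings can be isolated from the reproduction script as plain keyword arguments. This is a sketch, assuming a Ray release whose public `ray.data.llm` module exposes `vLLMEngineProcessorConfig` with the `task_type`, `apply_chat_template`, and `detokenize` fields used below; the names `EMBED_CONFIG_KWARGS` and `make_embedding_config` are hypothetical, and the import is deferred so the settings can be inspected without Ray or vLLM installed.

```python
# Hypothetical constant: the embedding-relevant settings from the reproduction script.
EMBED_CONFIG_KWARGS = dict(
    model_source="sentence-transformers/all-MiniLM-L6-v2",  # small embedding model
    task_type="embed",          # run the engine in embedding mode instead of generation
    apply_chat_template=False,  # pass raw text through; no chat formatting
    detokenize=False,           # embeddings produce vectors, so there are no tokens to decode
    batch_size=32,
    concurrency=1,
)


def make_embedding_config():
    """Hypothetical helper: build the config through the public ray.data.llm module.

    Assumes Ray is installed with its LLM extras (vLLM). The import is done
    lazily so that merely loading this module does not require them.
    """
    from ray.data.llm import vLLMEngineProcessorConfig

    return vLLMEngineProcessorConfig(**EMBED_CONFIG_KWARGS)
```

Keeping the settings in a dict makes it easy to assert on them in a unit test that runs without a GPU, while the actual engine config is only built where Ray and vLLM are available.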

Simple validation example (reproduction script included below)
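The validation in the script reduces to a few row-level checks on the materialized output. A minimal sketch of those checks as a reusable function follows; `validate_embedding_results` is a hypothetical helper (not part of Ray), and its extra check that every embedding has the same length is an addition beyond the script's asserts.

```python
from typing import Any, Dict, List


def validate_embedding_results(results: List[Dict[str, Any]], expected_count: int) -> int:
    """Check embedding rows and return the shared embedding dimension.

    Hypothetical helper: mirrors the script's assertions (row count,
    'original_text' present, non-null 'embedding') and additionally
    requires every embedding to have the same length.
    """
    if len(results) != expected_count:
        raise ValueError(f"Expected {expected_count} results, got {len(results)}")
    if not results:
        raise ValueError("No results to validate")

    dims = set()
    for i, row in enumerate(results):
        if "original_text" not in row:
            raise ValueError(f"Row {i} is missing 'original_text'")
        embedding = row.get("embedding")
        if embedding is None:
            raise ValueError(f"Row {i} has a null or missing embedding")
        # len() works for both Python lists and NumPy arrays.
        dims.add(len(embedding))

    if len(dims) != 1:
        raise ValueError(f"Inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()
```

Raising `ValueError` with a specific message, rather than using bare `assert`, keeps the checks active even when Python runs with `-O`.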

Related issue number

#55384

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Reproduction of exact test

#!/usr/bin/env python3
"""
Minimal embedding test that follows the test_embedding_model pattern from
test_vllm_engine_proc.py, which is known to pass.
"""

import ray
from ray.llm._internal.batch.processor import ProcessorBuilder
from ray.llm._internal.batch.processor.vllm_engine_proc import vLLMEngineProcessorConfig


def test_simple_embedding():
    """
    Embed a few short sentences with a small model and check that every
    row comes back with a non-null embedding.
    """
    print("Testing simple embedding functionality...")
    
    # Initialize Ray
    if not ray.is_initialized():
        ray.init()
    
    try:
        # Configuration pattern from working test
        processor_config = vLLMEngineProcessorConfig(
            model_source="sentence-transformers/all-MiniLM-L6-v2",  # Small, accessible model
            task_type="embed",  # Critical: set to embed for embeddings
            engine_kwargs=dict(
                enable_prefix_caching=False,
                enable_chunked_prefill=False,
                max_model_len=256,  # Reduced to avoid model length warnings
                enforce_eager=True,  # Skip CUDA graph capturing
            ),
            batch_size=32,  # Increased to avoid batch size warnings
            concurrency=1,
            apply_chat_template=False,  # Simpler - no chat template
            detokenize=False,  # Embeddings don't need detokenization
        )

        # Build processor using the same pattern as working test
        processor = ProcessorBuilder.build(
            processor_config,
            preprocess=lambda row: dict(
                prompt=row["text"],  # Direct prompt input
            ),
            postprocess=lambda row: {
                "original_text": row["prompt"],
                "embedding": row["embeddings"],  # Extract embeddings
            },
        )

        # Create test dataset (same pattern as working test)
        test_data = [
            "Hello world",
            "This is a test sentence",
            "Embedding models convert text to vectors",
            "Ray Data enables scalable ML processing",
        ]
        ds = ray.data.from_items([{"text": text} for text in test_data])
        
        # Process through embedding pipeline
        print(f"Processing {len(test_data)} texts through embedding model...")
        ds = processor(ds)
        ds = ds.materialize()
        results = ds.take_all()

        # Validation (same pattern as working test)
        assert len(results) == len(test_data), f"Expected {len(test_data)} results, got {len(results)}"
        assert all("original_text" in result for result in results), "Missing original_text in results"
        assert all("embedding" in result for result in results), "Missing embedding in results"
        assert all(result["embedding"] is not None for result in results), "Found null embeddings"

        print(f"Successfully processed {len(results)} texts!")
        
        # Show results
        for i, result in enumerate(results):
            embedding = result["embedding"]
            print(f"Text {i+1}: {result['original_text']}")
            if hasattr(embedding, "__len__"):  # list or NumPy array
                print(f"  Embedding dimension: {len(embedding)}")
                print(f"  Sample values: {embedding[:3]}")
            else:
                print(f"  Embedding type: {type(embedding)}")
            print()

        print("Simple embedding test completed successfully!")
        return True
        
    except Exception as e:
        print(f"Error in simple embedding test: {e}")
        import traceback
        traceback.print_exc()
        return False
    
    finally:
        if ray.is_initialized():
            ray.shutdown()


if __name__ == "__main__":
    print("Simple Embedding Test (Based on Working Test Pattern)")
    print("=" * 55)
    
    success = test_simple_embedding()
    
    if success:
        print("\nSimple embedding test passed!")
    else:
        print("\nSimple embedding test failed!")
        raise SystemExit(1)  # non-zero exit code without relying on the site module's exit()
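The script imports `ProcessorBuilder` from the internal `ray.llm._internal` package. A sketch of the same wiring through the public entry point follows, assuming a Ray release where `ray.data.llm.build_llm_processor` accepts `preprocess` and `postprocess` keyword arguments. The function names are hypothetical; the row mappings are lifted from the script's lambdas so they can be unit-tested without a GPU.

```python
from typing import Any, Dict


def preprocess(row: Dict[str, Any]) -> Dict[str, Any]:
    # Embedding requests use the raw text as the prompt (apply_chat_template=False).
    return {"prompt": row["text"]}


def postprocess(row: Dict[str, Any]) -> Dict[str, Any]:
    # The engine stage returns vectors under "embeddings"; keep them next to the input text.
    return {"original_text": row["prompt"], "embedding": row["embeddings"]}


def build_embedding_processor(config):
    """Hypothetical helper: build the processor via the public ray.data.llm API.

    Assumes Ray with its LLM extras (vLLM) is installed; the import is lazy
    so the mapping functions above can be tested on their own.
    """
    from ray.data.llm import build_llm_processor

    return build_llm_processor(config, preprocess=preprocess, postprocess=postprocess)
```

Named functions behave the same as the script's lambdas but can be imported and tested directly, and the public module avoids depending on internal import paths that may change between releases.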

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh requested a review from angelinalg August 28, 2025 00:50
@nrghosh nrghosh self-assigned this Aug 28, 2025
@nrghosh nrghosh added docs An issue or change related to documentation data Ray Data-related issues llm labels Aug 28, 2025
@kouroshHakha (Contributor) left a comment


TY.

@nrghosh nrghosh marked this pull request as ready for review August 28, 2025 00:56
@nrghosh nrghosh requested a review from a team as a code owner August 28, 2025 00:56
@nrghosh nrghosh added the go add ONLY when ready to merge, run all tests label Aug 28, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@richardliaw richardliaw merged commit 2ff6909 into ray-project:master Aug 28, 2025
5 checks passed
tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025
…56027)

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
gangsf pushed a commit to gangsf/ray that referenced this pull request Sep 2, 2025
…56027)

Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local>
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…56027)

Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
…56027)

Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025

Labels

data Ray Data-related issues docs An issue or change related to documentation go add ONLY when ready to merge, run all tests llm

Projects

None yet


3 participants