
[docs][data][llm] Batch inference docs reorg + update to reflect per-stage config refactor#59214

Merged
kouroshHakha merged 16 commits into ray-project:master from nrghosh:data-llm-docs-config-knobs
Jan 15, 2026
Conversation


nrghosh (Contributor) commented Dec 6, 2025

Update / streamline batch inference documentation

  • Introduce the new layout reflecting the updated architecture
  • Reflect the refactor of batch inference configuration knobs
  • Simplify the user walkthrough, examples, and onboarding for batch inference
  • Miscellaneous API spec and comment updates and improvements
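
The refactored API that these docs describe can be sketched as follows. This is a minimal, illustrative example only: the model name, engine kwargs, and sampling parameters are placeholders, not values taken from this PR.

```python
# Sketch of the Ray Data LLM batch inference flow covered by these docs.
# Model name and parameters are illustrative placeholders.
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # any supported model
    engine_kwargs={"max_model_len": 8192},
    batch_size=64,
    concurrency=1,
)

processor = build_llm_processor(
    config,
    # Preprocess: map each input row to a chat request.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    ),
    # Postprocess: keep only the generated text.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray Data?"}])
ds = processor(ds)  # lazily applies the full processor pipeline
```

The processor bundles the preprocess, engine, and postprocess stages that the restructured docs describe individually.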

New structure for Batch Inference Docs (proposed)

Working with LLMs
=================

1. Quickstart   
2. Architecture                       
3. Common use cases
   - Text generation (merge from "Perform batch inference")
   - Embeddings (keep concise)
   - Vision-language models (simplify)
   - OpenAI-compatible endpoints
4. Troubleshooting                     
   - GPU memory / CUDA OOM
   - Model caching for large clusters
5. Advanced configuration            
   - Per-stage configuration
   - Model parallelism (TP/PP)
   - Cross-node parallelism
   - LoRA inference
   - S3/GCS model loading
   - Serve deployments
6. Usage data collection            

gemini-code-assist bot left a comment:
Code Review

This pull request significantly improves the documentation for ray.data.llm by updating it to reflect a recent refactoring of per-stage configurations. The changes introduce clearer, stage-based parameters and add valuable new sections explaining the processor architecture and advanced configuration options. The code examples are correctly updated to use the new API. Overall, this is a high-quality documentation update that will greatly benefit users. I've included a couple of minor suggestions to further enhance the clarity and completeness of the documentation.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the data-llm-docs-config-knobs branch from a3199de to 80cef64 Compare December 7, 2025 00:40
@nrghosh nrghosh added the go add ONLY when ready to merge, run all tests label Dec 8, 2025
@nrghosh nrghosh changed the title [docs][data][llm] Update batch inference docs for per-stage config refactor [docs][data][llm] Batch inference docs reorg + update to reflect per-stage config refactor Dec 8, 2025
nrghosh and others added 2 commits December 7, 2025 23:35
- Restructure into: Getting Started, Common Use Cases, Troubleshooting, Advanced Config
- Remove redundant 'Perform batch inference' section (duplicated quickstart)
- Promote GPU OOM / model caching to Troubleshooting section
- Consolidate advanced topics (parallelism, per-stage config, LoRA, Serve)
- Simplify VLM and embeddings examples
- Update to new stage config API (prepare_image_stage, etc.)
- Add PIL, RunAI to Vale vocabulary

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the data-llm-docs-config-knobs branch from b4df643 to bcaeef2 Compare December 8, 2025 18:29
@richardliaw richardliaw added the data Ray Data-related issues label Dec 10, 2025
nrghosh and others added 2 commits December 12, 2025 15:16
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh marked this pull request as ready for review December 16, 2025 00:58
@nrghosh nrghosh requested a review from a team as a code owner December 16, 2025 00:58
Comment on lines +67 to +78
.. code-block:: text

    Input Dataset
          |
          v
    +------------------+
    |    Preprocess    |  (your custom function)
    +------------------+
          |
          v
    +------------------+
    |   PrepareImage   |  (optional, for VLMs)
    +------------------+
Contributor:
Can you make this into a simple diagram that doesn't take up that much space?

Contributor:

Or just a bullet list; the problem is that this takes a lot of screen real estate.

Author (nrghosh): after — (screenshot of the simplified diagram)

--bucket-uri gs://my-bucket/path/to/model

For a complete embedding configuration example, see:
Then reference the remote path in your config:
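
A sketch of what referencing the remote path in your config might look like. The bucket URI is the placeholder from the snippet above, and enabling the RunAI streamer through ``engine_kwargs`` is an assumption based on vLLM's ``load_format`` option, not a value confirmed by this PR.

```python
from ray.data.llm import vLLMEngineProcessorConfig

# Sketch: point model_source at the bucket the weights were uploaded to.
# Optionally stream weights straight from object storage with vLLM's
# RunAI streamer (load_format is vLLM's engine option; URI is a placeholder).
config = vLLMEngineProcessorConfig(
    model_source="gs://my-bucket/path/to/model",
    engine_kwargs={
        "load_format": "runai_streamer",  # stream weights from remote storage
    },
)
```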
richardliaw (Contributor), Dec 16, 2025:

I would link out to the RunAI streamer explicitly for further reading.

Contributor:

What's the difference between this and the model loading section below?

Author (nrghosh):

They could be listed together, but the difference is that one focuses on a commonly encountered error (Hugging Face rate limits) while the other introduces a new optimized solution (the RunAI streamer), so they address different things. Renaming for clarity and including a direct link to the RunAI streamer docs.

Author (nrghosh): after (RunAI) — (screenshot)

Author (nrghosh): after (HF model loading) — (screenshot)

Comment on lines 274 to 282
Horizontal scaling
~~~~~~~~~~~~~~~~~~

Besides cross-node parallelism, you can horizontally scale the LLM stage to multiple replicas using the ``concurrency`` parameter:

.. literalinclude:: doc_code/working-with-llms/basic_llm_example.py
:language: python
:start-after: __concurrent_config_example_start__
:end-before: __concurrent_config_example_end__
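
The ``concurrency`` knob from the quoted section can be sketched as follows. Model name and GPU math are illustrative, not taken from the linked example file.

```python
from ray.data.llm import vLLMEngineProcessorConfig

# Sketch: scale the LLM stage horizontally to 4 replicas. Combined with
# tensor_parallel_size=2, each replica uses 2 GPUs, so this pipeline
# would need 8 GPUs total. All values are illustrative.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    engine_kwargs={"tensor_parallel_size": 2},
    concurrency=4,  # number of LLM-stage replicas
    batch_size=64,
)
```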
Contributor:

This belongs more in Common than in Advanced.

Author (nrghosh):

Common is more for detailing use cases; I'll add it to "getting started" since it applies to all.

Author (nrghosh): after — (screenshot)

)

.. _faqs:
Available fields for all stages: ``enabled``, ``batch_size``, ``concurrency``, ``runtime_env``, ``num_cpus``, ``memory``.
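
A hypothetical sketch of how those per-stage fields might be set. The ``prepare_image_stage`` name comes from this PR's commit notes; the exact shape of the stage config object is an assumption here, and the API reference this PR adds is the authoritative schema.

```python
from ray.data.llm import vLLMEngineProcessorConfig

# Hypothetical per-stage override using the fields listed above
# (enabled, batch_size, concurrency, runtime_env, num_cpus, memory).
# Passing them as a mapping is an assumption for illustration.
config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2-VL-2B-Instruct",
    prepare_image_stage=dict(  # stage name taken from this PR's commit notes
        enabled=True,
        batch_size=32,
        concurrency=2,
        num_cpus=2,
    ),
)
```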
Contributor:

Link out to documentation; don't reference fields here.

Author (nrghosh):
Adding stage configs to API reference, and replacing inline with direct doc reference.

Author (nrghosh): after — (screenshot)

richardliaw (Contributor) left a comment:

Overall this change makes sense.

github-actions bot commented:
This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Dec 30, 2025
@nrghosh nrghosh removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 6, 2026
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- move horiz scaling section
- add explicit RunAI streamer link

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- highlight agentic and multi-turn
- add stage configs to api reference
- link to api reference instead of listing fields

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the data-llm-docs-config-knobs branch from 9b6251a to 0513ec4 Compare January 7, 2026 02:42
@nrghosh nrghosh requested a review from richardliaw January 8, 2026 17:27
=================

The :ref:`ray.data.llm <llm-ref>` module integrates with key large language model (LLM) inference engines and deployed models to enable LLM batch inference.
The :ref:`ray.data.llm <llm-ref>` module integrates with LLM inference engines (vLLM, SGLang) to enable scalable batch inference on Ray Data datasets.
Contributor:

The TOC is already on the right side, so this is fairly redundant. Can you also make the right side reflect the proper heading hierarchy? (screenshot)

Author (nrghosh):

I don't see a way to do nesting for the ToC (right side) or any examples elsewhere in the docs, hence the redundancy (just for readability). Followed up with the Docs folks.

For context, this is what master looks like right now: (screenshot)

jeffreywang-anyscale (Contributor) commented:
Thanks for updating the doc! It looks much nicer now :)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh force-pushed the data-llm-docs-config-knobs branch from 0513ec4 to 403f9b3 Compare January 8, 2026 19:31
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh requested a review from a team January 8, 2026 19:42
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh requested review from a team and kouroshHakha January 9, 2026 00:28
@nrghosh nrghosh added the llm label Jan 12, 2026
@kouroshHakha kouroshHakha enabled auto-merge (squash) January 15, 2026 17:26
bveeramani (Member) left a comment:

Stamp

@kouroshHakha kouroshHakha merged commit 29af75c into ray-project:master Jan 15, 2026
7 checks passed
jeffery4011 pushed a commit to jeffery4011/ray that referenced this pull request Jan 20, 2026
…stage config refactor (ray-project#59214)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: jeffery4011 <jefferyshen1015@gmail.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Feb 3, 2026
…stage config refactor (ray-project#59214)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…stage config refactor (ray-project#59214)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests llm

5 participants