[docs][data][llm] Batch inference docs reorg + update to reflect per-stage config refactor #59214
Conversation
Code Review
This pull request significantly improves the documentation for ray.data.llm by updating it to reflect a recent refactoring of per-stage configurations. The changes introduce clearer, stage-based parameters and add valuable new sections explaining the processor architecture and advanced configuration options. The code examples are correctly updated to use the new API. Overall, this is a high-quality documentation update that will greatly benefit users. I've included a couple of minor suggestions to further enhance the clarity and completeness of the documentation.
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from a3199de to 80cef64
- Restructure into: Getting Started, Common Use Cases, Troubleshooting, Advanced Config
- Remove redundant 'Perform batch inference' section (duplicated quickstart)
- Promote GPU OOM / model caching to Troubleshooting section
- Consolidate advanced topics (parallelism, per-stage config, LoRA, Serve)
- Simplify VLM and embeddings examples
- Update to new stage config API (prepare_image_stage, etc.)
- Add PIL, RunAI to Vale vocabulary

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from b4df643 to bcaeef2
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
.. code-block:: text

    Input Dataset
          |
          v
    +------------------+
    |    Preprocess    |   (your custom function)
    +------------------+
          |
          v
    +------------------+
    |   PrepareImage   |   (optional, for VLMs)
Can you make this into a simple diagram that doesn't take up that much space?
Or just a bullet list. The problem is just that this takes up a lot of screen real estate.
.. code-block:: text

    --bucket-uri gs://my-bucket/path/to/model

- For a complete embedding configuration example, see:
+ Then reference the remote path in your config:
I would link out to the RunAI streamer explicitly for further reading.
What's the difference between this and the model loading section below?
They could be listed together, but the difference is that one focuses on a commonly encountered error (HF rate limits) and the other introduces a new optimized solution (the RunAI streamer), so they address different things. Renaming for clarity and including a direct link to the RunAI streamer docs.
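To make the distinction concrete, here is a hedged sketch of what the RunAI streamer path might look like in a processor config. The bucket path is a placeholder from the doc's example, and `load_format` is vLLM's engine argument for the RunAI Model Streamer; exact field names may differ across Ray versions, so treat this as illustrative rather than authoritative.

```python
# Illustrative sketch (not the exact doc example): stream model weights from
# object storage with the RunAI Model Streamer instead of downloading from HF.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    # Remote path previously uploaded with --bucket-uri (placeholder path).
    model_source="gs://my-bucket/path/to/model",
    engine_kwargs={
        # vLLM load format that streams weights directly from the bucket.
        "load_format": "runai_streamer",
    },
)
```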
Horizontal scaling
~~~~~~~~~~~~~~~~~~

Besides cross-node parallelism, you can horizontally scale the LLM stage to multiple replicas using the ``concurrency`` parameter:

.. literalinclude:: doc_code/working-with-llms/basic_llm_example.py
   :language: python
   :start-after: __concurrent_config_example_start__
   :end-before: __concurrent_config_example_end__
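The ``literalinclude`` above pulls the real example from the repo; as a standalone sketch, setting ``concurrency`` on the processor config might look like the following (model name and sizes are illustrative, not from the linked file):

```python
# Sketch of horizontal scaling: ``concurrency`` controls how many replicas of
# the LLM stage run in parallel, each with its own engine instance.
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="unsloth/Llama-3.1-8B-Instruct",  # illustrative model
    batch_size=64,    # rows per batch sent to each replica
    concurrency=4,    # four LLM-stage replicas processing batches in parallel
)
```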
This belongs more in Common than in Advanced.
Common is more for detailing use cases. I'll add it to 'Getting started' instead, since it applies to all of them.
)

.. _faqs:

Available fields for all stages: ``enabled``, ``batch_size``, ``concurrency``, ``runtime_env``, ``num_cpus``, ``memory``.
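As a hedged sketch of the per-stage config surface this PR documents: the commit message mentions ``prepare_image_stage`` as one of the new stage config parameters, so overriding the listed fields for a single stage could plausibly look like this. The exact parameter names and value types should be confirmed against the API reference; everything below is an assumption built from the field list above.

```python
# Hypothetical per-stage override sketch: tune only the image-preparation
# stage of a VLM processor, using the fields listed in the doc
# (enabled, batch_size, concurrency, runtime_env, num_cpus, memory).
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-VL-3B-Instruct",  # illustrative VLM
    prepare_image_stage={       # assumed shape of the per-stage config block
        "enabled": True,        # run the image-preparation stage
        "batch_size": 32,       # smaller batches for image decoding
        "concurrency": 2,       # two parallel workers for this stage only
        "num_cpus": 1,          # CPUs reserved per worker
    },
)
```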
Link out to the documentation; don't reference the fields here.
Adding the stage configs to the API reference, and replacing the inline field list with a direct doc reference.
richardliaw left a comment:

Overall, this change makes sense.
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- Move horizontal scaling section
- Add explicit RunAI streamer link

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
- Highlight agentic and multi-turn
- Add stage configs to API reference
- Link to API reference instead of listing fields

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from 9b6251a to 0513ec4
=================

- The :ref:`ray.data.llm <llm-ref>` module integrates with key large language model (LLM) inference engines and deployed models to enable LLM batch inference.
+ The :ref:`ray.data.llm <llm-ref>` module integrates with LLM inference engines (vLLM, SGLang) to enable scalable batch inference on Ray Data datasets.
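For context on the module the new intro describes, a minimal end-to-end sketch of batch inference with ``ray.data.llm`` might look like the following. The model name and prompt are illustrative; the ``preprocess``/``postprocess`` hooks and ``build_llm_processor`` follow the module's public API, but field names should be verified against the version of Ray in use.

```python
# Minimal batch-inference sketch with ray.data.llm (illustrative values).
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="unsloth/Llama-3.1-8B-Instruct",  # illustrative model
)

processor = build_llm_processor(
    config,
    # Map each input row to the chat-style request the engine expects.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params={"temperature": 0.3, "max_tokens": 64},
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
ds = processor(ds)  # runs the staged pipeline lazily over the dataset
```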
Thanks for updating the doc! It looks much nicer now :)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from 0513ec4 to 403f9b3
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…stage config refactor (ray-project#59214) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com> Signed-off-by: jeffery4011 <jefferyshen1015@gmail.com>
…stage config refactor (ray-project#59214) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
…stage config refactor (ray-project#59214) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com> Signed-off-by: peterxcli <peterxcli@gmail.com>







Update / streamline batch inference documentation
New structure for Batch Inference Docs (proposed)