Description
Follow-up to #59212. The current continue_on_error implementation only handles the case where the vLLM engine runs in the same lifecycle as the Ray Data pipeline (vLLMEngineStage). When using Ray Serve handles (ServeDeploymentStage), error handling requires a separate implementation due to differences in error propagation.
Background
ServeDeploymentStage accesses the LLM engine via DeploymentHandle (RPC calls) rather than in-process. This changes how errors show up (from the POV of a Ray Data user):
| Error Type | vLLMEngineStage (in-process) | ServeDeploymentStage (RPC) |
|---|---|---|
| Prompt too long | ValueError | Wrapped in RayTaskError |
| Engine OOM | EngineDeadError | RayActorError (replica died) |
| Network issue | N/A | RayActorError, timeout |
| Replica crashed | N/A | RayActorError |
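To make the distinction concrete, here is a minimal sketch of how a caller could classify errors surfaced through a Serve handle. The exception classes are local stand-ins for `ray.exceptions.RayActorError` / `RayTaskError` (the real stage would import the Ray types), and `classify_serve_error` is a hypothetical helper, not existing API:

```python
# Stand-ins for ray.exceptions.RayActorError / RayTaskError
# (assumption: illustrative only, not the real Ray classes).
class RayActorError(Exception):
    """Replica died or became unreachable."""

class RayTaskError(Exception):
    """Wraps an exception raised inside the replica; `cause` is the original."""
    def __init__(self, cause: Exception):
        super().__init__(str(cause))
        self.cause = cause

def classify_serve_error(err: Exception) -> str:
    """Map an exception from a DeploymentHandle call onto the table's rows."""
    if isinstance(err, RayActorError):
        # Replica crash or network issue: nothing row-specific to salvage.
        return "fatal: replica/network"
    if isinstance(err, RayTaskError):
        # Request-level failure (e.g. ValueError for an over-long prompt),
        # wrapped for transport back to the caller.
        return f"request-level: {type(err.cause).__name__}"
    return "unknown"
```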
Current State
`serve_deployment_stage.py` has the same vulnerability as the original `vllm_engine_stage.py`:

```python
tasks = [asyncio.create_task(self.generate_async(row)) for row in batch]
for resp in asyncio.as_completed(tasks):
    request, output, time_taken = await resp  # Exception propagates, kills the batch
```

Proposed Implementation
- Add `should_continue_on_error` parameter to `ServeDeploymentStageUDF.__init__`
- Create `_generate_with_error_handling` wrapper for `generate_async`
- Define fatal vs. non-fatal errors for the serve handle case:

  ```python
  _SERVE_FATAL_ERRORS = (
      ray.exceptions.RayActorError,  # Replica crashed
      # Connection/timeout errors TBD
  )
  ```

- Handle error unwrapping: `RayTaskError` wraps the original exception, so we need to inspect `cause` to determine whether the underlying error was fatal
- Wire `continue_on_error` through `ServeDeploymentProcessorConfig`
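A possible shape for the wrapper, sketched with plain asyncio. The exception classes are local stand-ins for the `ray.exceptions` types, and `_generate_with_error_handling` plus the error-row format are assumptions about the eventual implementation, not existing API:

```python
import asyncio

# Local stand-ins for ray.exceptions.RayActorError / RayTaskError (assumption).
class RayActorError(Exception): ...

class RayTaskError(Exception):
    def __init__(self, cause: Exception):
        super().__init__(str(cause))
        self.cause = cause

# Fatal: the replica itself is gone, so continuing the batch is pointless.
_SERVE_FATAL_ERRORS = (RayActorError,)

async def _generate_with_error_handling(row, generate_async,
                                        should_continue_on_error=True):
    """Run generate_async(row); turn non-fatal failures into error rows."""
    try:
        return await generate_async(row)
    except Exception as e:
        # RayTaskError carries the in-replica exception in `cause`.
        cause = getattr(e, "cause", None) or e
        if not should_continue_on_error or isinstance(cause, _SERVE_FATAL_ERRORS):
            raise  # Fatal (or error handling disabled): fail the batch.
        return {"row": row, "generated_text": None, "error": repr(cause)}

# Usage: one bad row no longer kills the batch.
async def fake_generate(row):
    if row == "too-long-prompt":
        raise RayTaskError(ValueError("prompt too long"))
    return {"row": row, "generated_text": "ok", "error": None}

async def run_batch(batch):
    tasks = [_generate_with_error_handling(r, fake_generate) for r in batch]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_batch(["hello", "too-long-prompt"]))
```

`asyncio.gather` preserves input order, so error rows stay aligned with their source rows.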
Challenges
- Error serialization: vLLM exception types (`EngineDeadError`) are wrapped/serialized over RPC, not directly catchable
- Fatal error detection: need to distinguish "replica died" (fatal, don't continue) from "request validation failed" (non-fatal, safe to continue)
- Error unwrapping: may need to parse error messages or inspect `RayTaskError.cause` to determine the root cause
- Serve middleware: errors might be converted to HTTP responses rather than exceptions, depending on deployment configuration
Files to Modify
- `python/ray/llm/_internal/batch/stages/serve_deployment_stage.py` - Add error handling wrapper
- `python/ray/llm/_internal/batch/processor/serve_deployment_proc.py` - Wire config through
- `python/ray/llm/tests/batch/gpu/stages/test_serve_deployment_stage.py` - Add tests
Related
- Parent PR: [Data][LLM] Add should_continue_on_error for graceful error handling in batch inference #59212
- Issue: [Data] Custom error handling for failed rows in dataset processing #52449