
[Data][LLM] Add should_continue_on_error support for ServeDeploymentStage (Data <> Serve) #59325

@nrghosh

Description


Follow-up to #59212. The current continue_on_error implementation only handles the case where the vLLM engine runs in the same lifecycle as the Ray Data pipeline (vLLMEngineStage). When using Ray Serve handles (ServeDeploymentStage), error handling requires a separate implementation due to differences in error propagation.

Background

ServeDeploymentStage accesses the LLM engine via DeploymentHandle (RPC calls) rather than in-process. This changes how errors show up (from the POV of a Ray Data user):

Error Type       vLLMEngineStage (in-process)  ServeDeploymentStage (RPC)
Prompt too long  ValueError                    Wrapped in RayTaskError
Engine OOM       EngineDeadError               RayActorError (replica died)
Network issue    N/A                           RayActorError, timeout
Replica crashed  N/A                           RayActorError

Current State

serve_deployment_stage.py has the same vulnerability as the original vllm_engine_stage.py:

tasks = [asyncio.create_task(self.generate_async(row)) for row in batch]

for resp in asyncio.as_completed(tasks):
    request, output, time_taken = await resp  # Exception propagates, kills batch

Proposed Implementation

  1. Add should_continue_on_error parameter to ServeDeploymentStageUDF.__init__

  2. Create _generate_with_error_handling wrapper for generate_async

  3. Define fatal vs non-fatal errors for serve handle case:

    _SERVE_FATAL_ERRORS = (
        ray.exceptions.RayActorError,  # Replica crashed
        # Connection/timeout errors TBD
    )
  4. Handle error unwrapping: RayTaskError wraps the original exception, so the cause chain must be inspected to determine whether the underlying error was fatal

  5. Wire continue_on_error through ServeDeploymentProcessorConfig
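A minimal sketch of steps 1-3 above. This is hedged: `_generate_with_error_handling`, `run_batch`, and the stand-in exception classes are placeholders for the real `ray.exceptions` types and stage internals, not the actual API.

```python
import asyncio

class ReplicaDiedError(Exception):
    """Stand-in for ray.exceptions.RayActorError (fatal: replica crashed)."""

class BadRequestError(Exception):
    """Stand-in for a non-fatal per-request error (e.g. prompt too long)."""

# Fatal errors abort the whole batch; everything else may be skipped.
_SERVE_FATAL_ERRORS = (ReplicaDiedError,)

async def _generate_with_error_handling(generate_async, row, should_continue_on_error):
    """Wrap generate_async so non-fatal errors are returned, not raised."""
    try:
        return await generate_async(row), None
    except _SERVE_FATAL_ERRORS:
        raise  # replica died: no point continuing the batch
    except Exception as e:
        if not should_continue_on_error:
            raise
        return None, e  # caller records the error row and keeps going

async def run_batch(batch, generate_async, should_continue_on_error=True):
    """Dispatch a batch; failed rows become error records instead of killing it."""
    tasks = [
        asyncio.create_task(
            _generate_with_error_handling(generate_async, row, should_continue_on_error)
        )
        for row in batch
    ]
    results = []
    for fut in asyncio.as_completed(tasks):
        output, err = await fut
        results.append(output if err is None else {"error": str(err)})
    return results
```

With `should_continue_on_error=True`, a single bad prompt yields one error record while the rest of the batch completes; a fatal error still propagates and fails the batch.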

Challenges

  • Error serialization: vLLM exception types (EngineDeadError) are wrapped/serialized over RPC, not directly catchable
  • Fatal error detection: Need to distinguish "replica died" (fatal, don't continue) from "request validation failed" (non-fatal, safe to continue)
  • Error unwrapping: May need to parse error messages or inspect RayTaskError.cause to determine root cause
  • Serve middleware: Errors might be converted to HTTP responses rather than exceptions depending on deployment configuration
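The unwrapping challenge above could be handled by walking the exception's cause chain. A hedged sketch, where `WrappedTaskError` stands in for `ray.exceptions.RayTaskError` (whose `cause` attribute holds the user exception) and `ReplicaDiedError` stands in for `ray.exceptions.RayActorError`:

```python
class WrappedTaskError(Exception):
    """Stand-in for ray.exceptions.RayTaskError: wraps the root-cause exception."""
    def __init__(self, cause):
        super().__init__(f"task failed: {cause!r}")
        self.cause = cause

class ReplicaDiedError(Exception):
    """Stand-in for ray.exceptions.RayActorError (fatal)."""

_FATAL_TYPES = (ReplicaDiedError,)

def is_fatal(exc):
    """Unwrap RPC-level wrappers and classify the root cause as fatal or not."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cycles in the cause chain
        if isinstance(exc, _FATAL_TYPES):
            return True
        # Prefer the wrapper's explicit cause, fall back to PEP 3134 chaining.
        exc = getattr(exc, "cause", None) or exc.__cause__
    return False
```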

Files to Modify

  • python/ray/llm/_internal/batch/stages/serve_deployment_stage.py - Add error handling wrapper
  • python/ray/llm/_internal/batch/processor/serve_deployment_proc.py - Wire config through
  • python/ray/llm/tests/batch/gpu/stages/test_serve_deployment_stage.py - Add tests
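Wiring the flag through the processor config (step 5) might look roughly like the following; the field and helper names here are hypothetical illustrations, not the actual ray.llm API.

```python
from dataclasses import dataclass

@dataclass
class ServeDeploymentProcessorConfig:
    """Hypothetical slice of the processor config carrying the new flag."""
    deployment_name: str
    continue_on_error: bool = False  # propagated down to ServeDeploymentStageUDF

def build_stage_udf_kwargs(config: ServeDeploymentProcessorConfig) -> dict:
    """Translate processor-level config into UDF constructor kwargs."""
    return {"should_continue_on_error": config.continue_on_error}
```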

Related

Use case

No response

Metadata

Labels: data (Ray Data-related issues), enhancement (Request for new feature and/or capability), llm, triage (Needs triage, eg: priority, bug/not-bug, and owning component)
Status: Todo
Milestone: none