[data][llm] Support configuring HttpRequestUDF resources #60313

Merged
kouroshHakha merged 3 commits into ray-project:master from yuchen-ecnu:RAY-60311
Jan 29, 2026

Conversation

@yuchen-ecnu
Contributor

@yuchen-ecnu yuchen-ecnu commented Jan 20, 2026

Description

This PR introduces resources configuration options in HttpRequestProcessorConfig to provide more flexible resource control for HttpRequestUDF.

Related issues

Closes #60311

Additional information

It has been tested in python/ray/llm/tests/batch/cpu/stages/test_http_request_stage.py.
Users can use this feature with the following code:

config = HttpRequestProcessorConfig(
    url="http://xxxx/v1/chat/completions",
    headers={"Authorization": "Bearer fake_key"},
    qps=2,
    max_retries=3,
    concurrency=(1, 100),
    http_request_stage=HttpRequestStageConfig(
        num_cpus=0.5,
        memory=100000,
    ),
)
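
As a rough illustration of why a fractional num_cpus helps an I/O-bound stage, the sketch below estimates how many workers fit on one node. It assumes workers are packed purely by their CPU reservation, which simplifies real scheduling, and workers_per_node is a hypothetical helper, not part of Ray.

```python
from fractions import Fraction


def workers_per_node(node_cpus: int, num_cpus_per_worker: float) -> int:
    """Estimate how many workers fit on a node by CPU reservation alone.

    Hypothetical helper for illustration; it is not a Ray API.
    """
    if num_cpus_per_worker <= 0:
        raise ValueError("num_cpus_per_worker must be positive")
    # Build the fraction from the decimal string so 0.1 means exactly 1/10
    # and the division is not affected by binary floating-point error.
    per_worker = Fraction(str(num_cpus_per_worker))
    return int(Fraction(node_cpus) / per_worker)


if __name__ == "__main__":
    for reservation in (1, 0.5, 0.1):
        print(reservation, workers_per_node(8, reservation))
```

Under this assumption, num_cpus=0.5 as in the config above would let an 8-CPU node host up to 16 workers, versus 8 when each worker reserves a full CPU.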

@yuchen-ecnu yuchen-ecnu requested a review from a team as a code owner January 20, 2026 05:33
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a resources configuration option in HttpRequestProcessorConfig to allow more flexible resource control for the HttpRequestUDF. The implementation correctly adds the new field and passes the specified resources to the underlying Ray Data map_batches operator. The changes are accompanied by a test case that verifies the new functionality. The code is clear, correct, and a good addition to provide more control over resource allocation.

@ray-gardener ray-gardener bot added data Ray Data-related issues llm community-contribution Contributed by the community labels Jan 20, 2026
@jeffreywang-anyscale
Contributor

jeffreywang-anyscale commented Jan 22, 2026

Hey @yuchen-ecnu thanks for your contribution! Could we follow how other stages (e.g. ChatTemplate, Tokenizer) configure their num_cpus (reference) and not introduce the resources attribute? That is, configure the resources at the stage rather than the processor granularity. Please fix DCO as well (git commit -s -m "your commit message" or git commit -s --amend), thank you! cc: @nrghosh

Contributor

@nrghosh nrghosh left a comment

Thanks for the detailed issue writeup / PR!

Valid use-case / concern, and we do support fractional CPUs, so this is a good request.

The main gap to address, as you point out, is that HttpRequestProcessorConfig doesn't expose that for the HTTP stage. For this PR (as Jeff pointed out), we'd want to align with the existing pattern by adding explicit fields, like

class HttpRequestProcessorConfig(ProcessorConfig):
    num_cpus: Optional[float] = Field(
        default=None,
        description="CPUs per HttpRequestUDF worker. Defaults to 1 if None. "
        "For I/O-bound workloads, use fractional values (e.g., 0.1).",
    )
    memory: Optional[float] = Field(default=None, ...)

This way we keep the API consistent with other stages + leverage the existing extract_resource_kwargs() utility.
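
The pattern described above, optional explicit fields that are forwarded only when set, can be sketched roughly as follows. This uses a plain dataclass rather than the pydantic model in the snippet, and collect_resource_kwargs is a hypothetical stand-in; it does not reproduce Ray's extract_resource_kwargs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResourceFields:
    """Hypothetical stand-in for the explicit resource fields on a config."""

    num_cpus: Optional[float] = None  # None means "use the default"
    memory: Optional[float] = None  # bytes; None means "no explicit reservation"


def collect_resource_kwargs(fields: ResourceFields) -> Dict[str, Any]:
    """Return only the resource settings the user actually provided."""
    candidates = {"num_cpus": fields.num_cpus, "memory": fields.memory}
    return {name: value for name, value in candidates.items() if value is not None}


if __name__ == "__main__":
    print(collect_resource_kwargs(ResourceFields(num_cpus=0.1)))
```

Leaving unset fields out of the resulting kwargs is one way to let the downstream operator apply its own defaults.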

Also

  • lint / sign (from Jeff)
  • update test to match above pattern ^

…stProcessorConfig

Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
@yuchen-ecnu
Contributor Author

Hi @jeffreywang-anyscale @nrghosh , thanks for the detailed comments!

You’re right that we should stay consistent with the existing pattern. I’ve replaced the generic resources config with explicit num_cpus and memory fields, and updated the tests accordingly.
The DCO issue has also been addressed in the latest commit.

Please let me know if there’s anything else I should improve.

Comment on lines +58 to +66
    num_cpus: Optional[float] = Field(
        default=None,
        description="Number of CPUs per HttpRequestUDF worker. Defaults to 1 if None. "
        "For I/O-bound workloads, use fractional values (e.g., 0.1).",
    )
    memory: Optional[float] = Field(
        default=None,
        description="Heap memory in bytes to reserve for each HttpRequestUDF worker.",
    )
Contributor

The direction makes sense, but let's do this at the stage layer (not the processor layer) to be consistent with how other stages configure num_cpus and memory, even though there is only one stage in the HttpRequestProcessor. We can take advantage of build_cpu_stage_map_kwargs and will not need to invoke extract_resource_kwargs directly. You can follow this example.

Contributor Author

Hi @jeffreywang-anyscale , many thanks for the example — it really helped me understand.

I’ve moved the configs to the stage layer using the new HttpRequestStageConfig and added a guard check to prevent users from disabling the only stage in HttpRequestProcessor.

Please let me know if there are any further improvements I should make.
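
A minimal sketch of the kind of guard described here, assuming the stage setting may be a bool, a dict, or a config object. StageConfig, resolve_stage, and the enabled flag are hypothetical names for illustration and are not the code merged in this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StageConfig:
    """Hypothetical stage-level config; not the real HttpRequestStageConfig."""

    enabled: bool = True
    num_cpus: Optional[float] = None
    memory: Optional[float] = None


def resolve_stage(value: Any) -> StageConfig:
    """Normalize a bool | dict | StageConfig value and reject a disabled stage."""
    if isinstance(value, bool):
        config = StageConfig(enabled=value)
    elif isinstance(value, dict):
        config = StageConfig(**value)
    elif isinstance(value, StageConfig):
        config = value
    else:
        raise TypeError(f"unsupported stage config type: {type(value).__name__}")
    # The processor has a single stage, so disabling it would leave nothing to run.
    if not config.enabled:
        raise ValueError("the only stage of the processor cannot be disabled")
    return config


if __name__ == "__main__":
    print(resolve_stage({"num_cpus": 0.5, "memory": 100000}))
```

Failing fast at config-resolution time, rather than when the pipeline is built, is one possible design choice; the merged implementation may place the check elsewhere.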


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Contributor

@jeffreywang-anyscale jeffreywang-anyscale left a comment

Couple of documentation NITs. Thank you again for the contribution, @yuchen-ecnu!

    )
    http_request_stage: Any = Field(
        default=True,
        description="Chat templating stage config (bool | dict | HttpRequestStageConfig).",
Contributor

typo?

Contributor Author

My mistake, updated.

@jeffreywang-anyscale jeffreywang-anyscale added the go add ONLY when ready to merge, run all tests label Jan 28, 2026
Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
@yuchen-ecnu
Contributor Author

> Couple of documentation NITs. Thank you again for the contribution, @yuchen-ecnu!

Hi @jeffreywang-anyscale , many thanks for your kind and patient review. The related docs have been updated.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale
Contributor

jeffreywang-anyscale commented Jan 29, 2026

There are some transient failures with Anyscale:

Traceback (most recent call last):
  File "/workdir/release/ray_release/job_manager/anyscale_job_manager.py", line 85, in _run_job
    job_response = self._sdk.create_job(job_request)
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/sdk.py", line 552, in create_job
    return super().create_job(create_production_job, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/api/default_api.py", line 884, in create_job
    return self.create_job_with_http_info(create_production_job, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/api/default_api.py", line 963, in create_job_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/rest.py", line 270, in POST
    return self.request("POST", url,
  File "/usr/local/lib/python3.10/site-packages/anyscale/sdk/anyscale_client/rest.py", line 229, in request
    raise ApiException(http_resp=r)
anyscale_client.exceptions.ApiException: (500)
Reason: Internal Server Error

Retry didn't succeed. Will retry the failed release tests again tomorrow.

@kouroshHakha kouroshHakha enabled auto-merge (squash) January 29, 2026 18:57
Member

@bveeramani bveeramani left a comment

Stamp

@kouroshHakha kouroshHakha merged commit 599b347 into ray-project:master Jan 29, 2026
7 checks passed
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jan 29, 2026
…#60313)

Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
…#60313)

Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: 400Ping <jiekaichang@apache.org>
@xiyuecangxin

LGTM!

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…#60313)

Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…#60313)

Signed-off-by: Yu Chen <yuchen.ecnu@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>

Labels

community-contribution: Contributed by the community
data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests
llm

Development

Successfully merging this pull request may close these issues.

[data][llm] Support configuring HttpRequestUDF resources to fully utilize worker CPU cores

6 participants