document performance flags for serve #57845

Merged
abrarsheikh merged 4 commits into master from SERVE-1236-abrar-perf on Oct 18, 2025

Conversation

@abrarsheikh
Contributor

No description provided.

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request improves the performance tuning documentation by adding a new section on throughput-optimized flags. The changes clearly document several performance-related environment variables, explaining their purpose and how they can be used to improve throughput and latency. The restructuring of the document to separate request path performance issues from controller performance issues is a good improvement for clarity. I've found one minor grammatical issue in the new documentation.

### Enable throughput-optimized flags

:::{note}
In Ray v2.54.0, the defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0` for improved performance.
Contributor

but not the logging ones?

Contributor Author

Logging can be made default after #57850 is implemented

Contributor

Since this change is slated for a future release (2.54), should we just include the logging one as well?

Contributor Author

I am inclined to leave it out. When we implement the time-based logger, we can roll it out immediately without warning developers, since it has no perceivable impact.

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue) and docs (An issue or change related to documentation) labels on Oct 17, 2025
Contributor

@dstrodtman dstrodtman left a comment

@akshay-anyscale I believe that this provides sufficient detail without going so far as telling users how they must design their applications.

On the Anyscale side, I'll word this slightly more strongly. While I usually like to avoid anti-pattern code examples, this might be a case where an explicit example of what "blocking code" looks like could be helpful, especially for novice users, which definitely includes some data scientists who could be our users.

### Enable throughput-optimized flags

:::{note}
In Ray v2.54.0, the defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0` for improved performance.
Contributor

Suggested change
- In Ray v2.54.0, the defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0` for improved performance.
+ A breaking change to this functionality will go live with Ray version 2.54.0. The defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0`, disabling existing default functionality to improve serving throughput.
+ You should update your code to explicitly set these properties to `1` if your workloads require legacy behavior.

Typically, I avoid mentioning future state. Since this is a known planned migration, we should announce it.

We should also draft customer comms and work with @tg-anyscale to address Anyscale customers. (I understand this is technically opt-in for the breaking change because it's a new Ray version, but still nice to encourage customers to start testing now.)

Contributor Author

> A breaking change to this functionality will go live with

This is not a breaking change from the customer's point of view: they don't need to take any action to opt into these optimizations.

Contributor

Sorry, I'm confused: won't running the user code in the same loop as the serve code break customer workloads with blocking logic once the default changes (assuming upgrade to Ray 2.54.0)?

Or will users not experience a performance degradation relative to now, they just won't see an improvement?

Contributor Author

> will users not experience a performance degradation relative to now, they just won't see an improvement?

This ^

In Ray v2.54.0, the defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0` for improved performance.
:::

Ray Serve offers performance flags that improve throughput and latency. You can enable all optimizations at once with `RAY_SERVE_THROUGHPUT_OPTIMIZED=1`, or configure individual flags:
Contributor

Suggested change
- Ray Serve offers performance flags that improve throughput and latency. You can enable all optimizations at once with `RAY_SERVE_THROUGHPUT_OPTIMIZED=1`, or configure individual flags:
+ This section details how to enable Ray Serve options focused on improving throughput and reducing latency. These configurations focus on the following:
+ - Reducing overhead associated with frequent logging.
+ - Disabling behavior that allowed Serve applications to include blocking operations.
+ If your Ray Serve code includes blocking operations, you must refactor your code to enable enhanced throughput.
+ To configure all options to the recommended settings, set the environment variable `RAY_SERVE_THROUGHPUT_OPTIMIZED=1`.
+ You can also configure each option individually. The following table details the recommended configurations and their impact:

Contributor

Just confirming: if I've set RAY_SERVE_THROUGHPUT_OPTIMIZED=1, can I still override individual configs below? Or should I manually configure all 4 if I need a higher/lower buffer size for example?

Contributor Author

> can I still override individual configs below?

No. You need to unset `RAY_SERVE_THROUGHPUT_OPTIMIZED` and set each flag individually.
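
As a rough illustration of that answer, the sketch below builds an environment mapping that drops the umbrella flag and sets the individual flags explicitly. It only uses the flag names quoted in this thread; the logging-related flags are not named here, so they are left out. The helper `build_serve_env` is hypothetical, and the value `"0"` for the optimized setting is taken from the note under review.

```python
# Flag names are the ones quoted in this thread. The logging-related flags
# are not named in the conversation, so they are intentionally omitted.
UMBRELLA_FLAG = "RAY_SERVE_THROUGHPUT_OPTIMIZED"
INDIVIDUAL_FLAGS = {
    "RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD": "0",
    "RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP": "0",
}


def build_serve_env(base_env, overrides=None):
    """Hypothetical helper: return a copy of base_env with the umbrella
    flag removed and every individual flag set explicitly."""
    env = dict(base_env)
    env.pop(UMBRELLA_FLAG, None)   # unset the umbrella flag
    env.update(INDIVIDUAL_FLAGS)   # start from the throughput-oriented values
    env.update(overrides or {})    # then apply per-flag overrides
    return env


# Example: keep the router optimization but pin user code to the legacy value.
env = build_serve_env(
    {"RAY_SERVE_THROUGHPUT_OPTIMIZED": "1", "PATH": "/usr/bin"},
    overrides={"RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD": "1"},
)
```

The resulting mapping would then be used as the environment of whatever process starts Serve. When exactly Serve reads these variables is not stated in this thread, so setting them before startup is an assumption.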

@abrarsheikh
Contributor Author

> @akshay-anyscale I believe that this provides sufficient detail without going so far as telling users how they must design their applications.
>
> On the Anyscale side, I'll word this slightly more strongly. While I usually like to avoid anti-pattern code examples, this might be a case where an explicit example of what "blocking code" looks like could be helpful, especially for novice users, which definitely includes some data scientists who could be our users.

I decided to add a code example showing blocking and non-blocking operations. Let me know what you think.
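
The example added to the docs is not reproduced in this thread. As a rough sketch of the distinction being discussed, the snippet below uses plain `asyncio` rather than Ray Serve APIs, and the handler names are hypothetical. It assumes handlers share a single event loop, which is the situation the flags above are concerned with.

```python
import asyncio
import time

DELAY_S = 0.05
NUM_REQUESTS = 5


async def blocking_handler():
    # Anti-pattern: time.sleep() holds the event loop, so no other
    # coroutine can make progress until it returns.
    time.sleep(DELAY_S)
    return "done"


async def non_blocking_handler():
    # Awaiting yields control back to the event loop while waiting.
    await asyncio.sleep(DELAY_S)
    return "done"


async def timed(handler):
    # Issue NUM_REQUESTS concurrent calls and measure total wall time.
    start = time.perf_counter()
    results = await asyncio.gather(*(handler() for _ in range(NUM_REQUESTS)))
    return results, time.perf_counter() - start


async def main():
    _, blocking_elapsed = await timed(blocking_handler)
    _, non_blocking_elapsed = await timed(non_blocking_handler)
    return blocking_elapsed, non_blocking_elapsed


blocking_elapsed, non_blocking_elapsed = asyncio.run(main())
print(f"blocking: {blocking_elapsed:.2f}s, non-blocking: {non_blocking_elapsed:.2f}s")
```

With the blocking handler the calls run one after another, so total time grows with the number of requests. With the non-blocking handler the waits overlap on the same loop.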

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the go (add ONLY when ready to merge, run all tests) label on Oct 17, 2025
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh merged commit 4d5485c into master Oct 18, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the SERVE-1236-abrar-perf branch October 18, 2025 00:16
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
Signed-off-by: abrar <abrar@anyscale.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Labels

docs (An issue or change related to documentation), go (add ONLY when ready to merge, run all tests), serve (Ray Serve Related Issue)

3 participants