[Serve] Add default autoscaling params for custom policies by vaishdho1 · Pull Request #58857 · ray-project/ray

vaishdho1 · 2025-11-20T18:57:56Z

Description

Currently, custom autoscaling policies bypass all standard autoscaling configuration parameters - they must manually implement delay logic, scaling factors, and bounds checking themselves. This PR adds an apply_autoscaling_config decorator that enables custom autoscaling policies to automatically benefit from Ray Serve's standard autoscaling parameters that are embedded in the default policy (upscale_delay_s, downscale_delay_s, downscale_to_zero_delay_s, upscaling_factor, downscaling_factor, min_replicas, max_replicas).
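Below is a minimal sketch of what such a decorator does. It assumes a simplified, hypothetical `PolicyContext` in place of Ray Serve's real `AutoscalingContext`, and it covers only the scaling factors and the `min_replicas`/`max_replicas` bounds; delay handling is left out. The moderation step `current + factor * (raw - current)` follows the spirit of the default policy's smoothing. `apply_config_sketch` and `my_custom_policy` are illustrative names, not the PR's API.

```python
import functools
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

PolicyResult = Tuple[float, Dict[str, Any]]


@dataclass
class PolicyContext:
    """Hypothetical, simplified stand-in for an autoscaling context."""
    current_num_replicas: int
    total_num_requests: float
    min_replicas: int
    max_replicas: int
    upscaling_factor: float = 1.0
    downscaling_factor: float = 1.0


def apply_config_sketch(policy: Callable[[PolicyContext], PolicyResult]):
    """Wrap a custom policy so scaling factors and bounds apply to its output."""

    @functools.wraps(policy)
    def wrapper(ctx: PolicyContext) -> Tuple[int, Dict[str, Any]]:
        raw_desired, state = policy(ctx)
        current = ctx.current_num_replicas
        # Moderate the step away from the current count by the relevant factor.
        if raw_desired >= current:
            factor = ctx.upscaling_factor
        else:
            factor = ctx.downscaling_factor
        smoothed = current + factor * (raw_desired - current)
        # Round up, then clamp into [min_replicas, max_replicas].
        bounded = max(ctx.min_replicas, min(ctx.max_replicas, math.ceil(smoothed)))
        return bounded, state

    return wrapper


@apply_config_sketch
def my_custom_policy(ctx: PolicyContext) -> PolicyResult:
    # Hypothetical custom logic: one replica per 10 queued requests.
    return ctx.total_num_requests / 10, {}
```

With 2 running replicas and 100 queued requests, the custom policy asks for 10 replicas; with `upscaling_factor=0.5` and `max_replicas=8`, the wrapper moderates this to 6.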

Related issues

Fixes #58622

Implementation Details

Core implementation (python/ray/serve/autoscaling_policy.py):
- Added apply_autoscaling_config decorator
- Refactored delay logic into _apply_delay_logic() helper function
- Added scaling factor logic for custom policies via the _apply_scaling_factors() helper function
- Refactored bounds checking into _apply_bounds() helper function
- Updated replica_queue_length_autoscaling_policy to use _apply_delay_logic function
Tests (python/ray/serve/tests/test_autoscaling_policy.py and python/ray/serve/tests/unit/test_autoscaling_policy.py):
- End-to-end tests verifying delay enforcement for decorated custom policies
- Tests for scaling factor moderation (upscaling and downscaling)
- Unit tests for checking each helper function

Added usage documentation with an example.
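The delay enforcement factored into `_apply_delay_logic()` can be approximated with a signed consecutive-decision counter kept in the policy state: positive values count consecutive upscale decisions, negative values count consecutive downscale decisions, and a decision is acted on only once it has persisted past the configured delay. This is a hedged sketch. `apply_delay_sketch`, `COUNTER_KEY`, and the `loop_interval_s` parameter are assumptions, and `downscale_to_zero_delay_s` is not modeled.

```python
from typing import Any, Dict, Tuple

# Hypothetical state key; the PR keeps its counter under an internal key
# whose actual value is not shown in this thread.
COUNTER_KEY = "decision_counter"


def apply_delay_sketch(
    desired: int,
    current: int,
    state: Dict[str, Any],
    upscale_delay_s: float,
    downscale_delay_s: float,
    loop_interval_s: float,
) -> Tuple[int, Dict[str, Any]]:
    """Act on a scaling decision only after it persists for the configured delay."""
    counter = state.get(COUNTER_KEY, 0)
    decision = current  # default: keep the current replica count
    if desired > current:
        # Reset any downscale streak, then count this upscale decision.
        counter = max(counter, 0) + 1
        if counter > int(upscale_delay_s / loop_interval_s):
            counter, decision = 0, desired
    elif desired < current:
        # Reset any upscale streak, then count this downscale decision.
        counter = min(counter, 0) - 1
        if counter < -int(downscale_delay_s / loop_interval_s):
            counter, decision = 0, desired
    else:
        counter = 0
    # Return a new dict rather than mutating the caller's state in place.
    return decision, {**state, COUNTER_KEY: counter}
```

With a 3 s upscale delay and a 1 s loop interval, the sketch holds at the current count for three iterations and scales on the fourth consecutive upscale decision.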

…s and doc changes Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

python/ray/serve/autoscaling_policy.py

gemini-code-assist

Code Review

This pull request introduces a very useful @apply_autoscaling_config decorator to simplify custom autoscaling policies by applying standard configuration parameters automatically. The implementation is well-structured, refactoring existing logic into reusable helper functions. The addition of comprehensive unit and end-to-end tests ensures the new functionality is robust and covers various scenarios, including delays, scaling factors, and edge cases like scaling to zero. The documentation is also updated clearly.

My review includes a critical suggestion to remove leftover development notes from the code and improve handling of a reserved key in the policy state. I've also included a few medium-severity suggestions to improve documentation clarity and test coverage.

python/ray/serve/autoscaling_policy.py

doc/source/serve/advanced-guides/advanced-autoscaling.md

python/ray/serve/tests/test_autoscaling_policy.py

python/ray/serve/tests/unit/test_autoscaling_policy.py

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

python/ray/serve/autoscaling_policy.py

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

python/ray/serve/autoscaling_policy.py

github-actions · 2025-12-09T00:40:07Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

…ing (#59118) ## Description Application-level autoscaling policies in Ray Serve have no mechanism to persist policy state between control-loop iterations, which prevents stateful autoscaling behavior. Additionally, the per-deployment internal state needed to apply standard autoscaling config parameters on top of custom policies cannot be maintained (#58622). This PR adds the following: - The user policy should return its state as a `Dict[DeploymentID,Dict]`. - Each deployment's `AutoscalingContext` now receives the user state that the custom policy returned for its deployment ID, enabling policies to maintain their state across iterations. Each deployment gets its own state. ## Related issues Fixes: #59008 Related to: #58622 #58857 ## Additional information The implementation modifies the `ApplicationAutoscalingState.get_decision_num_replicas()` method to implement: - User returned policy state validation - Returning the policy state back into each deployment --------- Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
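The per-deployment state contract from #59118 can be illustrated with a small, framework-free simulation of the control loop feeding each deployment's state back to the policy. The function signature, the `requests` input, the `num_calls` key, and the `str` alias for `DeploymentID` are all hypothetical; the real policy receives `AutoscalingContext` objects keyed by `DeploymentID`.

```python
from typing import Dict, Tuple

# Hypothetical simplification: Ray Serve uses a DeploymentID type, not str.
DeploymentID = str


def app_policy_sketch(
    requests: Dict[DeploymentID, int],
    prev_state: Dict[DeploymentID, Dict],
) -> Tuple[Dict[DeploymentID, int], Dict[DeploymentID, Dict]]:
    """Hypothetical app-level policy returning decisions plus per-deployment state."""
    decisions: Dict[DeploymentID, int] = {}
    new_state: Dict[DeploymentID, Dict] = {}
    for dep_id, num_requests in requests.items():
        # Read this deployment's own state from the previous iteration.
        calls = prev_state.get(dep_id, {}).get("num_calls", 0) + 1
        decisions[dep_id] = max(1, num_requests // 10)
        # A fresh dict per deployment, so no two deployments share state.
        new_state[dep_id] = {"num_calls": calls}
    return decisions, new_state


# Simulated control loop: the returned state is passed back on the next call.
state: Dict[DeploymentID, Dict] = {}
for _ in range(2):
    decisions, state = app_policy_sketch({"a": 45, "b": 3}, state)
```

After two iterations, each deployment's `num_calls` is 2, showing that state survives between iterations and stays separate per deployment.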

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

abrarsheikh · 2025-12-11T16:22:17Z

@vaishdho1 is this now blocked?

vaishdho1 · 2025-12-11T17:15:25Z

Yes, this is unblocked. I am writing a few tests and will update this PR today.

… tests and refactored code Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

python/ray/serve/autoscaling_policy.py

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

python/ray/serve/autoscaling_policy.py

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-01-26T21:20:16Z

python/ray/serve/autoscaling_policy.py

+    target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
+    error_ratio = ctx.total_num_requests / target_num_requests
+    desired_num_replicas = num_running_replicas * error_ratio
+    return desired_num_replicas, {}


Division by zero when policy called with zero replicas

Medium Severity

The _core_replica_queue_length_policy function performs ctx.total_num_requests / target_num_requests where target_num_requests equals config.get_target_ongoing_requests() * num_running_replicas. When current_num_replicas is 0, this causes a ZeroDivisionError. The public function replica_queue_length_autoscaling_policy and default_autoscaling_policy directly call this core function without protection. While internal usage is protected by the apply_autoscaling_config decorator's cold-start path, direct calls to these exported functions will crash.

Additional Locations (1)

python/ray/serve/autoscaling_policy.py#L278-L291

We will not call this function directly. In the current flow, it is always wrapped by the apply_autoscaling_config decorator, and the tests that exercise this policy explicitly also pass it through the same decorator.
Should we mention this explicitly, or add a check to prevent direct access to this function?

If it's a private function, prefix it with _.

But it's a good idea to guard this function against division by zero without assuming anything about how it gets called.

Should we return 0 when target_num_requests == 0?

Fixed this by guarding against num_running_replicas being zero.
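A sketch of the guard discussed above, written against a hypothetical `SimpleContext` so it runs without Ray. Following the reviewer's suggestion, it returns 0 when `target_num_requests` is 0. The merged commit ("Add validation to prevent zero replicas in autoscaling policy") may handle this differently, and scaling up from zero replicas is assumed to be handled by the decorator's cold-start path, not here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class SimpleContext:
    """Hypothetical stand-in for Ray Serve's AutoscalingContext (illustration only)."""
    total_num_requests: float
    current_num_replicas: int
    target_ongoing_requests: float


def guarded_queue_length_policy(ctx: SimpleContext) -> Tuple[float, Dict[str, Any]]:
    """Queue-length policy core from the snippet under review, plus a zero guard."""
    num_running_replicas = ctx.current_num_replicas
    target_num_requests = ctx.target_ongoing_requests * num_running_replicas
    if target_num_requests <= 0:
        # No running replicas (or a zero target): nothing to scale proportionally
        # from, so avoid the ZeroDivisionError and report 0.
        return 0, {}
    error_ratio = ctx.total_num_requests / target_num_requests
    desired_num_replicas = num_running_replicas * error_ratio
    return desired_num_replicas, {}
```

With 20 queued requests, 2 replicas, and a target of 2 ongoing requests per replica, the error ratio is 5 and the policy asks for 10 replicas; with 0 replicas it returns 0 instead of raising.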

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

python/ray/serve/autoscaling_policy.py

abrarsheikh · 2026-01-27T07:38:44Z

CI is failing with:


[2026-01-27T07:13:04Z] !!! No API stability annotation found for:
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_app_level_autoscaling_config
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_autoscaling_config
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_default_params
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy

Except for replica_queue_length_autoscaling_policy, all the others should be private, I think.

…and expose public API for replica queue length policy Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

python/ray/serve/autoscaling_policy.py

… enhance safety checks Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

python/ray/serve/autoscaling_policy.py

cursor · 2026-01-27T19:36:17Z

python/ray/serve/autoscaling_policy.py

+                    cold_start_replicas,
+                    state_per_deployment[dep_id],
+                )
+                continue


App-level policy state discarded during cold start

Medium Severity

In _apply_app_level_autoscaling_config, when a deployment is cold-starting, the wrapper uses state_per_deployment[dep_id] (the original state) instead of merging with updated_custom_policy_state.get(dep_id, {}). The user's policy IS called first (line 240), and may return state updates for cold-starting deployments, but those updates are silently discarded. This differs from single-deployment behavior where the policy isn't called during cold start, so there's no state to lose. Stateful app-level policies tracking counters or other state during cold start will experience unexpected state loss.

Unlike the single-deployment case, here some deployments can have 0 replicas with num_requests > 0 while others do not.
Currently, I pass every deployment through the policy and then overwrite the ones that have 0 replicas and num_requests > 0 with the cold-start path. In the process, we lose the user state of those overwritten deployments.
The alternative is to first apply the cold-start path to the deployments that start with 0 replicas and num_requests > 0, and send only the remaining ones through the policy. This would exactly mimic the deployment-level behavior.

I am not quite following this, but I think it's okay to take it as a follow-up.
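One way to avoid the state loss described above, sketched under assumptions: for a cold-starting deployment, build the final state from what the user's policy just returned (`updated_custom_policy_state.get(dep_id, {})`) instead of the previous iteration's `state_per_deployment[dep_id]`, and carry over only the internal counter. `merge_state_for_cold_start` and `INTERNAL_KEY` are hypothetical, and whether the counter should be carried over or reset on cold start is itself an assumption; the merged fix may differ.

```python
from typing import Any, Dict

# Hypothetical internal key standing in for the PR's decision-counter key.
INTERNAL_KEY = "decision_counter"


def merge_state_for_cold_start(
    previous_state: Dict[str, Any],
    updated_user_state: Dict[str, Any],
) -> Dict[str, Any]:
    """Keep the user's freshly returned state and carry over the internal counter."""
    # Copy so the dict returned by the user's policy is never mutated.
    merged = dict(updated_user_state)
    merged[INTERNAL_KEY] = previous_state.get(INTERNAL_KEY, 0)
    return merged


previous = {"num_calls": 1, INTERNAL_KEY: 3}
returned_by_policy = {"num_calls": 2}
merged = merge_state_for_cold_start(previous, returned_by_policy)
```

Here the user's updated `num_calls` survives the cold-start override, which the reported behavior would have discarded.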

abrarsheikh

Some nits.

doc/source/serve/doc_code/autoscaling_policy.py

abrarsheikh · 2026-01-28T06:58:49Z

python/ray/serve/autoscaling_policy.py

+                    cold_start_replicas,
+                    state_per_deployment[dep_id],
+                )
+                continue


I am not quite following this, but I think it's okay to take it as a follow-up.

abrarsheikh · 2026-01-28T07:00:26Z

python/ray/serve/autoscaling_policy.py

+    target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
+    error_ratio = ctx.total_num_requests / target_num_requests
+    desired_num_replicas = num_running_replicas * error_ratio
+    return desired_num_replicas, {}


Should we return 0 when target_num_requests == 0?

vaishdho1 · 2026-01-28T19:50:57Z

Yes, the original default policy has the same behavior when target_num_requests = 0.
For the application-level autoscaling case, I will keep the cold-start path implementation unchanged and create a follow-up to look into it separately.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

python/ray/serve/autoscaling_policy.py

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

vaishdho1 · 2026-01-29T20:18:54Z

Modified the cold-start path for app-level autoscaling to preserve user state. I will pick up further enhancements as a follow-up.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-01-29T20:33:45Z

python/ray/serve/autoscaling_policy.py

+        SERVE_AUTOSCALING_DECISION_COUNTERS_KEY: policy_state.get(
+            SERVE_AUTOSCALING_DECISION_COUNTERS_KEY, 0
+        )
+    }


Duplicated internal policy state extraction logic

Low Severity

The code block that extracts the internal policy state (creating an internal_policy_state dict with SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) is duplicated verbatim in both _apply_default_params_and_merge_state and _merge_user_state_with_internal_state. This extraction logic could be refactored into a small helper function to eliminate the duplication and reduce maintenance burden.

Additional Locations (1)

python/ray/serve/autoscaling_policy.py#L173-L179

Same here; let's address this as a follow-up.

cursor · 2026-01-29T20:33:46Z

python/ray/serve/autoscaling_policy.py

+        desired_num_replicas,
+        ctx.capacity_adjusted_min_replicas,
+        ctx.capacity_adjusted_max_replicas,
+    )


Bounds applied twice creating redundant computation

Low Severity

Bounds checking is applied twice in the autoscaling flow: first via _apply_bounds inside the _apply_default_params function (called by the decorator), and then again via self.apply_bounds in get_decision_num_replicas. While applying bounds twice is idempotent and doesn't cause incorrect behavior, it creates redundant computation and makes the code flow confusing to understand.

Additional Locations (1)

python/ray/serve/_private/autoscaling_state.py#L330-L331

Let's ticket this and address it as a follow-up.
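Applying the bounds twice is harmless but redundant because clamping into `[min_replicas, max_replicas]` is idempotent. A quick check with a hypothetical `apply_bounds` helper (not the PR's `_apply_bounds` signature):

```python
def apply_bounds(desired: float, min_replicas: int, max_replicas: int) -> float:
    """Clamp a desired replica count into [min_replicas, max_replicas]."""
    return max(min_replicas, min(max_replicas, desired))


# Clamping an already-clamped value changes nothing.
for desired in [-3, 0, 1, 4.5, 8, 20]:
    once = apply_bounds(desired, 1, 8)
    assert apply_bounds(once, 1, 8) == once
```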

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-01-30T18:08:38Z

python/ray/serve/autoscaling_policy.py

+                final_state[dep_id] = _merge_user_state_with_internal_state(
+                    state_per_deployment[dep_id],
+                    custom_policy_state_per_deployment,
+                )


Shared state dict mutation corrupts delay counters

Medium Severity

In _apply_app_level_autoscaling_config, when processing deployments, the functions _merge_user_state_with_internal_state and _apply_default_params_and_merge_state mutate the user_policy_state dict in place via .update(). If an app-level policy returns the same dict object for multiple deployments' state (e.g., {d1: shared_dict, d2: shared_dict}), each deployment's internal delay counter (SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) overwrites the previous one. This causes the delay logic to malfunction because earlier deployments lose their counter state.

Additional Locations (1)

python/ray/serve/autoscaling_policy.py#L159-L163

This looks like a potential issue.
#58857 (comment)
This logic was finalized in that discussion, where we decided to update the user state directly.

But if the user reuses the same state dict for multiple deployments in an application-level policy, the delay counters can get corrupted. I think we need to either copy the user state or return a new dict as the merged state.

Let's take that up in a follow-up PR. Thanks for staying on top of this, @vaishdho1.
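A framework-free reproduction of the reported corruption and of the copy-based fix proposed above. `merge_in_place`, `merge_copy`, and `COUNTER_KEY` are hypothetical stand-ins for the PR's merge helpers and internal counter key.

```python
from typing import Any, Dict

COUNTER_KEY = "decision_counter"  # hypothetical internal key


def merge_in_place(user_state: Dict[str, Any], counter: int) -> Dict[str, Any]:
    # Reported behavior: .update() writes into the dict the user returned.
    user_state.update({COUNTER_KEY: counter})
    return user_state


def merge_copy(user_state: Dict[str, Any], counter: int) -> Dict[str, Any]:
    # Proposed fix: build a new dict and leave the user's dict untouched.
    return {**user_state, COUNTER_KEY: counter}


shared = {"note": "one dict reused for both deployments"}
policy_state = {"d1": shared, "d2": shared}
counters = {"d1": 3, "d2": -2}

fixed = {dep: merge_copy(s, counters[dep]) for dep, s in policy_state.items()}
buggy = {dep: merge_in_place(s, counters[dep]) for dep, s in policy_state.items()}

print(fixed["d1"][COUNTER_KEY], fixed["d2"][COUNTER_KEY])  # 3 -2
print(buggy["d1"][COUNTER_KEY], buggy["d2"][COUNTER_KEY])  # -2 -2: d1's counter was lost
```

Returning a new dict keeps each deployment's counter separate even when the policy hands back the same object for several deployments.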

python/ray/serve/autoscaling_policy.py

…ct#58857) ## Description Currently, custom autoscaling policies bypass all standard autoscaling configuration parameters - they must manually implement delay logic, scaling factors, and bounds checking themselves. This PR adds an `apply_autoscaling_config` decorator that enables custom autoscaling policies to automatically benefit from Ray Serve's standard autoscaling parameters that are embedded in the default policy (`upscale_delay_s`, `downscale_delay_s`, `downscale_to_zero_delay_s`, `upscaling_factor`, `downscaling_factor`, `min_replicas`, `max_replicas`). ## Related issues Fixes ray-project#58622 ## Implementation Details - Core implementation (python/ray/serve/autoscaling_policy.py): - Added `apply_autoscaling_config` decorator - Refactored delay logic into `_apply_delay_logic()` helper function - Added scaling factor logic for custom policies via the `_apply_scaling_factors()` helper function - Refactored bounds checking into `_apply_bounds()` helper function - Updated replica_queue_length_autoscaling_policy to use `_apply_delay_logic` function - Tests (python/ray/serve/tests/test_autoscaling_policy.py and python/ray/serve/tests/unit/test_autoscaling_policy.py) - End-to-end tests verifying delay enforcement for decorated custom policies - Tests for scaling factor moderation (upscaling and downscaling) - Unit tests for checking each helper function Added usage documentation with an example --------- Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com> Co-authored-by: harshit-anyscale <harshit@anyscale.com>


[serve] Add apply_autoscaling_config decorator and corresponding test…

a5ab658

…s and doc changes Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

vaishdho1 requested review from a team as code owners November 20, 2025 18:57

cursor bot reviewed Nov 20, 2025

python/ray/serve/autoscaling_policy.py (outdated, resolved)

gemini-code-assist bot reviewed Nov 20, 2025


ray-gardener bot added the serve, docs, and community-contribution labels Nov 20, 2025

[serve] Fixed comments

2c5c959

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Nov 20, 2025

python/ray/serve/autoscaling_policy.py (outdated, resolved)

[serve] Removed unintended parameterization

3c88b86

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

abrarsheikh reviewed Nov 23, 2025

python/ray/serve/autoscaling_policy.py (outdated, resolved)

vaishdho1 commented Nov 24, 2025

python/ray/serve/autoscaling_policy.py (outdated, resolved)

This was referenced Nov 26, 2025

[Serve]Application Level Autoscaling Context State Management #59008

Closed

[serve] Added policy state persistence for application level autoscaling #59118

Merged

github-actions bot added the stale label Dec 9, 2025

vaishdho1 added 2 commits December 10, 2025 14:33

[serve] Added app level autoscaling default params

e3b13b7

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

[serve] Merged upstream

c0ca11c

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

github-actions bot added the unstale label and removed the stale label Dec 12, 2025

[serve] Added default params support for app-level autoscaling. Added…

fc713c3

… tests and refactored code Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Dec 12, 2025

python/ray/serve/autoscaling_policy.py (resolved)

[serve] Corrected @publicapi

2205c37

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Dec 12, 2025

python/ray/serve/autoscaling_policy.py (outdated, resolved)

[serve] Updating scaling factor check

f201fc0

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

vaishdho1 changed the title from "[Serve] Add @apply_autoscaling_config decorator for custom autoscaling policies" to "[Serve] Add default autoscaling params for custom policies" Dec 12, 2025

cursor bot reviewed Jan 26, 2026


Merge branch 'master' into custom-autoscaling-default-params

62e9014

cursor bot reviewed Jan 27, 2026

python/ray/serve/autoscaling_policy.py (resolved)

[serve] Rename autoscaling configuration functions to internal scope …

7e08568

…and expose public API for replica queue length policy Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Jan 27, 2026

python/ray/serve/autoscaling_policy.py (resolved)

[serve] Add replica queue length autoscaling policy to public API and…

8db23fc

… enhance safety checks Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Jan 27, 2026


abrarsheikh reviewed Jan 28, 2026


[serve] Cleaned up code and added a note for app_level_autoscaling

e8bfd19

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Jan 28, 2026

python/ray/serve/autoscaling_policy.py (resolved)

[serve] Modified app_level cold start path to preserve user state

9bcac99

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Jan 29, 2026


[serve] Add validation to prevent zero replicas in autoscaling policy

3745410

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

cursor bot reviewed Jan 30, 2026


vaishdho1 mentioned this pull request Jan 30, 2026

[Serve] Autoscaling: Remove duplicate bounds (follow up to #58857) #60613

Open

abrarsheikh approved these changes Jan 30, 2026


abrarsheikh merged commit c2ec21e into ray-project:master Jan 30, 2026
6 checks passed
