[Serve] Add default autoscaling params for custom policies #58857
abrarsheikh merged 22 commits into ray-project:master
…s and doc changes Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Code Review
This pull request introduces a very useful @apply_autoscaling_config decorator to simplify custom autoscaling policies by applying standard configuration parameters automatically. The implementation is well-structured, refactoring existing logic into reusable helper functions. The addition of comprehensive unit and end-to-end tests ensures the new functionality is robust and covers various scenarios, including delays, scaling factors, and edge cases like scaling to zero. The documentation is also updated clearly.
My review includes a critical suggestion to remove leftover development notes from the code and improve handling of a reserved key in the policy state. I've also included a few medium-severity suggestions to improve documentation clarity and test coverage.
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
…ing (#59118)

## Description
Application-level autoscaling policies in Ray Serve have no mechanism to persist policy state between control-loop iterations, preventing stateful autoscaling behavior. Additionally, per-deployment internal state needed for applying standard autoscaling config parameters over custom policies cannot be maintained (#58622).

This PR adds the following:
- The user policy state should return a `Dict[DeploymentID, Dict]`.
- Each deployment's `AutoscalingContext` now receives the user state returned by the custom policy per deployment ID, enabling policies to maintain their state across iterations. Each deployment gets its own state.

## Related issues
Fixes: #59008
Related to: #58622 #58857

## Additional information
The implementation modifies the `ApplicationAutoscalingState.get_decision_num_replicas()` method to implement:
- User-returned policy state validation
- Returning the policy state back into each deployment

---------
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 is this now blocked?

Yes, this is unblocked. I am writing a few tests. Will update this PR today.
… tests and refactored code Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
    target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
    error_ratio = ctx.total_num_requests / target_num_requests
    desired_num_replicas = num_running_replicas * error_ratio
    return desired_num_replicas, {}
Division by zero when policy called with zero replicas
Medium Severity
The _core_replica_queue_length_policy function performs ctx.total_num_requests / target_num_requests where target_num_requests equals config.get_target_ongoing_requests() * num_running_replicas. When current_num_replicas is 0, this causes a ZeroDivisionError. The public function replica_queue_length_autoscaling_policy and default_autoscaling_policy directly call this core function without protection. While internal usage is protected by the apply_autoscaling_config decorator's cold-start path, direct calls to these exported functions will crash.
We will not be calling this function directly. In the current flow it is always passed through apply_autoscaling_config decorator.
In tests where this policy is explicitly tested, it is passed through the same decorator.
Should we explicitly mention this or have a check to prevent access to this function directly?
If it's a private function, then prefix it with _.
But it's a good idea to guard this function against division by zero without assuming anything about how it gets called.
Should we return 0 when target_num_requests == 0?
Fixed this. We should guard num_running_replicas from becoming zero
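The guard discussed in this thread can be sketched as follows. This is a minimal illustration, not the code merged in the PR: `SketchContext` and `core_queue_length_policy` are hypothetical stand-ins, and clamping the replica count to at least 1 is one assumed way to keep `num_running_replicas` from being zero.

```python
from dataclasses import dataclass


@dataclass
class SketchContext:
    """Hypothetical stand-in for the context fields used by the policy."""
    total_num_requests: float
    current_num_replicas: int
    target_ongoing_requests: float


def core_queue_length_policy(ctx: SketchContext) -> float:
    # Assumption: treat zero running replicas as one so the product
    # below cannot be zero because of the replica count.
    num_running_replicas = max(ctx.current_num_replicas, 1)
    target_num_requests = ctx.target_ongoing_requests * num_running_replicas
    # A non-positive target would still divide by zero, so keep the
    # current replica count in that case.
    if target_num_requests <= 0:
        return float(ctx.current_num_replicas)
    error_ratio = ctx.total_num_requests / target_num_requests
    return num_running_replicas * error_ratio
```

With this guard, a direct call with zero replicas returns a finite value instead of raising `ZeroDivisionError`, regardless of whether the caller wraps the policy in a decorator.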
CI failing with except for |
…and expose public API for replica queue length policy Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
… enhance safety checks Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
        cold_start_replicas,
        state_per_deployment[dep_id],
    )
    continue
App-level policy state discarded during cold start
Medium Severity
In _apply_app_level_autoscaling_config, when a deployment is cold-starting, the wrapper uses state_per_deployment[dep_id] (the original state) instead of merging with updated_custom_policy_state.get(dep_id, {}). The user's policy IS called first (line 240), and may return state updates for cold-starting deployments, but those updates are silently discarded. This differs from single-deployment behavior where the policy isn't called during cold start, so there's no state to lose. Stateful app-level policies tracking counters or other state during cold start will experience unexpected state loss.
So here, unlike the single-deployment case, some deployments can have 0 replicas with num_requests > 0 and some may not.
Currently, I pass everything through the policy, then overwrite the deployments that have 0 replicas and num_requests > 0 with the cold-start path. In this process we lose the user state of those overwritten deployments.
The alternative is to first apply the cold-start path to the deployments that start with 0 replicas and num_requests > 0 and send the others through the policy; this exactly mimics the deployment-level behavior.
I am not quite following this, but I think it's okay to take it as a follow-up.
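The state loss described in this thread could be avoided by merging the policy's returned state into the previous state before taking the cold-start path. A minimal sketch, assuming plain dicts keyed by deployment ID; `merge_cold_start_state` is a hypothetical helper, not a function from the PR:

```python
from typing import Any, Dict


def merge_cold_start_state(
    previous_state: Dict[str, Any],
    updated_state_by_deployment: Dict[str, Dict[str, Any]],
    dep_id: str,
) -> Dict[str, Any]:
    # Start from a copy of the state carried over from the last iteration.
    merged = dict(previous_state)
    # Overlay whatever the user policy returned for this deployment,
    # so updates made during a cold start are not dropped.
    merged.update(updated_state_by_deployment.get(dep_id, {}))
    return merged
```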
    target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
    error_ratio = ctx.total_num_requests / target_num_requests
    desired_num_replicas = num_running_replicas * error_ratio
    return desired_num_replicas, {}
should we return 0 when target_num_requests == 0 ?
Yes, the original default policy has the same functionality with |
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Modified |
        SERVE_AUTOSCALING_DECISION_COUNTERS_KEY: policy_state.get(
            SERVE_AUTOSCALING_DECISION_COUNTERS_KEY, 0
        )
    }
Duplicated internal policy state extraction logic
Low Severity
The code block that extracts the internal policy state (creating an internal_policy_state dict with SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) is duplicated verbatim in both _apply_default_params_and_merge_state and _merge_user_state_with_internal_state. This extraction logic could be refactored into a small helper function to eliminate the duplication and reduce maintenance burden.
same here, let's address this as follow up
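The refactor suggested in this review might look like the sketch below. The helper name `extract_internal_policy_state` is hypothetical, and the string assigned to `SERVE_AUTOSCALING_DECISION_COUNTERS_KEY` is a placeholder, since the real constant's value is not shown in this thread.

```python
from typing import Any, Dict

# Placeholder value: the real constant lives in Ray Serve and may differ.
SERVE_AUTOSCALING_DECISION_COUNTERS_KEY = "__sketch_decision_counters"


def extract_internal_policy_state(policy_state: Dict[str, Any]) -> Dict[str, Any]:
    # Pull out only the internal delay counter, defaulting to 0 when the
    # policy state does not carry one yet.
    return {
        SERVE_AUTOSCALING_DECISION_COUNTERS_KEY: policy_state.get(
            SERVE_AUTOSCALING_DECISION_COUNTERS_KEY, 0
        )
    }
```

Both call sites named in the review could then call this one helper instead of repeating the dict construction.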
        desired_num_replicas,
        ctx.capacity_adjusted_min_replicas,
        ctx.capacity_adjusted_max_replicas,
    )
Bounds applied twice creating redundant computation
Low Severity
Bounds checking is applied twice in the autoscaling flow: first via _apply_bounds inside the _apply_default_params function (called by the decorator), and then again via self.apply_bounds in get_decision_num_replicas. While applying bounds twice is idempotent and doesn't cause incorrect behavior, it creates redundant computation and makes the code flow confusing to understand.
let's ticket this and address as a follow up
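To illustrate why the double application is redundant rather than harmful, here is a small sketch of a clamp function. `apply_bounds_sketch` is a hypothetical stand-in for the bounds helpers named in the review and assumes the lower bound does not exceed the upper bound.

```python
def apply_bounds_sketch(desired: int, min_replicas: int, max_replicas: int) -> int:
    # Clamp the desired replica count into [min_replicas, max_replicas].
    return max(min_replicas, min(max_replicas, desired))


def apply_bounds_twice(desired: int, min_replicas: int, max_replicas: int) -> int:
    # Applying the clamp to an already clamped value changes nothing,
    # which is what makes the second call redundant.
    once = apply_bounds_sketch(desired, min_replicas, max_replicas)
    return apply_bounds_sketch(once, min_replicas, max_replicas)
```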
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
    final_state[dep_id] = _merge_user_state_with_internal_state(
        state_per_deployment[dep_id],
        custom_policy_state_per_deployment,
    )
Shared state dict mutation corrupts delay counters
Medium Severity
In _apply_app_level_autoscaling_config, when processing deployments, the functions _merge_user_state_with_internal_state and _apply_default_params_and_merge_state mutate the user_policy_state dict in place via .update(). If an app-level policy returns the same dict object for multiple deployments' state (e.g., {d1: shared_dict, d2: shared_dict}), each deployment's internal delay counter (SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) overwrites the previous one. This causes the delay logic to malfunction because earlier deployments lose their counter state.
This looks like a potential issue.
#58857 (comment)
This logic was finalized as part of this discussion where we decided on directly updating the user state.
But if the user actually uses the same state for multiple deployments in application level policies, the delay counters can get corrupted. I think we need to either create a copy of the user state or return a new dict as the merged state.
let's take that up in a follow up PR. Thanks for staying on top @vaishdho1
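The aliasing problem and the copy-based fix proposed above can be reproduced in a few lines. This is a sketch under stated assumptions: `COUNTER_KEY`, both merge functions, and `run_merge` are hypothetical names used only for illustration, not the helpers from the PR.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for SERVE_AUTOSCALING_DECISION_COUNTERS_KEY.
COUNTER_KEY = "decision_counter"

State = Dict[str, Any]


def merge_in_place(user_state: State, internal_state: State) -> State:
    # Mutates the caller's dict, so aliased dicts share the counter.
    user_state.update(internal_state)
    return user_state


def merge_with_copy(user_state: State, internal_state: State) -> State:
    # Builds a new dict, leaving the user's dict untouched.
    return {**user_state, **internal_state}


def run_merge(merge_fn: Callable[[State, State], State]) -> Dict[str, State]:
    # The policy returns the same dict object for both deployments.
    shared = {"custom": 1}
    user_states = {"d1": shared, "d2": shared}
    internal = {"d1": {COUNTER_KEY: 3}, "d2": {COUNTER_KEY: -2}}
    return {dep: merge_fn(user_states[dep], internal[dep]) for dep in ("d1", "d2")}
```

With `merge_in_place`, the counter written for `d1` is overwritten when `d2` is processed; with `merge_with_copy`, each deployment keeps its own counter.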
…ct#58857)

## Description
Currently, custom autoscaling policies bypass all standard autoscaling configuration parameters: they must manually implement delay logic, scaling factors, and bounds checking themselves. This PR adds an `apply_autoscaling_config` decorator that enables custom autoscaling policies to automatically benefit from Ray Serve's standard autoscaling parameters that are embedded in the default policy (`upscale_delay_s`, `downscale_delay_s`, `downscale_to_zero_delay_s`, `upscaling_factor`, `downscaling_factor`, `min_replicas`, `max_replicas`).

## Related issues
Fixes ray-project#58622

## Implementation Details
- Core implementation (python/ray/serve/autoscaling_policy.py):
  - Added `apply_autoscaling_config` decorator
  - Refactored delay logic into `_apply_delay_logic()` helper function
  - Added scaling factor logic for custom policies in `_apply_scaling_factors()` helper function
  - Refactored bounds checking into `_apply_bounds()` helper function
  - Updated `replica_queue_length_autoscaling_policy` to use `_apply_delay_logic` function
- Tests (python/ray/serve/tests/test_autoscaling_policy.py and python/ray/tests/unit/test_autoscaling_policy.py)
  - End-to-end tests verifying delay enforcement for decorated custom policies
  - Tests for scaling factor moderation (upscaling and downscaling)
  - Unit tests for checking each helper function
- Added documentation for usage with example

---------
Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
Co-authored-by: harshit-anyscale <harshit@anyscale.com>


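Description
Currently, custom autoscaling policies bypass all standard autoscaling configuration parameters: they must manually implement delay logic, scaling factors, and bounds checking themselves. This PR adds an `apply_autoscaling_config` decorator that enables custom autoscaling policies to automatically benefit from Ray Serve's standard autoscaling parameters that are embedded in the default policy (`upscale_delay_s`, `downscale_delay_s`, `downscale_to_zero_delay_s`, `upscaling_factor`, `downscaling_factor`, `min_replicas`, `max_replicas`).

Related issues
Fixes #58622

Implementation Details
- Core implementation (python/ray/serve/autoscaling_policy.py):
  - Added `apply_autoscaling_config` decorator
  - Refactored delay logic into `_apply_delay_logic()` helper function
  - Added scaling factor logic for custom policies in `_apply_scaling_factors()` helper function
  - Refactored bounds checking into `_apply_bounds()` helper function
  - Updated `replica_queue_length_autoscaling_policy` to use `_apply_delay_logic` function
- Tests (python/ray/serve/tests/test_autoscaling_policy.py and python/ray/tests/unit/test_autoscaling_policy.py)
  - End-to-end tests verifying delay enforcement for decorated custom policies
  - Tests for scaling factor moderation (upscaling and downscaling)
  - Unit tests for checking each helper function
- Added documentation for usage with example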
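To make the idea of the decorator concrete, here is a self-contained sketch of a wrapper that moderates a custom policy's output with scaling factors and then clamps it to replica bounds. It does not use Ray and is not the merged implementation: `apply_config_sketch`, `SketchConfig`, `SketchContext`, and the linear moderation formula are all assumptions for illustration, and delay handling is omitted for brevity.

```python
import math
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict, Tuple


@dataclass
class SketchConfig:
    """Hypothetical subset of the autoscaling config fields."""
    min_replicas: int = 1
    max_replicas: int = 10
    upscaling_factor: float = 1.0
    downscaling_factor: float = 1.0


@dataclass
class SketchContext:
    """Hypothetical subset of the context passed to a policy."""
    current_num_replicas: int
    total_num_requests: float
    config: SketchConfig


Policy = Callable[[SketchContext], Tuple[int, Dict[str, Any]]]


def apply_config_sketch(policy: Policy) -> Policy:
    @wraps(policy)
    def wrapper(ctx: SketchContext) -> Tuple[int, Dict[str, Any]]:
        desired, state = policy(ctx)
        current = ctx.current_num_replicas
        cfg = ctx.config
        # Assumed moderation rule: move only a fraction of the way
        # from the current count toward the policy's desired count.
        factor = cfg.upscaling_factor if desired > current else cfg.downscaling_factor
        moderated = math.ceil(current + factor * (desired - current))
        # Clamp to the configured replica bounds.
        bounded = max(cfg.min_replicas, min(cfg.max_replicas, moderated))
        return bounded, state

    return wrapper


@apply_config_sketch
def one_replica_per_five_requests(ctx: SketchContext) -> Tuple[int, Dict[str, Any]]:
    # The custom policy only expresses intent; the wrapper applies config.
    return math.ceil(ctx.total_num_requests / 5), {}
```

The design point this illustrates is separation of concerns: the custom policy returns a raw target, and the wrapper applies the shared configuration so each policy author does not reimplement it.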