
[Serve] Add default autoscaling params for custom policies#58857

Merged
abrarsheikh merged 22 commits into ray-project:master from vaishdho1:custom-autoscaling-default-params
Jan 30, 2026
Conversation

@vaishdho1
Contributor

Description

Currently, custom autoscaling policies bypass all standard autoscaling configuration parameters: they must manually implement delay logic, scaling factors, and bounds checking themselves. This PR adds an `apply_autoscaling_config` decorator that lets custom autoscaling policies automatically benefit from the standard Ray Serve autoscaling parameters embedded in the default policy (`upscale_delay_s`, `downscale_delay_s`, `downscale_to_zero_delay_s`, `upscaling_factor`, `downscaling_factor`, `min_replicas`, `max_replicas`).

Related issues

Fixes #58622

Implementation Details

  • Core implementation (python/ray/serve/autoscaling_policy.py):
    • Added `apply_autoscaling_config` decorator
    • Refactored delay logic into `_apply_delay_logic()` helper function
    • Added scaling-factor logic for custom policies in a `_apply_scaling_factors()` helper function
    • Refactored bounds checking into `_apply_bounds()` helper function
    • Updated `replica_queue_length_autoscaling_policy` to use the `_apply_delay_logic` function
  • Tests (python/ray/serve/tests/test_autoscaling_policy.py and python/ray/tests/unit/test_autoscaling_policy.py)
    • End-to-end tests verifying delay enforcement for decorated custom policies
    • Tests for scaling factor moderation (upscaling and downscaling)
    • Unit tests for checking each helper function

Added documentation for usage with example
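As a rough illustration of what the decorator provides, here is a self-contained sketch in plain Python. This is not Ray Serve's actual implementation: the `Ctx` class, the `(ctx) -> desired_replicas` signature, and the omission of the delay logic are all simplifications made up for illustration; only the decorator name comes from the PR.

```python
from dataclasses import dataclass

@dataclass
class Ctx:  # hypothetical stand-in for Ray Serve's autoscaling context
    current_num_replicas: int
    total_num_requests: float
    min_replicas: int = 1
    max_replicas: int = 10
    upscaling_factor: float = 1.0
    downscaling_factor: float = 1.0

def apply_autoscaling_config(policy):
    def wrapped(ctx):
        desired = policy(ctx)
        # Moderate the step with the configured scaling factors.
        delta = desired - ctx.current_num_replicas
        if delta > 0:
            desired = ctx.current_num_replicas + delta * ctx.upscaling_factor
        elif delta < 0:
            desired = ctx.current_num_replicas + delta * ctx.downscaling_factor
        # Clamp the decision to [min_replicas, max_replicas].
        return max(ctx.min_replicas, min(ctx.max_replicas, round(desired)))
    return wrapped

@apply_autoscaling_config
def my_policy(ctx):
    # Naive toy policy: one replica per 10 outstanding requests.
    return ctx.total_num_requests / 10

print(my_policy(Ctx(current_num_replicas=2, total_num_requests=100)))  # prints 10
```

The delay counters (`upscale_delay_s` etc.) are elided here; in the real decorator they gate how long a decision must hold before it takes effect.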

…s and doc changes

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 vaishdho1 requested review from a team as code owners November 20, 2025 18:57
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a very useful @apply_autoscaling_config decorator to simplify custom autoscaling policies by applying standard configuration parameters automatically. The implementation is well-structured, refactoring existing logic into reusable helper functions. The addition of comprehensive unit and end-to-end tests ensures the new functionality is robust and covers various scenarios, including delays, scaling factors, and edge cases like scaling to zero. The documentation is also updated clearly.

My review includes a critical suggestion to remove leftover development notes from the code and improve handling of a reserved key in the policy state. I've also included a few medium-severity suggestions to improve documentation clarity and test coverage.

@ray-gardener ray-gardener bot added serve Ray Serve Related Issue docs An issue or change related to documentation community-contribution Contributed by the community labels Nov 20, 2025
@github-actions

github-actions bot commented Dec 9, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Dec 9, 2025
abrarsheikh pushed a commit that referenced this pull request Dec 10, 2025
…ing (#59118)

## Description
Application-level autoscaling policies in Ray Serve have no mechanism to
persist policy state between control-loop iterations, preventing
stateful autoscaling behavior. Additionally, the per-deployment internal
state needed for applying standard autoscaling config parameters to
custom policies cannot be maintained (#58622).

This PR adds the following:
- The custom policy now returns its state as a `Dict[DeploymentID, Dict]`.
- Each deployment's `AutoscalingContext` now receives the state returned
by the custom policy for that deployment ID, enabling policies to
maintain their state across iterations. Each deployment gets its own
state.
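The state round-trip described above can be sketched in plain Python. This is illustrative only: `DeploymentID`, the policy signature, and the toy scaling rule are stand-ins, not Ray Serve's actual types or API.

```python
from typing import Dict, Tuple

DeploymentID = str  # stand-in for Ray Serve's DeploymentID type

def app_policy(
    requests: Dict[DeploymentID, float],
    prev_state: Dict[DeploymentID, Dict],
) -> Tuple[Dict[DeploymentID, int], Dict[DeploymentID, Dict]]:
    decisions, state = {}, {}
    for dep_id, load in requests.items():
        # Each deployment keeps its own counter across control-loop iterations.
        count = prev_state.get(dep_id, {}).get("iterations", 0) + 1
        decisions[dep_id] = max(1, int(load // 10))  # toy scaling rule
        state[dep_id] = {"iterations": count}  # fresh dict per deployment
    return decisions, state

# Simulate two control-loop iterations; state round-trips per deployment.
d1, s1 = app_policy({"a": 35, "b": 5}, {})
d2, s2 = app_policy({"a": 35, "b": 5}, s1)
print(d2, s2)
```

Returning a fresh dict per deployment mirrors the contract this PR validates: state keyed by deployment ID, fed back into each deployment's context on the next iteration.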


## Related issues
Fixes: #59008 
Related to:  #58622 #58857

## Additional information
The implementation modifies the
`ApplicationAutoscalingState.get_decision_num_replicas()` method to
implement:
- Validation of the policy state returned by the user
- Passing the policy state back into each deployment

---------

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@abrarsheikh
Contributor

@vaishdho1 is this now blocked?

@vaishdho1
Contributor Author

Yes, this is now unblocked. I am writing a few tests and will update this PR today.

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Dec 12, 2025
… tests and refactored code

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1 vaishdho1 changed the title [Serve] Add @apply_autoscaling_config decorator for custom autoscaling policies [Serve] Add default autoscaling params for custom policies Dec 12, 2025

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

```python
target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
error_ratio = ctx.total_num_requests / target_num_requests
desired_num_replicas = num_running_replicas * error_ratio
return desired_num_replicas, {}
```

Division by zero when policy called with zero replicas

Medium Severity

The _core_replica_queue_length_policy function performs ctx.total_num_requests / target_num_requests where target_num_requests equals config.get_target_ongoing_requests() * num_running_replicas. When current_num_replicas is 0, this causes a ZeroDivisionError. The public function replica_queue_length_autoscaling_policy and default_autoscaling_policy directly call this core function without protection. While internal usage is protected by the apply_autoscaling_config decorator's cold-start path, direct calls to these exported functions will crash.
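A minimal sketch of the guard being discussed (illustrative, not the exact Ray Serve code; the function name and parameters are stand-ins): return 0 when the target is zero instead of dividing by it.

```python
def desired_replicas(total_num_requests, num_running_replicas, target_ongoing_requests):
    target_num_requests = target_ongoing_requests * num_running_replicas
    if target_num_requests == 0:
        # Zero running replicas (or a zero target) means there is no ratio to
        # compute; return 0 and let a cold-start path decide, rather than
        # raising ZeroDivisionError.
        return 0
    error_ratio = total_num_requests / target_num_requests
    return num_running_replicas * error_ratio

print(desired_replicas(20, 2, 5))  # 2 * (20 / 10) = 4.0
print(desired_replicas(20, 0, 5))  # guarded: 0
```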


Contributor Author


We will not be calling this function directly; in the current flow it is always passed through the apply_autoscaling_config decorator.
In tests where this policy is explicitly tested, it is passed through the same decorator.
Should we explicitly mention this, or add a check to prevent direct access to this function?

Contributor


If it's a private function, then prefix it with `_`.

Contributor


But it's a good idea to guard this function against division by zero without assuming anything about how it gets called.

Contributor


Should we return 0 when `target_num_requests == 0`?

Contributor Author


Fixed this. We should guard against num_running_replicas being zero.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

@abrarsheikh
Contributor

CI failing with


```
[2026-01-27T07:13:04Z] !!! No API stability annotation found for:
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_app_level_autoscaling_config
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_autoscaling_config
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.apply_default_params
[2026-01-27T07:13:04Z] ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy
```

Except for `replica_queue_length_autoscaling_policy`, all the others should be private, I think.

…and expose public API for replica queue length policy

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

… enhance safety checks

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

```
cold_start_replicas,
state_per_deployment[dep_id],
)
continue
```

App-level policy state discarded during cold start

Medium Severity

In _apply_app_level_autoscaling_config, when a deployment is cold-starting, the wrapper uses state_per_deployment[dep_id] (the original state) instead of merging with updated_custom_policy_state.get(dep_id, {}). The user's policy IS called first (line 240), and may return state updates for cold-starting deployments, but those updates are silently discarded. This differs from single-deployment behavior where the policy isn't called during cold start, so there's no state to lose. Stateful app-level policies tracking counters or other state during cold start will experience unexpected state loss.


Contributor Author


So here, unlike the single-deployment case, some deployments can have 0 replicas with num_requests > 0 and some may not.
Currently, I pass everything through the policy and overwrite those deployments that have 0 replicas and num_requests > 0 with the cold-start path. In this process we lose the user state of those overwritten deployments.
The other alternative is to first apply the cold-start path to the deployments that start with 0 replicas and num_requests > 0, and send the other ones through the policy; this would exactly mimic the deployment-level behavior.

Contributor


i am not quite following this, but I think its okay to take it as a follow up.

Contributor

@abrarsheikh abrarsheikh left a comment


some nits

```
cold_start_replicas,
state_per_deployment[dep_id],
)
continue
```
Contributor


i am not quite following this, but I think its okay to take it as a follow up.

```python
target_num_requests = config.get_target_ongoing_requests() * num_running_replicas
error_ratio = ctx.total_num_requests / target_num_requests
desired_num_replicas = num_running_replicas * error_ratio
return desired_num_replicas, {}
```
Contributor


should we return 0 when target_num_requests == 0 ?

@vaishdho1
Contributor Author

vaishdho1 commented Jan 28, 2026

Yes, the original default policy behaves the same way when target_num_requests = 0.
For the application autoscaling case, I will keep the cold-start path implementation unchanged. I can create a follow-up to look into this separately.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>
@vaishdho1
Contributor Author

Modified the cold-start path for app-level autoscaling to preserve user state. Will pick this up as a follow-up for further enhancement.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

```
SERVE_AUTOSCALING_DECISION_COUNTERS_KEY: policy_state.get(
    SERVE_AUTOSCALING_DECISION_COUNTERS_KEY, 0
)
}
```

Duplicated internal policy state extraction logic

Low Severity

The code block that extracts the internal policy state (creating an internal_policy_state dict with SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) is duplicated verbatim in both _apply_default_params_and_merge_state and _merge_user_state_with_internal_state. This extraction logic could be refactored into a small helper function to eliminate the duplication and reduce maintenance burden.


Contributor


same here, let's address this as follow up

```
desired_num_replicas,
ctx.capacity_adjusted_min_replicas,
ctx.capacity_adjusted_max_replicas,
)
```

Bounds applied twice creating redundant computation

Low Severity

Bounds checking is applied twice in the autoscaling flow: first via _apply_bounds inside the _apply_default_params function (called by the decorator), and then again via self.apply_bounds in get_decision_num_replicas. While applying bounds twice is idempotent and doesn't cause incorrect behavior, it creates redundant computation and makes the code flow confusing to understand.
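The idempotence claim is easy to see in isolation; a tiny illustrative sketch (`apply_bounds` here is a stand-in for the real helper):

```python
def apply_bounds(n, lo, hi):
    # Clamp n into [lo, hi]; applying this twice yields the same result.
    return max(lo, min(hi, n))

once = apply_bounds(42, 1, 10)
twice = apply_bounds(once, 1, 10)
print(once, twice)  # prints 10 10
```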


Contributor


let's ticket this and address as a follow up

Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

```
final_state[dep_id] = _merge_user_state_with_internal_state(
    state_per_deployment[dep_id],
    custom_policy_state_per_deployment,
)
```

Shared state dict mutation corrupts delay counters

Medium Severity

In _apply_app_level_autoscaling_config, when processing deployments, the functions _merge_user_state_with_internal_state and _apply_default_params_and_merge_state mutate the user_policy_state dict in place via .update(). If an app-level policy returns the same dict object for multiple deployments' state (e.g., {d1: shared_dict, d2: shared_dict}), each deployment's internal delay counter (SERVE_AUTOSCALING_DECISION_COUNTERS_KEY) overwrites the previous one. This causes the delay logic to malfunction because earlier deployments lose their counter state.
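A minimal plain-Python reproduction of the aliasing hazard (the names here are stand-ins for the actual constants and helpers):

```python
COUNTER_KEY = "_decision_counter"  # stand-in for SERVE_AUTOSCALING_DECISION_COUNTERS_KEY

def merge_in_place(user_state, counter):
    user_state.update({COUNTER_KEY: counter})  # mutates the caller's dict
    return user_state

shared = {"mode": "steady"}
states = {"d1": shared, "d2": shared}  # policy reused one dict for both deployments
merged = {dep: merge_in_place(s, c) for (dep, s), c in zip(states.items(), [3, 7])}
# d1's counter was overwritten by d2's because both entries point at the same dict:
print(merged["d1"][COUNTER_KEY])  # prints 7, not 3

# Copying before merging keeps the counters independent:
safe = {dep: {**s, COUNTER_KEY: c} for (dep, s), c in zip(states.items(), [3, 7])}
print(safe["d1"][COUNTER_KEY])  # prints 3
```

The fix discussed in the thread amounts to the second variant: merge into a copy (or return a new dict) rather than mutating the user's state in place.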


Contributor Author


This looks like a potential issue.
This logic was finalized as part of the discussion in #58857 (comment), where we decided to update the user state directly.

But if the user actually uses the same state dict for multiple deployments in an application-level policy, the delay counters can get corrupted. I think we need to either create a copy of the user state or return a new dict as the merged state.

Contributor


let's take that up in a follow up PR. Thanks for staying on top @vaishdho1

@abrarsheikh abrarsheikh merged commit c2ec21e into ray-project:master Jan 30, 2026
6 checks passed
liulehui pushed a commit to liulehui/ray that referenced this pull request Jan 31, 2026
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Jan 31, 2026
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
rayhhome pushed a commit to rayhhome/ray that referenced this pull request Feb 4, 2026
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026


Successfully merging this pull request may close these issues.

[Serve] Custom autoscaling policies don't benefit from standard autoscaling config parameters

4 participants