[rollout] feat: enhancing global load balancer (issue #5442) #6059
EricWuji wants to merge 15 commits into verl-project:main
Conversation
…affinity

This commit introduces several enhancements to the `GlobalRequestLoadBalancer` and agent loop for improved load balancing. Key changes include:

- Added pluggable load balancing strategies: `least_requests`, `least_kv_cache`, `weighted_rr`, and `random`.
- Implemented group-level affinity for routing requests based on `group_id`, allowing for better session management.
- Updated configuration options in the rollout config to support the new load balancing features.

Additionally, tests have been updated to reflect these changes, ensuring robust functionality across the agent loop. This update aims to optimize server resource utilization and enhance the overall performance of the agent loop system.
Code Review
This pull request introduces pluggable load-balancing strategies for the GlobalRequestLoadBalancer in the agent loop, including support for least-requests, least-KV-cache, weighted round-robin, and random routing. It also adds group-level sticky routing and Prometheus-based metric scraping for KV cache usage. While the architectural changes improve flexibility, the current implementation of metric scraping is synchronous and blocking within the Ray actor, which could create a significant performance bottleneck. Additionally, the use of threading locks in a single-threaded Ray actor is currently redundant, and the error logging for failed metric fetches lacks throttling, potentially leading to log flooding.
```python
def _refresh_kv_metrics_if_needed(self) -> None:
    if self._strategy_name != "least_kv_cache":
        return
    if self._metric_name is None:
        return
    now = time.monotonic()
    if now - self._last_metrics_ts < self._kv_refresh_interval:
        return
    self._last_metrics_ts = now
    for sid in self._server_actor_ids:
        url = build_metrics_url(sid, self._metrics_path)
        try:
            text = fetch_prometheus_text(url, self._fetch_timeout)
            val = parse_prometheus_metric_value(text, self._metric_name)
            self._kv_usage[sid] = val
        except Exception as e:
            logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
            self._kv_usage[sid] = None
```
The _refresh_kv_metrics_if_needed method performs synchronous, blocking HTTP requests to every rollout replica within the acquire_server call. In a Ray actor, this blocks the entire actor from processing any other requests (including release_server or other acquire_server calls) for the duration of all HTTP calls combined. If any replica is slow or unresponsive, the global load balancer becomes a critical bottleneck for the entire rollout process.
Consider moving the metrics collection to a background thread that periodically updates self._kv_usage asynchronously, or refactor the actor to be async and use asyncio.gather with an asynchronous HTTP client.
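A minimal sketch of the background-thread approach, using assumed names (`KVMetricsRefresher`, `refresh_fn`); in the real actor the instance would be owned by the load balancer and shared state would be guarded by a lock:

```python
import threading

class KVMetricsRefresher:
    """Runs refresh_fn periodically off the actor's main thread (sketch)."""

    def __init__(self, refresh_fn, interval_s: float):
        self._refresh_fn = refresh_fn  # e.g. scrapes all replicas' /metrics
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            self._refresh_fn()
            # wait() doubles as an interruptible sleep: returns True on stop()
            if self._stop.wait(timeout=self._interval_s):
                break

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

With this shape, `acquire_server` only reads the most recent `self._kv_usage` snapshot and never blocks on HTTP.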
```python
def acquire_server(self, request_id: str) -> str:
    ...

self._group_id_to_server: LRUCache = LRUCache(maxsize=max_cache_size)
self._lock = threading.Lock()
```
There was a problem hiding this comment.
Ray actors are single-threaded by default, processing tasks sequentially. The use of threading.Lock is redundant in this context and adds unnecessary overhead unless the actor explicitly spawns background threads that access shared state. If the metrics collection is moved to a background thread (as suggested in the critical finding), this lock would then become necessary; otherwise, it should be removed.
```python
            val = parse_prometheus_metric_value(text, self._metric_name)
            self._kv_usage[sid] = val
        except Exception as e:
            logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
```
Logging a warning on every failed metric refresh attempt can lead to log flooding, especially since this is called frequently (every 2 seconds by default) and for every replica. If a replica is permanently down or misconfigured, the logs will be overwhelmed. Consider using a throttled logger or only logging the failure once per replica until it recovers.
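One way to implement the throttling, sketched with an assumed helper (`FailureTracker`) that mirrors the `_mark_metrics_failed` / `_mark_metrics_success` pattern this PR later adopts: warn on the healthy-to-failed transition only, and optionally log recovery once.

```python
class FailureTracker:
    """Tracks per-replica failure state so each failure cycle logs once."""

    def __init__(self) -> None:
        self._failed: set[str] = set()

    def mark_failed(self, sid: str) -> bool:
        """Return True only on the transition healthy -> failed."""
        if sid in self._failed:
            return False  # already known bad: stay quiet
        self._failed.add(sid)
        return True

    def mark_success(self, sid: str) -> bool:
        """Return True only on the transition failed -> healthy."""
        if sid in self._failed:
            self._failed.discard(sid)
            return True  # recovered: worth logging once
        return False
```

The caller then wraps the warning in `if tracker.mark_failed(sid): logger.warning(...)`.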
…lobalRequestLoadBalancer
/gemini review
Code Review
This pull request introduces a pluggable load balancing framework for the GlobalRequestLoadBalancer, supporting strategies such as least_requests, least_kv_cache, weighted_rr, and random. It also implements group-level sticky routing and infrastructure for scraping Prometheus metrics to inform routing decisions. Feedback highlights a potential AttributeError caused by the removal of extra_configs in the rollout configuration and suggests an improvement to the metrics scraper's logging logic to prevent log flooding during connectivity issues.
```python
class DiffusionSamplingConfig(SamplingConfig):
    noise_level: float = 0.0
    num_inference_steps: int = 40
    seed: int = 42
```
The removal of extra_configs from DiffusionSamplingConfig and DiffusionRolloutConfig (at line 385) will cause an AttributeError in verl/experimental/agent_loop/diffusion_agent_loop.py. That file still references this field in several places (e.g., lines 122, 155, and 165). If the intention was to replace extra_configs with specific fields like noise_level, then diffusion_agent_loop.py must be updated to use those fields. Otherwise, please restore the extra_configs field to maintain backward compatibility for diffusion models.
```python
        except Exception as e:
            self._mark_metrics_failed(sid)
            logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
```
The background metrics scraper logs a warning on every failure. If a server is unreachable or the metrics endpoint is misconfigured, this will flood the logs every refresh_interval_s. It is recommended to use the return value of _mark_metrics_failed(sid) to ensure the warning is only logged once per failure cycle.
```diff
         except Exception as e:
-            self._mark_metrics_failed(sid)
-            logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
+            if self._mark_metrics_failed(sid):
+                logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
```
@EricWuji Could you do some comparison of the different load balance strategies?
@wuxibin89 Thanks for your reply. I've updated the PR description with comparison sections. You can find them at the end of the description.
/gemini review
Code Review
This pull request introduces a pluggable load-balancing framework for the GlobalRequestLoadBalancer, supporting strategies such as least requests, least KV cache usage, weighted round-robin, and random selection. It also implements group-level sticky routing to enhance prefix caching efficiency across worker groups. The review feedback highlights several critical areas for improvement: the Prometheus metric parser needs to be more robust to handle optional timestamps, the background metrics collection should be parallelized and wrapped in exception handling to prevent silent failures, and the weighted round-robin implementation should pre-compute mappings to avoid unnecessary overhead during request routing.
```python
def _last_float_on_line(line: str) -> Optional[float]:
    matches = _FLOAT_RE.findall(line)
    if not matches:
        return None
    try:
        return float(matches[-1])
    except ValueError:
        return None
```
The _last_float_on_line function is unreliable because the Prometheus text exposition format allows an optional timestamp after the value. Using findall and taking the last match will incorrectly return the timestamp (a large integer) as the metric value. Additionally, labels in Prometheus can contain spaces, which can break simple splitting. The value should be extracted as the first token following the metric name or the closing brace of the labels.
```python
def _last_float_on_line(line: str) -> Optional[float]:
    # Prometheus format: metric_name{labels} value [timestamp]
    if '}' in line:
        rest = line[line.rfind('}') + 1:]
    else:
        parts = line.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
    tokens = rest.split()
    if tokens:
        try:
            return float(tokens[0])
        except ValueError:
            pass
    return None
```

```python
        for sid in self._server_actor_ids:
            url = build_metrics_url(sid, self._metrics_path)
            try:
                text = fetch_prometheus_text(url, self._fetch_timeout_s)
                val = parse_prometheus_metric_value(text, metric_name)
                if self._mark_metrics_success(sid, val):
                    logger.info("KV cache metrics recovered for replica %s", sid)
            except Exception as e:
                if self._mark_metrics_failed(sid):
                    logger.warning("Failed to refresh KV metrics for %s from %s: %s", sid, url, e)
```
Metrics for each replica are fetched sequentially in a blocking loop. In large-scale deployments, slow or unresponsive replicas can cause the background refresh thread to stall, leading to stale metrics for all servers. Consider using a thread pool or asynchronous requests to fetch metrics from all replicas in parallel.
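A self-contained sketch of the parallel variant; `refresh_all` and `fetch_one` are illustrative names, and per-replica failures are mapped to `None` rather than propagated out of the pool:

```python
from concurrent.futures import ThreadPoolExecutor

def refresh_all(server_ids, fetch_one, max_workers: int = 8):
    """Fetch metrics for all replicas in parallel.

    fetch_one(sid) returns a metric value or raises; the result is a
    {sid: value_or_None} snapshot, so one slow/down replica only delays
    its own entry up to its timeout instead of serializing the whole loop.
    """
    def safe_fetch(sid):
        try:
            return sid, fetch_one(sid)
        except Exception:
            return sid, None  # failure recorded, never escapes the pool
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_fetch, server_ids))
```

A per-replica timeout inside `fetch_one` is still needed; the pool bounds concurrency, not latency.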
```python
    inflight: dict[str, int],
) -> str:
    # Selection ignores inflight; weights define share across replicas.
    pos = {sid: i for i, sid in enumerate(self._ids)}
```
```python
while not self._metrics_stop.is_set():
    self._refresh_metrics_blocking()
    if self._metrics_stop.wait(timeout=self._refresh_interval_s):
        break
```
/gemini review
There was a problem hiding this comment.
Code Review
This pull request implements a pluggable load balancing framework for the agent loop, introducing strategies such as least requests, least KV cache usage, weighted round-robin, and random selection. It also adds support for group-based sticky sessions to facilitate GRPO rollouts and includes Prometheus metric scraping for dynamic load assessment. The review feedback identifies a logic error in the weighted round-robin implementation, a discrepancy between the group sticky implementation and its documentation, an efficiency concern regarding thread pool instantiation, and a mismatch in the fallback behavior for missing metrics.
```python
        self._inflight[self._ids[i]] += self._weights[i]
    best_i = max(indices, key=lambda i: (self._inflight[self._ids[i]], -i))
    self._inflight[self._ids[best_i]] -= total
    return self._ids[best_i]
```
The WeightedRoundRobinStrategy incorrectly uses `self._inflight` as the accumulator for the smooth weighted round-robin algorithm. However, `self._inflight` is also used by the GlobalRequestLoadBalancer to track the actual number of active requests (via `update_inflight` and `release_server`).

This leads to two critical issues:
- The WRR state is corrupted by real-time request counts.
- Since WRR uses `max()` to pick the server with the highest accumulated weight, and GlobalRequestLoadBalancer increments `_inflight` for active requests, this strategy will actually prefer servers that are more heavily loaded, which is the opposite of load balancing.

You should use the `self._current` attribute (which is initialized but unused) for the WRR algorithm state.
```diff
-        self._inflight[self._ids[i]] += self._weights[i]
-    best_i = max(indices, key=lambda i: (self._inflight[self._ids[i]], -i))
-    self._inflight[self._ids[best_i]] -= total
-    return self._ids[best_i]
+    for i in indices:
+        self._current[i] += self._weights[i]
+    best_i = max(indices, key=lambda i: (self._current[i], -i))
+    self._current[best_i] -= total
```
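For reference, a self-contained sketch of smooth (nginx-style) weighted round-robin with the accumulator kept separate from in-flight counts; the class and method names here are illustrative, not the PR's actual API:

```python
class SmoothWeightedRR:
    """Smooth weighted round-robin over a fixed replica list (sketch)."""

    def __init__(self, ids, weights):
        self._ids = list(ids)
        self._weights = list(weights)
        self._current = [0.0] * len(ids)  # WRR accumulator, NOT request counts

    def pick(self) -> str:
        total = sum(self._weights)
        for i in range(len(self._ids)):
            self._current[i] += self._weights[i]
        # highest accumulated weight wins; lower index breaks ties
        best = max(range(len(self._ids)), key=lambda i: (self._current[i], -i))
        self._current[best] -= total
        return self._ids[best]
```

With weights `[5, 1, 1]` over replicas `a, b, c`, seven consecutive picks select `a` five times and `b`, `c` once each, interleaved rather than bursty.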
```python
def acquire_server(self, request_id: str, request_group_id: str | None = None) -> str:
    if request_group_id and request_group_id in self._request_group_to_server:
        server_id = self._request_group_to_server[request_group_id]
        self.update_inflight(server_id, 1)
        return server_id
    server_id = self.strategy.pick_server(list(self._server_actor_ids))
    if request_group_id:
        self._request_group_to_server[request_group_id] = server_id
    self.update_inflight(server_id, 1)
    return server_id
```
The implementation of GroupStickyLoadBalancer does not match the documentation in docs/advance/agent_loop.rst. The documentation states that routing should consult the request_id LRU first, then the request_group_id LRU.
Currently, GroupStickyLoadBalancer only tracks request_group_id. This means that if request_group_id is not provided (or if a request needs stickiness outside of a group context), it will fall back to a fresh pick from the strategy, losing request_id stickiness. To match the intended design, this class should maintain both LRU caches.
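A sketch of the documented lookup order (request_id cache first, then request_group_id cache); `resolve_sticky` is a hypothetical helper, and plain dicts stand in for the LRU caches:

```python
def resolve_sticky(request_id, request_group_id, req_cache, group_cache, pick):
    """Return a server id: request_id LRU first, then group LRU, then strategy."""
    sid = req_cache.get(request_id)
    if sid is None and request_group_id is not None:
        sid = group_cache.get(request_group_id)
    if sid is None:
        sid = pick()  # fall back to the configured strategy
    # record both stickiness levels so either key routes back here
    req_cache[request_id] = sid
    if request_group_id is not None:
        group_cache[request_group_id] = sid
    return sid
```

This preserves request-level stickiness even when a request arrives without a group id.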
```python
with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="verl-lb-kv") as pool:
    list(pool.map(worker, ids))
```
```python
def key(sid: str) -> tuple[float, int, str]:
    kv = usage.get(sid)
    if kv is None:
        return (float("inf"), self._inflight[sid], sid)
    return (kv, self._inflight[sid], sid)
```
The fallback logic for failed metrics in LeastKVCacheStrategy does not match the documentation. The documentation states that it should fall back to 'least in-flight behavior', but the implementation uses float('inf') as the primary key for servers with missing metrics.
This causes servers with failed scrapes to be avoided even if they have zero in-flight requests and other servers are heavily loaded. To implement a true fallback, you should use a neutral value for missing metrics or adjust the sorting key so that in-flight counts are compared fairly across all servers when metrics are missing.
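One possible shape of that fallback, sketched as a free function with assumed parameters (`usage`, `inflight`): when any metric is missing, compare all servers by in-flight count so un-scraped replicas aren't penalized.

```python
def pick_least_kv(server_ids, usage, inflight):
    """Pick by KV usage; fall back to least in-flight when metrics are missing."""
    if any(usage.get(sid) is None for sid in server_ids):
        # fallback: plain least in-flight; server id as deterministic tie-break
        return min(server_ids, key=lambda sid: (inflight[sid], sid))
    return min(server_ids, key=lambda sid: (usage[sid], inflight[sid], sid))
```

Unlike the `float("inf")` key, a replica whose scrape failed can still win here if it genuinely has the least load.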
/gemini review
Code Review
This pull request introduces a pluggable and configurable global load balancing system for the agent-loop rollout. It adds support for multiple strategies—including least-requests, least-KV-cache (via Prometheus metrics scraping), weighted round-robin, and random—alongside sticky session modes for both individual requests and GRPO rollout groups. The changes include refactoring the load balancer into dedicated modules, updating configuration schemas, and adding comprehensive tests. Feedback was provided to ensure deterministic tie-breaking in the least-KV-cache strategy's fallback logic.
```python
has_none = any(usage.get(sid) is None for sid in server_ids)
if has_none:  # fallback to least in-flight
    return min(server_ids, key=lambda sid: self._inflight[sid])
```
For consistency with the LeastRequestsStrategy and the other branch of pick_server in this class, the fallback logic should include the server ID as a tie-breaker. This ensures deterministic behavior when multiple servers have the same in-flight request count.
```diff
-    return min(server_ids, key=lambda sid: self._inflight[sid])
+    return min(server_ids, key=lambda sid: (self._inflight[sid], sid))
```
/gemini review
Code Review
This pull request introduces a pluggable global load balancing system for the agent loop, supporting strategies such as least requests, least KV cache usage, weighted round-robin, and random selection. It also implements a "group sticky" mode to route related rollout requests to the same server replica. The changes include new strategy implementations, Prometheus metric parsing utilities, and updated configuration schemas. Feedback suggests refining the weight fallback logic in the round-robin strategy and enhancing HTTP error handling during metric scraping.
```python
    w = [1.0] * n
else:
    # if weight is not found, use 1.0 as default
    w = [resolve_load_balance_weight(weights, sid) or 1.0 for sid in self._ids]
```
The or 1.0 fallback here will incorrectly override a weight of 0.0 to 1.0. While RolloutConfig validation currently prevents non-positive weights, this strategy class should ideally respect a zero weight if it were to be used to drain a server, or at least handle it explicitly. If 0.0 is intended to be invalid, it's better to rely on the config validation or raise an error here rather than silently changing it to 1.0 which might be confusing if the user explicitly set a very low weight.
```diff
-    w = [resolve_load_balance_weight(weights, sid) or 1.0 for sid in self._ids]
+    w = [rw if (rw := resolve_load_balance_weight(weights, sid)) is not None else 1.0 for sid in self._ids]
```
```python
def fetch_prometheus_text(url: str, timeout_s: float = 2.0) -> str:
    """HTTP GET returning Prometheus text exposition body."""
    req = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8", errors="replace")
```
The fetch_prometheus_text function uses urllib.request.urlopen without handling potential HTTP errors (e.g., 404, 500). While the caller LeastKVCacheStrategy._refresh_one_replica catches all exceptions, it would be more robust to handle urllib.error.HTTPError specifically to provide better error messages or handle specific status codes.
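A hedged sketch of that error handling; `fetch_prometheus_text_checked` is a hypothetical wrapper, not code from this PR. `HTTPError` must be caught before `URLError` since it is a subclass:

```python
import urllib.error
import urllib.request

def fetch_prometheus_text_checked(url: str, timeout_s: float = 2.0) -> str:
    """HTTP GET with explicit messages for HTTP-status vs transport failures."""
    req = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:
        # non-2xx responses: surface the status code explicitly
        raise RuntimeError(f"metrics endpoint {url} returned HTTP {e.code}") from e
    except urllib.error.URLError as e:
        # DNS failure, connection refused, unknown scheme, etc.
        raise RuntimeError(f"metrics endpoint {url} unreachable: {e.reason}") from e
```

The caller's broad `except Exception` still works, but log lines now distinguish a 404 from a refused connection.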
/gemini review
Code Review
This pull request introduces a pluggable load balancing framework for the agent loop, supporting multiple strategies such as least requests, least KV cache usage (via Prometheus scraping), weighted round-robin, and random selection. It also adds a "group sticky" session mode to ensure that repeated rollouts for the same sample are routed to the same server instance, which is particularly useful for GRPO. The changes include new configuration parameters, comprehensive unit tests, and updated documentation. Review feedback suggests improving the encapsulation between the load balancer and its strategies by avoiding direct access to internal attributes like _inflight, and notes that the WeightedRoundRobinStrategy contains redundant state that is not utilized in its routing logic.
```python
def release_server(self, server_id: str) -> None:
    """Release a server after a request completes, decrementing strategy in-flight counts."""
    inflight = getattr(self.strategy, "_inflight", None)
```
Accessing _inflight via getattr on the strategy object breaks encapsulation and makes the plugin system fragile. If a user registers a custom strategy that doesn't use this specific attribute name for tracking load, release_server will silently fail to update the state.
Consider defining an explicit interface in the LoadBalanceStrategy base class (e.g., notify_acquire and notify_release methods) to handle in-flight count updates, or move the in-flight tracking logic entirely into the GlobalRequestLoadBalancer base class if it's intended to be a core feature for all strategies.
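A sketch of such an interface; the hook names `notify_acquire` / `notify_release` come from this review comment, while the base-class shape is assumed:

```python
from abc import ABC, abstractmethod

class LoadBalanceStrategy(ABC):
    @abstractmethod
    def pick_server(self, server_ids: list[str]) -> str: ...

    # default no-op hooks: strategies that track load override these
    def notify_acquire(self, server_id: str) -> None: ...
    def notify_release(self, server_id: str) -> None: ...

class LeastRequestsStrategy(LoadBalanceStrategy):
    def __init__(self, server_ids: list[str]):
        self._inflight = {sid: 0 for sid in server_ids}

    def pick_server(self, server_ids: list[str]) -> str:
        # fewest in-flight wins; server id breaks ties deterministically
        return min(server_ids, key=lambda sid: (self._inflight[sid], sid))

    def notify_acquire(self, server_id: str) -> None:
        self._inflight[server_id] += 1

    def notify_release(self, server_id: str) -> None:
        self._inflight[server_id] -= 1
```

The balancer then calls `self.strategy.notify_release(server_id)` unconditionally, and custom plugins that ignore load simply inherit the no-ops.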
```python
        ]
        self._weights = w
        self._current = [0.0] * n
        self._inflight: dict[str, int] = {sid: 0 for sid in self._ids}
```
The WeightedRoundRobinStrategy initializes self._inflight but never uses it in pick_server. While GlobalRequestLoadBalancer will still update these counts via release_server, the WRR strategy remains purely based on weights and doesn't account for actual server load. If the intention was to implement a 'Weighted Least Requests' strategy, pick_server should be updated to factor in self._inflight. If pure WRR is intended, the _inflight attribute is redundant and potentially misleading.
Summary
This PR is Task 2 (Phase 2) of the routing roadmap ([#5442](#5442)), building on the global `GlobalRequestLoadBalancer` introduced in Phase 1 ([#5399](#5399)). Phase 2 adds pluggable load-balance strategies, optional group-level sticky routing, and rollout-config wiring (strategy name, weights, random seed, KV-cache-aware routing hooks) so routing policy can be selected and tuned without forking core agent-loop code.
What does this PR do?
- Pluggable strategies (`load_balance.py`): `least_requests`, `least_kv_cache`, `weighted_rr`, `random`, with `create_load_balance_strategy` / `register_load_balance_strategy` for extension.
- Updates `GlobalRequestLoadBalancer` to delegate new-request picks to the selected strategy while preserving request-level stickiness; adds optional `group_sticky_routing` and `group_id` on `acquire_server` for affinity across related requests.
- Updates `AsyncLLMServerManager` (and related agent-loop paths) to pass worker / group identity into the load balancer so group affinity works end-to-end.
- New rollout config options: `load_balance_strategy`, `load_balance_weights`, `load_balance_random_seed`, `group_sticky_routing`, `num_load_balance_groups`, plus KV cache metrics config for strategies that use server-side usage (e.g. `least_kv_cache`).
- Extends `tests/experimental/agent_loop/test_basic_agent_loop.py` for new strategies and group/request stickiness behavior.

Related: [#5442](#5442), Phase 1 [#5399](#5399).
Checklist Before Starting
https://github.com/verl-project/verl/issues?q=is%3Apr+load+balance+agent+loop (#5399)

API and Usage Example
Rollout / Hydra overrides (illustrative):
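A minimal illustrative set of overrides, assuming the keys land under the rollout config as listed above (the nesting and exact paths are assumptions, not verified against the schema):

```yaml
# illustrative only: key names from this PR; nesting/paths are assumed
rollout:
  load_balance_strategy: least_kv_cache   # least_requests | weighted_rr | random
  group_sticky_routing: true
  load_balance_random_seed: 0
  load_balance_weights: null              # only used by weighted_rr
```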
Programmatic registration (custom strategy plugin):
Design & Code Changes
- `verl/experimental/agent_loop/load_balance.py`: strategy implementations plus `create_load_balance_strategy`.
- `verl/experimental/agent_loop/agent_loop.py`: `GlobalRequestLoadBalancer` strategy + group sticky; `build_global_load_balancer_remote_kwargs` from `RolloutConfig`; `AsyncLLMServerManager` passes `group_id`.
- `verl/trainer/config/rollout/rollout.yaml` + worker config
- `tests/experimental/agent_loop/test_basic_agent_loop.py`

High level: Phase 1 centralized routing in one Ray actor; Phase 2 makes the policy behind `acquire_server` configurable and adds group affinity without duplicating LB logic per worker.

Load balance strategy comparison
| Strategy | How it routes | Key config |
| --- | --- | --- |
| `least_requests` | picks the replica with the fewest in-flight requests (`inflight` ordering) | — |
| `least_kv_cache` | scrapes KV cache usage from `http://{replica_addr}{metrics_path}`; falls back to the `inflight` list if the metric fetch fails | `metric_name`, `metrics_path`, `refresh_interval_s` |
| `weighted_rr` | weighted round-robin over static weights | `load_balance_weights` (length must match `#servers_address`) |
| `random` | uniform random pick over `server_ids` | `load_balance_random_seed` |

Sticky Strategy Comparison