Skip to content

Conversation

@jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented Nov 11, 2025

Motivation:

When using xDS based clients, users will often want to know how the xDS resources translate to the load balancer which routes requests upstream.
This changeset attempts introduce a DefaultXdsLoadBalancerLifecycleObserver which collects metrics of each load balancer for each cluster.
This observer has the same lifecycle as a ClusterResourceNode, and will deregister each metric upon close.

Note that thanks to XdsClusterManager, each cluster has a unique name.

Modifications:

  • A DefaultXdsLoadBalancerLifecycleObserver is introduced which records metrics for a XdsLoadBalancer
  • MeterRegistry, MeterIdPrefix can now be specified in XdsBootstrapBuilder
    • MeterRegistry, MeterIdPrefix is also added to SubscriptionContext so that they can be accessed when dynamically querying xDS resources
    • An xDS-dedicated EventExecutor is used for the default event loop.

Result:

  • DefaultXdsLoadBalancerLifecycleObserver now records metrics for XdsLoadBalancers by default

@jrhee17 jrhee17 added this to the 1.34.0 milestone Nov 11, 2025
@jrhee17 jrhee17 marked this pull request as ready for review November 12, 2025 00:33
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍👍

Comment on lines 106 to 107
final ImmutableList.Builder<Meter> metersBuilder = ImmutableList.builder();
metersBuilder.add(Gauge.builder(prefix.name("lb.state.subsets"), () -> numSubsets)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Collection seems unnecessary.

localClusterName = bootstrap.getClusterManager().getLocalClusterName();
if (!Strings.isNullOrEmpty(localClusterName) && bootstrap.getNode().hasLocality()) {
final DefaultXdsLoadBalancerLifecycleObserver observer =
new DefaultXdsLoadBalancerLifecycleObserver(meterIdPrefix, meterRegistry, localClusterName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that there’s still no extension point for users to configure a custom XdsLoadBalancerLifecycleObserver.

Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 👍

private final MeterIdPrefix prefix;
private final List<Meter> meters;

private int numSubsets;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be volatile or AtomicInteger?

Comment on lines 156 to 159
private int healthyLoad;
private int degradedLoad;
private boolean panicState;
private double zarLocalPercentage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 227 to 231
private int total;
private int healthy;
private int degraded;
private int localityWeight;
private double zarPercentage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@jrhee17 jrhee17 modified the milestones: 1.34.0, 1.35.0 Nov 24, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

Threads MeterRegistry and MeterIdPrefix through the XDS bootstrap/cluster/subscription stack and adds a package-private DefaultXdsLoadBalancerLifecycleObserver that records Micrometer metrics for XDS load balancer lifecycle events. Adds integration tests validating metrics for static, dynamic (revision) and complex multi-locality scenarios.

Changes

Cohort / File(s) Summary
Integration test
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java
New integration test exercising static, dynamic (revision updates) and complex zone-aware multi-locality scenarios; captures and asserts Prometheus/Micrometer metrics and verifies cleanup.
New metrics observer
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java
New package-private implementation of XdsLoadBalancerLifecycleObserver that creates hierarchical Micrometer recorders (SnapshotUpdateRecorder, LoadBalancerRecorder, PriorityRecorder, LocalityRecorder) to track revisions, update counts, per-priority/per-locality health, weights, subsets, panic/zar metrics, and supports lifecycle cleanup.
SubscriptionContext API & impls
xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java, xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java, xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java
Added meterRegistry() and meterIdPrefix() to the interface; constructors and fields added to implementations to store and expose MeterRegistry and MeterIdPrefix.
Bootstrap / cluster wiring
xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java, xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java, xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java
Thread MeterIdPrefix and MeterRegistry through constructors and initialization paths; create DefaultXdsLoadBalancerLifecycleObserver when constructing load balancers and when registering clusters (static, on-demand CDS), and propagate meter context throughout bootstrap initialization.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant XdsClusterManager
    participant XdsLoadBalancerObserver as Observer
    participant MeterRegistry

    Client->>XdsClusterManager: register/initialize cluster (bootstrap)
    XdsClusterManager->>Observer: instantiate observer(clusterName, meterIdPrefix, meterRegistry)
    Observer->>MeterRegistry: register meters/gauges/counters (per-cluster, per-priority, per-locality)
    Client->>XdsClusterManager: deliver snapshot / hostset / state updates
    XdsClusterManager->>Observer: resourceUpdated/endpointsUpdated/stateUpdated/stateRejected
    Observer->>MeterRegistry: update meters (revision, counts, health, weights, panic/zar)
    Note right of MeterRegistry: Meters are updated and later removed on close
    Client->>XdsClusterManager: close/unregister cluster
    XdsClusterManager->>Observer: close()
    Observer->>MeterRegistry: remove/close meters
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect DefaultXdsLoadBalancerLifecycleObserver: nested recorder logic, concurrency, and correct meter tagging/cleanup.
  • Verify SubscriptionContext and StaticSubscriptionContext constructor changes are propagated to all call sites.
  • Confirm BootstrapClusters/XdsClusterManager/XdsBootstrapImpl wiring passes meter context in all registration paths (static, dynamic/on-demand, local vs non-local).
  • Review integration test for flakiness, resource cleanup, and metric assertions.

Possibly related PRs

Suggested reviewers

  • trustin
  • minwoox
  • ikhoon

Poem

🐰 Hop, hop — meters waking slow,

Revisions counted, zones aglow,
Observers watch each rise and fall,
Subsets, priorities — I note them all,
A carrot for metrics, big and small 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change—introducing the DefaultXdsLoadBalancerLifecycleObserver class. It aligns with the primary modifications throughout the changeset.
Description check ✅ Passed The description is clearly related to the changeset, detailing the motivation, modifications, and expected results of introducing metrics collection for XDS load balancers.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (3)

100-100: Thread-safety concern: numSubsets should be volatile or AtomicInteger.

This field is written in stateUpdated() and read by the Gauge lambda (potentially from a different thread during metrics collection). Without volatile, the updated value may not be visible to the scraping thread.


156-159: Thread-safety concern: fields read by Gauge lambdas should be volatile.

The fields healthyLoad, degradedLoad, panicState, and zarLocalPercentage are written in update() and read by Gauge lambdas. These should be volatile to ensure visibility across threads.


227-231: Thread-safety concern: fields read by Gauge lambdas should be volatile.

The fields total, healthy, degraded, localityWeight, and zarPercentage are written in update() and read by Gauge lambdas. These should be volatile to ensure visibility across threads.

🧹 Nitpick comments (3)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (2)

53-55: Consider test isolation for parallel execution.

The cache and version are static and shared across all tests. If tests run in parallel, they could interfere with each other since they all use the same GROUP key for cache snapshots.

Consider either:

  1. Using @TestInstance(Lifecycle.PER_CLASS) with instance fields
  2. Using unique group keys per test method
  3. Resetting state in @BeforeEach

269-318: Consider using try-with-resources for listenerRoot.

The listenerRoot at line 279 is created but not wrapped in a try-with-resources block. While it's explicitly closed at line 317, if an exception occurs between creation and close, it may leak. The complexCase test handles this correctly with nested try-with-resources at line 539.

         try (XdsBootstrap xdsBootstrap = XdsBootstrap.builder(bootstrap)
                                                      .meterRegistry(meterRegistry)
-                                                     .build()) {
+                                                     .build();
+              ListenerRoot listenerRoot = xdsBootstrap.listenerRoot("my-listener")) {

             // Step 1: Set initial snapshot with first endpoint
             version.incrementAndGet();
             cache.setSnapshot(GROUP, Snapshot.create(ImmutableList.of(cluster), ImmutableList.of(endpoint1),
                                                      ImmutableList.of(listener), ImmutableList.of(),
                                                      ImmutableList.of(), version.toString()));

-            final ListenerRoot listenerRoot = xdsBootstrap.listenerRoot("my-listener");
-
             await().untilAsserted(() -> {
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1)

284-284: Thread-safety concern: revision should be volatile.

Consistent with the concerns on other recorder fields, revision is written in snapshotUpdated() and read by the Gauge lambda. It should be volatile to ensure visibility across threads.

-        private long revision;
+        private volatile long revision;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2d426 and cf7be27.

📒 Files selected for processing (8)
  • it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (1 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java (2 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java (2 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java (2 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java (1 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java (2 hunks)
  • xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

⚙️ CodeRabbit configuration file

**/*.java: - The primary coding conventions and style guide for this project are defined in site/src/pages/community/developer-guide.mdx. Please strictly adhere to this file as the ultimate source of truth for all style and convention-related feedback.

2. Specific check for @UnstableApi

  • Review all newly added public classes and methods to ensure they have the @UnstableApi annotation.
  • However, this annotation is NOT required under the following conditions:
    • If the class or method is located in a package containing .internal or .testing.
    • If the class or method is located in a test source set.
    • If a public method is part of a class that is already annotated with @UnstableApi.

Files:

  • it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java
  • xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java
  • xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java
  • xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java
  • xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java
  • xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java
  • xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java
  • xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java
🧬 Code graph analysis (1)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (3)
core/src/main/java/com/linecorp/armeria/common/metric/MoreMeters.java (1)
  • MoreMeters (42-203)
junit5/src/main/java/com/linecorp/armeria/testing/junit5/server/ServerExtension.java (1)
  • ServerExtension (49-405)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsResourceReader.java (1)
  • XdsResourceReader (31-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-17-coverage
  • GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-11
  • GitHub Check: build-ubicloud-standard-16-jdk-17-leak
  • GitHub Check: build-ubicloud-standard-16-jdk-21-snapshot-blockhound
  • GitHub Check: build-ubicloud-standard-16-jdk-11
  • GitHub Check: build-ubicloud-standard-16-jdk-8
  • GitHub Check: build-windows-latest-jdk-25
  • GitHub Check: site
  • GitHub Check: build-ubicloud-standard-16-jdk-25
  • GitHub Check: build-macos-latest-jdk-25
  • GitHub Check: flaky-tests
  • GitHub Check: lint
  • GitHub Check: Kubernetes Chaos test
  • GitHub Check: Summary
🔇 Additional comments (11)
xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java (1)

28-31: LGTM!

The new accessor methods for meterRegistry() and meterIdPrefix() are clean additions to the package-private interface, enabling metrics context propagation throughout the subscription flow.

xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java (1)

45-69: LGTM!

The meter context (MeterIdPrefix and MeterRegistry) is properly threaded through the initialization chain to XdsClusterManager, BootstrapClusters, and DefaultSubscriptionContext. The test constructor appropriately defaults to Flags.meterRegistry().

it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (1)

624-633: LGTM!

The measureAll helper methods provide clean utility for filtering metrics in assertions, and the default filter appropriately scopes to armeria.xds.lb prefix.

xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java (1)

30-58: LGTM!

The DefaultSubscriptionContext correctly implements the new meterRegistry() and meterIdPrefix() accessors from the SubscriptionContext interface. Fields are properly declared as private final and initialized in the constructor.

xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java (3)

54-67: LGTM!

The constructor properly initializes the local load balancer with a metrics-enabled observer when conditions are met (non-empty local cluster name and locality present). The observer is correctly passed to XdsLoadBalancer.of().


69-111: LGTM!

Both register method overloads consistently create DefaultXdsLoadBalancerLifecycleObserver instances with appropriate meter context from the SubscriptionContext. This ensures metrics are properly collected for both statically registered clusters and on-demand CDS registrations.


42-42: Remove unused static field.

The static observer field at line 42 appears to be dead code after the refactoring to use DefaultXdsLoadBalancerLifecycleObserver instances.

-    private static final XdsLoadBalancerLifecycleObserver observer = new XdsLoadBalancerLifecycleObserver() {};
-
xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java (1)

43-61: LGTM!

The meter context (MeterIdPrefix, MeterRegistry) is correctly threaded through the constructor and into initializePrimary, where it's passed to StaticSubscriptionContext. This aligns well with the broader meter propagation pattern in this PR.

xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java (1)

26-57: LGTM!

The new meterIdPrefix and meterRegistry fields are properly stored as final and exposed via the SubscriptionContext interface accessors. This enables meter-aware initialization for static clusters.

xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (2)

41-91: Clean observer structure.

The observer correctly delegates lifecycle events to dedicated recorders, each with its own metric prefix. The close() method properly cleans up all recorders, preventing meter leaks.


118-138: Good cleanup pattern for dynamic priority/locality changes.

The pattern of comparing previous and current maps to close removed recorders is well-implemented. This ensures meters are properly deregistered when priorities are removed from the load balancer state.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1)

184-186: Consider defensive check instead of assert.

The assertion assert this.priority == priority; verifies a critical invariant, but assertions are often disabled in production (-da flag). If this invariant is important for correctness, consider an explicit check:

 void update(int priority, HostSet hostSet, LoadBalancerState state) {
-    assert this.priority == priority;
+    if (this.priority != priority) {
+        throw new IllegalArgumentException("Priority mismatch: expected " + this.priority + ", got " + priority);
+    }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf7be27 and 8d7efce.

📒 Files selected for processing (1)
  • xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

⚙️ CodeRabbit configuration file

**/*.java: - The primary coding conventions and style guide for this project are defined in site/src/pages/community/developer-guide.mdx. Please strictly adhere to this file as the ultimate source of truth for all style and convention-related feedback.

2. Specific check for @UnstableApi

  • Review all newly added public classes and methods to ensure they have the @UnstableApi annotation.
  • However, this annotation is NOT required under the following conditions:
    • If the class or method is located in a package containing .internal or .testing.
    • If the class or method is located in a test source set.
    • If a public method is part of a class that is already annotated with @UnstableApi.

Files:

  • xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: build-ubicloud-standard-16-jdk-21-snapshot-blockhound
  • GitHub Check: build-ubicloud-standard-16-jdk-25
  • GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-17-coverage
  • GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-11
  • GitHub Check: build-ubicloud-standard-16-jdk-17-leak
  • GitHub Check: build-windows-latest-jdk-25
  • GitHub Check: build-ubicloud-standard-16-jdk-8
  • GitHub Check: build-ubicloud-standard-16-jdk-11
  • GitHub Check: build-macos-latest-jdk-25
  • GitHub Check: site
  • GitHub Check: flaky-tests
  • GitHub Check: lint
  • GitHub Check: Kubernetes Chaos test
  • GitHub Check: Summary
🔇 Additional comments (6)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (6)

41-61: LGTM!

Clean class structure with well-organized recorder initialization. The metric prefix hierarchy is properly set up with cluster tags.


63-91: LGTM!

Lifecycle callbacks are correctly delegated to the appropriate recorders, and close() properly releases all resources.


145-150: Same visibility concern for localityMap field.

Similar to priorityMap in LoadBalancerRecorder, the localityMap field is reassigned in update() (line 208) and read in close() (line 214) without volatile. The same thread-safety consideration applies.


218-276: LGTM!

Well-structured recorder with proper volatile fields for gauge backing values. The meter registration and cleanup are correctly implemented.


278-305: LGTM!

The revision field is correctly marked volatile for thread-safe gauge reads. Counter operations are inherently thread-safe. Clean implementation.


93-143: Verify thread-safety of priorityMap field.

The priorityMap field is reassigned on line 135 and read in close() on line 141. While numSubsets is correctly marked volatile, priorityMap is not. If stateUpdated() and close() can be called from different threads, the non-volatile field could cause visibility issues where close() observes a stale reference.

If all access is guaranteed single-threaded via the xDS-dedicated EventExecutor, this is acceptable. Otherwise, consider making priorityMap volatile to ensure visibility of the reference update.

@jrhee17 jrhee17 merged commit ba803e2 into line:main Dec 9, 2025
15 of 16 checks passed
@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (8150425) to head (8d7efce).
⚠️ Report is 277 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #6490       +/-   ##
============================================
- Coverage     74.46%       0   -74.47%     
============================================
  Files          1963       0     -1963     
  Lines         82437       0    -82437     
  Branches      10764       0    -10764     
============================================
- Hits          61385       0    -61385     
+ Misses        15918       0    -15918     
+ Partials       5134       0     -5134     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants