-
Notifications
You must be signed in to change notification settings - Fork 968
Introduce DefaultXdsLoadBalancerLifecycleObserver
#6490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ikhoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍👍
| final ImmutableList.Builder<Meter> metersBuilder = ImmutableList.builder(); | ||
| metersBuilder.add(Gauge.builder(prefix.name("lb.state.subsets"), () -> numSubsets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Collection seems unnecessary.
| localClusterName = bootstrap.getClusterManager().getLocalClusterName(); | ||
| if (!Strings.isNullOrEmpty(localClusterName) && bootstrap.getNode().hasLocality()) { | ||
| final DefaultXdsLoadBalancerLifecycleObserver observer = | ||
| new DefaultXdsLoadBalancerLifecycleObserver(meterIdPrefix, meterRegistry, localClusterName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that there’s still no extension point for users to configure a custom XdsLoadBalancerLifecycleObserver.
minwoox
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 👍
| private final MeterIdPrefix prefix; | ||
| private final List<Meter> meters; | ||
|
|
||
| private int numSubsets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be volatile or AtomicInteger?
| private int healthyLoad; | ||
| private int degradedLoad; | ||
| private boolean panicState; | ||
| private double zarLocalPercentage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| private int total; | ||
| private int healthy; | ||
| private int degraded; | ||
| private int localityWeight; | ||
| private double zarPercentage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
WalkthroughThreads MeterRegistry and MeterIdPrefix through the XDS bootstrap/cluster/subscription stack and adds a package-private DefaultXdsLoadBalancerLifecycleObserver that records Micrometer metrics for XDS load balancer lifecycle events. Adds integration tests validating metrics for static, dynamic (revision) and complex multi-locality scenarios. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant XdsClusterManager
participant XdsLoadBalancerObserver as Observer
participant MeterRegistry
Client->>XdsClusterManager: register/initialize cluster (bootstrap)
XdsClusterManager->>Observer: instantiate observer(clusterName, meterIdPrefix, meterRegistry)
Observer->>MeterRegistry: register meters/gauges/counters (per-cluster, per-priority, per-locality)
Client->>XdsClusterManager: deliver snapshot / hostset / state updates
XdsClusterManager->>Observer: resourceUpdated/endpointsUpdated/stateUpdated/stateRejected
Observer->>MeterRegistry: update meters (revision, counts, health, weights, panic/zar)
Note right of MeterRegistry: Meters are updated and later removed on close
Client->>XdsClusterManager: close/unregister cluster
XdsClusterManager->>Observer: close()
Observer->>MeterRegistry: remove/close meters
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (3)
100-100: Thread-safety concern:numSubsetsshould bevolatileorAtomicInteger.This field is written in
stateUpdated()and read by the Gauge lambda (potentially from a different thread during metrics collection). Withoutvolatile, the updated value may not be visible to the scraping thread.
156-159: Thread-safety concern: fields read by Gauge lambdas should bevolatile.The fields
healthyLoad,degradedLoad,panicState, andzarLocalPercentageare written inupdate()and read by Gauge lambdas. These should bevolatileto ensure visibility across threads.
227-231: Thread-safety concern: fields read by Gauge lambdas should bevolatile.The fields
total,healthy,degraded,localityWeight, andzarPercentageare written inupdate()and read by Gauge lambdas. These should bevolatileto ensure visibility across threads.
🧹 Nitpick comments (3)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (2)
53-55: Consider test isolation for parallel execution.The
cacheandversionare static and shared across all tests. If tests run in parallel, they could interfere with each other since they all use the sameGROUPkey for cache snapshots.Consider either:
- Using
@TestInstance(Lifecycle.PER_CLASS)with instance fields- Using unique group keys per test method
- Resetting state in
@BeforeEach
269-318: Consider using try-with-resources forlistenerRoot.The
listenerRootat line 279 is created but not wrapped in a try-with-resources block. While it's explicitly closed at line 317, if an exception occurs between creation and close, it may leak. ThecomplexCasetest handles this correctly with nested try-with-resources at line 539.try (XdsBootstrap xdsBootstrap = XdsBootstrap.builder(bootstrap) .meterRegistry(meterRegistry) - .build()) { + .build(); + ListenerRoot listenerRoot = xdsBootstrap.listenerRoot("my-listener")) { // Step 1: Set initial snapshot with first endpoint version.incrementAndGet(); cache.setSnapshot(GROUP, Snapshot.create(ImmutableList.of(cluster), ImmutableList.of(endpoint1), ImmutableList.of(listener), ImmutableList.of(), ImmutableList.of(), version.toString())); - final ListenerRoot listenerRoot = xdsBootstrap.listenerRoot("my-listener"); - await().untilAsserted(() -> {xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1)
284-284: Thread-safety concern:revisionshould bevolatile.Consistent with the concerns on other recorder fields,
revisionis written insnapshotUpdated()and read by the Gauge lambda. It should bevolatileto ensure visibility across threads.- private long revision; + private volatile long revision;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java(1 hunks)xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java(2 hunks)xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java(2 hunks)xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java(1 hunks)xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java(2 hunks)xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java(1 hunks)xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java(2 hunks)xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
⚙️ CodeRabbit configuration file
**/*.java: - The primary coding conventions and style guide for this project are defined insite/src/pages/community/developer-guide.mdx. Please strictly adhere to this file as the ultimate source of truth for all style and convention-related feedback.2. Specific check for
@UnstableApi
- Review all newly added public classes and methods to ensure they have the
@UnstableApiannotation.- However, this annotation is NOT required under the following conditions:
- If the class or method is located in a package containing
.internalor.testing.- If the class or method is located in a test source set.
- If a public method is part of a class that is already annotated with
@UnstableApi.
Files:
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.javaxds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.javaxds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.javaxds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.javaxds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.javaxds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.javaxds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.javaxds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java
🧬 Code graph analysis (1)
it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (3)
core/src/main/java/com/linecorp/armeria/common/metric/MoreMeters.java (1)
MoreMeters(42-203)junit5/src/main/java/com/linecorp/armeria/testing/junit5/server/ServerExtension.java (1)
ServerExtension(49-405)it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsResourceReader.java (1)
XdsResourceReader(31-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-17-coverage
- GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-11
- GitHub Check: build-ubicloud-standard-16-jdk-17-leak
- GitHub Check: build-ubicloud-standard-16-jdk-21-snapshot-blockhound
- GitHub Check: build-ubicloud-standard-16-jdk-11
- GitHub Check: build-ubicloud-standard-16-jdk-8
- GitHub Check: build-windows-latest-jdk-25
- GitHub Check: site
- GitHub Check: build-ubicloud-standard-16-jdk-25
- GitHub Check: build-macos-latest-jdk-25
- GitHub Check: flaky-tests
- GitHub Check: lint
- GitHub Check: Kubernetes Chaos test
- GitHub Check: Summary
🔇 Additional comments (11)
xds/src/main/java/com/linecorp/armeria/xds/SubscriptionContext.java (1)
28-31: LGTM!The new accessor methods for
meterRegistry()andmeterIdPrefix()are clean additions to the package-private interface, enabling metrics context propagation throughout the subscription flow.xds/src/main/java/com/linecorp/armeria/xds/XdsBootstrapImpl.java (1)
45-69: LGTM!The meter context (MeterIdPrefix and MeterRegistry) is properly threaded through the initialization chain to
XdsClusterManager,BootstrapClusters, andDefaultSubscriptionContext. The test constructor appropriately defaults toFlags.meterRegistry().it/xds-client/src/test/java/com/linecorp/armeria/xds/it/XdsLoadBalancerLifecycleObserverTest.java (1)
624-633: LGTM!The
measureAllhelper methods provide clean utility for filtering metrics in assertions, and the default filter appropriately scopes toarmeria.xds.lbprefix.xds/src/main/java/com/linecorp/armeria/xds/DefaultSubscriptionContext.java (1)
30-58: LGTM!The
DefaultSubscriptionContextcorrectly implements the newmeterRegistry()andmeterIdPrefix()accessors from theSubscriptionContextinterface. Fields are properly declared asprivate finaland initialized in the constructor.xds/src/main/java/com/linecorp/armeria/xds/XdsClusterManager.java (3)
54-67: LGTM!The constructor properly initializes the local load balancer with a metrics-enabled observer when conditions are met (non-empty local cluster name and locality present). The observer is correctly passed to
XdsLoadBalancer.of().
69-111: LGTM!Both
registermethod overloads consistently createDefaultXdsLoadBalancerLifecycleObserverinstances with appropriate meter context from theSubscriptionContext. This ensures metrics are properly collected for both statically registered clusters and on-demand CDS registrations.
42-42: Remove unused static field.The static
observerfield at line 42 appears to be dead code after the refactoring to useDefaultXdsLoadBalancerLifecycleObserverinstances.- private static final XdsLoadBalancerLifecycleObserver observer = new XdsLoadBalancerLifecycleObserver() {}; -xds/src/main/java/com/linecorp/armeria/xds/BootstrapClusters.java (1)
43-61: LGTM!The meter context (MeterIdPrefix, MeterRegistry) is correctly threaded through the constructor and into
initializePrimary, where it's passed toStaticSubscriptionContext. This aligns well with the broader meter propagation pattern in this PR.xds/src/main/java/com/linecorp/armeria/xds/StaticSubscriptionContext.java (1)
26-57: LGTM!The new
meterIdPrefixandmeterRegistryfields are properly stored as final and exposed via theSubscriptionContextinterface accessors. This enables meter-aware initialization for static clusters.xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (2)
41-91: Clean observer structure.The observer correctly delegates lifecycle events to dedicated recorders, each with its own metric prefix. The
close()method properly cleans up all recorders, preventing meter leaks.
118-138: Good cleanup pattern for dynamic priority/locality changes.The pattern of comparing previous and current maps to close removed recorders is well-implemented. This ensures meters are properly deregistered when priorities are removed from the load balancer state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (1)
184-186: Consider defensive check instead ofassert.The assertion
assert this.priority == priority;verifies a critical invariant, but assertions are often disabled in production (-daflag). If this invariant is important for correctness, consider an explicit check:void update(int priority, HostSet hostSet, LoadBalancerState state) { - assert this.priority == priority; + if (this.priority != priority) { + throw new IllegalArgumentException("Priority mismatch: expected " + this.priority + ", got " + priority); + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
⚙️ CodeRabbit configuration file
**/*.java: - The primary coding conventions and style guide for this project are defined insite/src/pages/community/developer-guide.mdx. Please strictly adhere to this file as the ultimate source of truth for all style and convention-related feedback.2. Specific check for
@UnstableApi
- Review all newly added public classes and methods to ensure they have the
@UnstableApiannotation.- However, this annotation is NOT required under the following conditions:
- If the class or method is located in a package containing
.internalor.testing.- If the class or method is located in a test source set.
- If a public method is part of a class that is already annotated with
@UnstableApi.
Files:
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: build-ubicloud-standard-16-jdk-21-snapshot-blockhound
- GitHub Check: build-ubicloud-standard-16-jdk-25
- GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-17-coverage
- GitHub Check: build-ubicloud-standard-16-jdk-17-min-java-11
- GitHub Check: build-ubicloud-standard-16-jdk-17-leak
- GitHub Check: build-windows-latest-jdk-25
- GitHub Check: build-ubicloud-standard-16-jdk-8
- GitHub Check: build-ubicloud-standard-16-jdk-11
- GitHub Check: build-macos-latest-jdk-25
- GitHub Check: site
- GitHub Check: flaky-tests
- GitHub Check: lint
- GitHub Check: Kubernetes Chaos test
- GitHub Check: Summary
🔇 Additional comments (6)
xds/src/main/java/com/linecorp/armeria/xds/DefaultXdsLoadBalancerLifecycleObserver.java (6)
41-61: LGTM!Clean class structure with well-organized recorder initialization. The metric prefix hierarchy is properly set up with cluster tags.
63-91: LGTM!Lifecycle callbacks are correctly delegated to the appropriate recorders, and
close()properly releases all resources.
145-150: Same visibility concern forlocalityMapfield.Similar to
priorityMapinLoadBalancerRecorder, thelocalityMapfield is reassigned inupdate()(line 208) and read inclose()(line 214) withoutvolatile. The same thread-safety consideration applies.
218-276: LGTM!Well-structured recorder with proper volatile fields for gauge backing values. The meter registration and cleanup are correctly implemented.
278-305: LGTM!The
revisionfield is correctly markedvolatilefor thread-safe gauge reads. Counter operations are inherently thread-safe. Clean implementation.
93-143: Verify thread-safety ofpriorityMapfield.The
priorityMapfield is reassigned on line 135 and read inclose()on line 141. WhilenumSubsetsis correctly markedvolatile,priorityMapis not. IfstateUpdated()andclose()can be called from different threads, the non-volatile field could cause visibility issues whereclose()observes a stale reference.If all access is guaranteed single-threaded via the xDS-dedicated
EventExecutor, this is acceptable. Otherwise, consider makingpriorityMapvolatile to ensure visibility of the reference update.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6490 +/- ##
============================================
- Coverage 74.46% 0 -74.47%
============================================
Files 1963 0 -1963
Lines 82437 0 -82437
Branches 10764 0 -10764
============================================
- Hits 61385 0 -61385
+ Misses 15918 0 -15918
+ Partials 5134 0 -5134 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Motivation:
When using xDS based clients, users will often want to know how the xDS resources translate to the load balancer which routes requests upstream.
This changeset attempts introduce a
DefaultXdsLoadBalancerLifecycleObserverwhich collects metrics of each load balancer for each cluster.This observer has the same lifecycle as a
ClusterResourceNode, and will deregister each metric upon close.Note that thanks to
XdsClusterManager, each cluster has a unique name.Modifications:
DefaultXdsLoadBalancerLifecycleObserveris introduced which records metrics for aXdsLoadBalancerMeterRegistry,MeterIdPrefixcan now be specified inXdsBootstrapBuilderMeterRegistry,MeterIdPrefixis also added toSubscriptionContextso that they can be accessed when dynamically querying xDS resourcesEventExecutoris used for the default event loop.Result:
DefaultXdsLoadBalancerLifecycleObservernow records metrics forXdsLoadBalancers by default