chore(shard-manager): Emit metrics on total number of executors by gazi-yestemirova · Pull Request #7636 · cadence-workflow/cadence

gazi-yestemirova · 2026-01-22T22:03:49Z

What changed?
This PR adds the shard_distributor_total_executors gauge metric to track the number of executors registered with the shard distributor.
The metric is emitted during each rebalance loop and is broken down by executor status (e.g., ExecutorStatusACTIVE, ExecutorStatusDRAINING, ExecutorStatusDRAINED).
It is tagged with namespace and namespace_type for per-namespace monitoring.
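
To make the shape of the metric concrete, here is a minimal sketch of counting executors per status and emitting one gauge value per status. It is not the code from this PR: the GaugeEmitter interface, the recordingEmitter helper, the string-based ExecutorStatus type, and the "status" tag key are assumptions made for illustration. Only the metric name and the namespace and namespace_type tags come from the description above.

```go
package main

// ExecutorStatus is a stand-in for the real status type; only the
// constant names are taken from the PR description.
type ExecutorStatus string

const (
	ExecutorStatusACTIVE   ExecutorStatus = "ACTIVE"
	ExecutorStatusDRAINING ExecutorStatus = "DRAINING"
	ExecutorStatusDRAINED  ExecutorStatus = "DRAINED"
)

// GaugeEmitter is a hypothetical minimal metrics interface, not the
// Cadence metrics API.
type GaugeEmitter interface {
	UpdateGauge(name string, tags map[string]string, value float64)
}

// countByStatus tallies executors (keyed by executor ID) per status.
func countByStatus(executors map[string]ExecutorStatus) map[ExecutorStatus]int {
	counts := make(map[ExecutorStatus]int)
	for _, status := range executors {
		counts[status]++
	}
	return counts
}

// emitExecutorGauge emits one gauge value per status that has at least
// one executor, tagged for per-namespace monitoring.
func emitExecutorGauge(m GaugeEmitter, namespace, namespaceType string, executors map[string]ExecutorStatus) {
	for status, n := range countByStatus(executors) {
		m.UpdateGauge("shard_distributor_total_executors", map[string]string{
			"namespace":      namespace,
			"namespace_type": namespaceType,
			"status":         string(status), // tag key is an assumption
		}, float64(n))
	}
}

// recordingEmitter is a hypothetical in-memory emitter that keeps the
// last value seen per metric name and status tag.
type recordingEmitter struct {
	values map[string]float64
}

func (r *recordingEmitter) UpdateGauge(name string, tags map[string]string, value float64) {
	r.values[name+"/"+tags["status"]] = value
}
```

One design point to keep in mind with this shape: a status with zero executors emits nothing in that loop, so a dashboard may keep showing the last non-zero value. Whether the real implementation emits explicit zeros is not stated in this PR.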

Why?
Monitor the overall health of the shard distributor cluster by knowing how many executors are actively participating in shard distribution.
Detect executor churn or scaling events.
Alert when the executor count falls below expected thresholds, which could indicate deployment issues or infrastructure problems.

How did you test it?
Verified that the metric is emitted in local and dev environments with correct executor counts.

Potential risks
N/A

Release notes

Documentation Changes

Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>

arzonus · 2026-01-23T12:19:01Z

service/sharddistributor/leader/process/processor.go

 	}

 	p.emitActiveShardMetric(namespaceState.ShardAssignments, metricsLoopScope)
+	p.emitExecutorMetric(namespaceState, len(staleExecutors), metricsLoopScope)


If AssignShards fails, the metric will not be emitted. I think we can emit this metric right before the call to AssignShards.

I wanted to emit the "committed" state of the executors after the transaction because the transaction will be retried, so we avoid emitting potentially inconsistent data.
But I think emitting the "observed" state is also reasonable. Let me update it.

nit: Perhaps we should also add it for the shadow namespaces, shouldn't we?

Yes, we do: we emit the metric before exiting for shadow executors.

So the metric is emitted right before the shadow executors check.
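
The ordering agreed on in this thread can be sketched as follows: emit the "observed" executor state first, then take the early exit for shadow namespaces, and only then attempt the assignment. This is a simplified illustration, not the processor code: the loop and namespaceState types, the shadow flag, and the rebalance function are hypothetical, and assignShards is a stand-in for the real AssignShards call.

```go
package main

import "errors"

// namespaceState is a hypothetical, reduced view of the state read at the
// start of a rebalance loop.
type namespaceState struct {
	executors int  // number of executors observed in this loop
	shadow    bool // true when the namespace is a shadow namespace
}

// loop is a hypothetical stand-in for the processor. emitted records the
// values passed to the metric so the ordering can be checked.
type loop struct {
	emitted      []int
	assignShards func() error // stand-in for the real AssignShards call
}

func (l *loop) emitExecutorMetric(s namespaceState) {
	l.emitted = append(l.emitted, s.executors)
}

// rebalance emits the observed state before anything that can return
// early or fail, so neither the shadow exit nor an AssignShards error
// suppresses the metric.
func (l *loop) rebalance(s namespaceState) error {
	l.emitExecutorMetric(s)

	if s.shadow {
		// Shadow namespaces exit here without assigning shards.
		return nil
	}
	return l.assignShards()
}

// errAssign simulates a failed assignment transaction.
var errAssign = errors.New("assign shards failed")
```

The trade-off matches the discussion above: the value reflects what the loop observed rather than what was committed, in exchange for being emitted on every loop, including failed ones.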

service/sharddistributor/leader/process/processor.go

Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>

arzonus

lgtm 🚀

chore(shard-manager): Emit metrics on total number of executors

6fc4b64

Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>

gazi-yestemirova requested review from 3vilhamster, Shaddoll, davidporter-id-au, demirkayaender, dkrotx, jakobht, neil-xie, sankari165, shijiesheng and taylanisikdemir as code owners January 22, 2026 22:03

arzonus reviewed Jan 23, 2026


Add status tag & tests

8d493a8

Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>

arzonus approved these changes Jan 23, 2026


gazi-yestemirova merged commit 474d530 into cadence-workflow:master Jan 23, 2026
42 of 43 checks passed

gazi-yestemirova merged 2 commits into cadence-workflow:master from gazi-yestemirova:shard-distributor-metrics