Skip to content

Sync Committee: Heartbeat + Worker Metrics#837

Merged
zadykian merged 2 commits intomainfrom
sync-committee/metrics-healthcheck
May 5, 2025
Merged

Sync Committee: Heartbeat + Worker Metrics#837
zadykian merged 2 commits intomainfrom
sync-committee/metrics-healthcheck

Conversation

@zadykian
Copy link
Copy Markdown
Contributor

Sync Committee: Heartbeat + Worker Metrics

Refactor Worker's initialization and logging, introduce HeartbeatSender

  • Implemented a new HeartbeatSender worker updating the new heartbeat counter metric with a static interval;
  • Integrated logger and basic metrics into srv.WorkerLoop to mandatorily log unhandled errors and publish error-related metrics;

Other changes:

  • Increased the default task polling interval from 1 sec to 10 sec to reduce the number of requests between nodes;
  • Removed unused interfaces, renamed a couple of types;

@notion-workspace
Copy link
Copy Markdown

Enable SC alerts

Refactor `Worker`'s initialization and logging, introduce `HeartbeatSender`

* Implemented a new `HeartbeatSender` worker updating the new heartbeat counter metric with a static interval;
* Integrated logger and basic metrics into `srv.WorkerLoop` to mandatorily log unhandled errors and publish error-related metrics;

Other changes:

* Increased the default task polling interval from `1 sec` to `10 sec` to reduce the number of requests between nodes;
* Removed unused interfaces, renamed a couple of types;
@zadykian zadykian force-pushed the sync-committee/metrics-healthcheck branch from 82af843 to f56e4aa Compare May 2, 2025 11:02
@zadykian zadykian marked this pull request as ready for review May 2, 2025 11:24
Copy link
Copy Markdown
Contributor

@oclaw oclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but a bit of unification will make it even better :)

p.metrics.RecordError(ctx, p.Name())
err := p.updateStateIfReady(ctx)

if err == nil || errors.Is(err, context.Canceled) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some common part between workers. Let's wrap it with some runWorkerIteration func like

func runWorkerIteration(worker func() error,  label string, ignoredErrors ...err)

And put this condition inside

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, introduced srv.workerIteration

Refactor worker iteration logic and adjusted `concurrent.Suspendable`

* Introduced a unified `workerIteration` abstraction reused by `WorkerLoop`, `Aggregator` and `Proposer`
* Updated `Suspendable` behavior to handle errors properly
* Simplified `Aggregator` and `Proposer` by embedding `Suspendable` directly into them
* Adjusted logging in `Service`
@zadykian zadykian enabled auto-merge May 5, 2025 08:35
@zadykian zadykian added this pull request to the merge queue May 5, 2025
Merged via the queue into main with commit 5f7ab5c May 5, 2025
16 checks passed
@zadykian zadykian deleted the sync-committee/metrics-healthcheck branch May 5, 2025 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants