fix(shard distributor): remove heartbeat write cooldown #7513
Merged
jakobht merged 1 commit into cadence-workflow:master on Dec 10, 2025
Signed-off-by: Andreas Holt <6665487+AndreasHolt@users.noreply.github.com>
67e2990 to d761bdc
eleonoradgr approved these changes on Dec 8, 2025
jakobht (Member) approved these changes on Dec 8, 2025 and left a comment:
Looks great, tested locally and it indeed looks like it fixes the problem.
Thanks again for the great investigation, it's not easy finding "ghost behaviour" like this in distributed systems :)
What changed?
- Removed _heartbeatRefreshRate and the block that returns early in service/sharddistributor/handler/executor.go, so executor heartbeats are always persisted.
- Replaced the tests that relied on _heartbeatRefreshRate with a single test that expects RecordHeartbeat to be called on a second heartbeat with the same status.
Why?
Previously, a heartbeat with an unchanged status was not persisted if it arrived within **_heartbeatRefreshRate** (2s) of the last one.
Two other alternatives were considered instead of removing the check and cooldown:
- Lowering _heartbeatRefreshRate (e.g., to 1s).
Both of these alternatives would reduce the chance of misclassifying healthy executors as stale, but they keep a hidden coupling between heartbeat.TTL and the write cooldown. Removing the cooldown entirely makes the behavior easier to reason about and avoids this subtle issue that can happen in configuration.
How did you test it?
Potential risks
_heartbeatRefreshRate.
Release notes
Documentation Changes