[IMPROVED] NRG: Drop proposals/pause quorum if we're being overrun #7853
Draft
MauriceVanVeen wants to merge 1 commit into main
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Force-pushed from a9c46e6 to d98f0dd.
This PR adds a protective measure to guard against unbounded WAL growth. Currently, an overloaded server can see its meta log grow to well over several GBs, eventually requiring the log to be deleted manually on the server in order to recover.
The threshold is reasonably high. We keep incoming append entries cached in `n.pae`. This cache starts logging a warning at `paeWarnThreshold: 10k` and is eventually capped at `paeDropThreshold: 20k`, at which point new entries aren't cached and instead need to be loaded from disk when they are committed.

Both protective measures in this PR (dropping proposals and pausing quorum) only kick in when going over `pauseQuorumThreshold: 100k` append entries that haven't gotten quorum on the leader, or that have been committed but not yet applied on the follower. Because the leader counts 'total uncommitted/unapplied entries in the log' while the follower counts only 'total unapplied but committed entries in the log', the leader should, under normal circumstances, start dropping proposals first. If a follower is otherwise overloaded, it can also guard itself.
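To make the thresholds concrete, here is a minimal sketch of how the cache cap and the overrun check could fit together. Only the three threshold names and values and the `n.pae` cache come from the description above; `entryCache`, `logState`, `leaderBacklog`, `followerBacklog`, and `overrun` are hypothetical names, and the exact boundary comparisons (`>=` versus `>`) are assumptions, not the code in this PR.

```go
package main

// Threshold values as stated in the PR description.
const (
	paeWarnThreshold     = 10_000
	paeDropThreshold     = 20_000
	pauseQuorumThreshold = 100_000
)

// entryCache is a hypothetical stand-in for the n.pae cache of pending
// append entries, keyed by log index.
type entryCache struct {
	entries map[uint64][]byte
}

func newEntryCache() *entryCache {
	return &entryCache{entries: make(map[uint64][]byte)}
}

// add caches an entry unless the cache is already at the drop threshold.
// cached=false means the entry has to be loaded from disk when committed.
// warn=true means the cache is at or beyond the warning threshold.
func (c *entryCache) add(index uint64, data []byte) (cached, warn bool) {
	if len(c.entries) >= paeDropThreshold {
		return false, true
	}
	c.entries[index] = data
	return true, len(c.entries) >= paeWarnThreshold
}

// logState holds the log positions the description refers to
// (hypothetical field names; assumes applied <= commit <= lastIndex).
type logState struct {
	lastIndex uint64 // last entry stored in the log
	commit    uint64 // last entry that has reached quorum
	applied   uint64 // last entry applied to the state machine
}

// sub is a saturating subtraction so a stale counter cannot underflow.
func sub(a, b uint64) uint64 {
	if a < b {
		return 0
	}
	return a - b
}

// leaderBacklog counts every entry in the log that is uncommitted or
// committed but unapplied.
func leaderBacklog(s logState) uint64 { return sub(s.lastIndex, s.applied) }

// followerBacklog counts only entries that are committed but unapplied.
func followerBacklog(s logState) uint64 { return sub(s.commit, s.applied) }

// overrun reports whether a backlog is large enough to trigger the
// protective measure (dropping proposals or pausing quorum).
func overrun(backlog uint64) bool { return backlog > pauseQuorumThreshold }
```

With `lastIndex = 150_000`, `commit = 120_000`, and `applied = 40_000`, the leader's count is 110,000 and trips the check, while a follower in the same state counts 80,000 and does not. This illustrates why the leader would normally start dropping proposals before a follower has to protect itself.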