Skip to content

refactor: align QBFT quorum with ssv-spec (2f + 1)#966

Open
shane-moore wants to merge 5 commits intosigp:unstablefrom
shane-moore:refactor/quorum-size
Open

refactor: align QBFT quorum with ssv-spec (2f + 1)#966
shane-moore wants to merge 5 commits intosigp:unstablefrom
shane-moore:refactor/quorum-size

Conversation

@shane-moore
Copy link
Copy Markdown
Member

@shane-moore shane-moore commented Apr 20, 2026

Problem, Evidence, and Context

Anchor's QBFT default quorum used N − f, with no spec basis. SSV-spec defines quorum as 2f + 1 via CommitteeMember.HasQuorum (ssv-spec/types/operator.go:27-33), and Go-SSV uses 2f + 1 strictly throughout (ssvshare.go:228-258). The two formulas coincide at canonical N = 3f + 1 sizes and diverge elsewhere, but the spec contract is 2f + 1 in both cases, while Anchor's N − f was an internal invention.

This also left three places computing BFT math: qbft::config::get_f, message_validator::get_f (flagged in-file as a TODO centralize), and ssv_types::Cluster::get_f, plus a second quorum formula in message_validator::compute_quorum_size (which already used 2f + 1, making the quorum rule internally inconsistent).

  • The N − f default was introduced 2025-01-14 in 9389ccd without a comment or design note. No evidence it was a deliberate workaround for any data-model constraint (Anchor derives f from committee_members.len(); 2f + 1 and N − f are both pure functions of that and neither requires a stored FaultyNodes field as ssv-spec has).
  • The SSV Network smart contract (ssvlabs/ssv-network/contracts/libraries/ValidatorLib.sol:validateOperatorsLength) rejects non-canonical operatorIds.length on-chain at registerValidator. Operator counts are enforced to {4, 7, 10, 13} via MIN_OPERATORS_LENGTH = 4, MAX_OPERATORS_LENGTH = 13, length % 3 == 1. So the two formulas are observationally identical in production; the divergence only matters in test harnesses and would be a latent footgun for any future flow that bypasses event ingestion.
  • Diego's review (threads r3133635561, r3133348690, r3133751308, r3133780108, r3133768791, r3133635550) called this out in full and is the motivation for the expanded scope.

Change Overview

Quorum formula alignment (anchor/common/ssv_types/src/lib.rs)

  • quorum_size(n) now returns 2f + 1 (previously N − f). Doc comment ties it explicitly to ssv-spec's CommitteeMember.GetQuorum and Go-SSV's ComputeQuorumAndPartialQuorum.
  • New is_valid_committee_size(n) mirrors Go-SSV's ValidCommitteeSize: N = 3f + 1 with f in 1..=4, i.e. {4, 7, 10, 13}.
  • get_f unchanged ((N − 1) / 3 with saturating_sub guard).

QBFT config (anchor/common/qbft/src/config.rs, error.rs)

  • ConfigBuilder::build now rejects non-canonical committee sizes via is_valid_committee_size (Diego's r3133780108).
  • Vestigial [2f + 1, N − f] range validation removed. There was no with_quorum_size setter anywhere, so the range was never exercised (Diego's r3133751308).
  • ConfigBuilder::quorum_size and Config::quorum_size are now derived getters (no cached field). Single source of truth; can't drift from committee_members.
  • ConfigBuilderError::InvalidQuorumSize renamed to InvalidCommitteeSize to reflect what's actually validated. Display: "Committee size must be of the form 3f + 1 (4, 7, 10, or 13)".

Consolidation in message_validator and eth

  • message_validator::compute_quorum_size wrapper deleted (Diego's r3133768791). Its single production call site (consensus_message::validate_consensus_message_semantics) now uses ssv_types::quorum_size directly. Previously, compute_quorum_size already used 2f + 1 while ssv_types::quorum_size used N − f, meaning Anchor had two different quorum rules internally; they now share one.
  • eth::util::validate_operators inline canonical check replaced with a call to ssv_types::is_valid_committee_size. Canonical-size policy now lives in exactly one place.

Tests

  • New #[cfg(test)] mod tests at the bottom of ssv_types::lib.rs:
    • get_f_matches_byzantine_formula (table at {0, 3, 4, 7, 10, 13}; locks saturating_sub, catches N/3 vs (N − 1)/3 regression).
    • quorum_size_is_two_f_plus_one (table at {0, 1, 3, 4, 6, 7, 10, 13}; catches silent N − f substitution for 2f + 1).
    • quorum_size_composes_with_get_f (property assertion across 0..=MAX_SIGNATURES).
    • valid_committee_size_matches_ssv_canonical_sizes.
  • New qbft::config::tests module: build_accepts_canonical_committee_sizes, build_rejects_empty_committee, build_rejects_non_canonical_committee_sizes, builder_quorum_size_is_derived_from_committee_size.
  • qbft integration tests (qbft::tests) migrated from N=5 and N=3 to N=4 (smallest canonical). Test intent unchanged; only the committee size and expected consensus count shift with it.

What intentionally did not change

  • get_f formula and its saturating_sub guard on empty committees.
  • Cluster::get_f (BLS Lagrange threshold path via validator_store remains 2f + 1).
  • Any on-wire behavior or event format. No runtime behavior change at canonical sizes.
  • The qbft Config struct's internal layout aside from the quorum_size field removal.

Risks, Trade-offs, and Mitigations

  • Observable behavior change at non-canonical sizes only. ConfigBuilder::build previously accepted any committee size; now rejects non-canonical with InvalidCommitteeSize. Production is unreachable (on-chain + eth-event layer both reject non-canonical). The qbft Config::quorum_size value for any hypothetical non-canonical input changes from N − f to 2f + 1. Mitigation: three-layer defense (contract, eth ingestion, qbft build) makes the non-canonical path unreachable except through test harnesses, which are updated here.
  • API rename: InvalidQuorumSizeInvalidCommitteeSize. Grep of the workspace found no external matches! or match callers on this variant; the only matchers are in this PR's new tests.
  • compute_quorum_size deletion. pub(crate)-scoped, one production caller (consensus_message.rs:80), no external consumers.
  • Test migration from non-canonical Ns. qbft integration tests were using N=3 and N=5 as a matter of convenience (TestQBFTCommitteeBuilder::default had N=5). Moving to N=4 loses the weak signal that those tests gave on non-canonical behavior, but that signal was unused in any production path and the qbft tests continue to exercise the same scenarios (basic consensus, faulty operators, node recovery, round change, leader waiting).

Validation

  • cargo test -p ssv_types -p qbft -p message_validator -p qbft_manager -p eth: all green (57 + 9 + 16 + 80 + 16).
  • cargo clippy --all-targets -- -D warnings on all touched crates: clean.
  • make cargo-fmt-check: clean.
  • Cross-checked with ssv-spec (types/operator.go:27-33) and Go-SSV (protocol/v2/types/ssvshare.go:228-267) to confirm the 2f + 1 formula and ValidCommitteeSize semantics match.
  • Cross-checked with the SSV Network contract (ssvlabs/ssv-network at stage-v2.0.0-upgrade7) to confirm the canonical-size enforcement happens on-chain.

Rollback

Single focused branch. Revert cleanly with git revert. No config, data, or on-wire impact. Downstream branches that depend on compute_quorum_size or InvalidQuorumSize would need to be rebased after revert (none known).

Blockers / Dependencies

  • Unblocks test: add CommitteeMemberTest spec coverage #958 (CommitteeMemberTest spec test). That branch already uses ssv_types::quorum_size to check quorum. With this PR, it's now semantically aligned with Go's CommitteeMember.HasQuorum (both 2f + 1) rather than accidentally aligned by virtue of all fixtures being N=4.

@shane-moore

This comment was marked as outdated.

@claude-code-actions-sigp

This comment was marked as outdated.

@shane-moore

This comment was marked as outdated.

@claude-code-actions-sigp

This comment was marked as outdated.

Comment thread anchor/common/ssv_types/src/lib.rs
Comment thread anchor/message_validator/src/lib.rs Outdated
@shane-moore shane-moore force-pushed the refactor/quorum-size branch from 517d643 to 890875f Compare April 23, 2026 16:03
Comment thread anchor/common/qbft/src/config.rs Outdated
}

// Validate `quorum_size`: must fall in `[2f + 1, quorum_size(N)]`.
// Validate `quorum_size`: must fall in `[2f + 1, N − f]`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From where is this requirement?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate a bit: this is not just a comment issue, the code is actually accepting any quorum_size in the interval [2f + 1, N - f].

I don’t think that matches the upstream contract. Go SSV / ssv-spec define quorum as exactly 2f + 1, not as a range of valid thresholds. So this builder is preserving an Anchor-local notion that multiple quorum values are acceptable for the same committee size.

Also, there does not seem to be a public with_quorum_size(...) setter anymore, which makes this even harder to reason about: why is quorum treated as an independently validated field here instead of being derived from the committee size?

I’d expect this to enforce one exact quorum rule for the committee, not a lower/upper bound interval.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also validate committee_size itself here to match the upstream Go/SSV contract.

Go SSV does not treat arbitrary committee sizes as valid QBFT input; it expects canonical N = 3f + 1 sizes. Right now this builder validates quorum against committee_size, but it does not validate that the committee size is canonical in the first place.

If we want Go/spec parity, this check should live here at the config/build boundary rather than implicitly allowing non-canonical committee sizes through.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pushes - all three points here addressed together:

Range check ([2f + 1, N - f]): no documented rationale, and since there was never a public with_quorum_size(...) setter, the interval was unreachable in practice. Removed entirely.

Quorum derivation: dropped the cached quorum_size field from Config/ConfigBuilder. quorum_size() is now a derived getter from committee_members.len(), routed through the shared ssv_types::quorum_size helper (aligned with ssv-spec's 2f + 1). It can't drift from the committee size anymore.

Canonical committee size validation: added at the config/build boundary via ssv_types::is_valid_committee_size (canonical N = 3f + 1, f in 1..=4), returning a new InvalidCommitteeSize variant. Also consolidated eth::util::validate_operators onto the same helper (it had its own inline check), so canonical-size policy lives in exactly one place. Three-layer defense: SSV Network contract -> eth event ingestion -> qbft config build.

Comment thread anchor/common/ssv_types/src/lib.rs Outdated

// Regression targets: silent `2f + 1` substitution for `N − f`.
#[test]
fn quorum_size_is_n_minus_f() {
Copy link
Copy Markdown
Member

@diegomrsantos diegomrsantos Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to add coverage here. I think these expectations should be updated to match the upstream quorum rule instead of locking in the current behavior.

Go SSV and ssv-spec define quorum as 2f + 1, so if quorum_size() is the shared helper, the cases here should assert that directly. For example: N=3 -> 1, N=4 -> 3, N=6 -> 3, N=7 -> 5.

As written, this test suite cements the wrong contract in ssv_types.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. quorum_size() is now 2f + 1, and the test asserts exactly the values you called out:

(3, 1), (4, 3), (6, 3), (7, 5), (10, 7), (13, 9)

Plus a quorum_size_composes_with_get_f property test across 0..=MAX_SIGNATURES so a future inline refactor can't silently decouple the two.

Comment thread anchor/common/ssv_types/src/lib.rs Outdated
members.saturating_sub(1) / 3
}

/// Default QBFT quorum: `N − f`. Equivalent to `2f + 1` when `N = 3f + 1`
Copy link
Copy Markdown
Member

@diegomrsantos diegomrsantos Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think quorum_size() should match the upstream Go/spec quorum rule rather than preserve Anchor’s local N - f default.

Current Go SSV and ssv-spec model quorum as 2f + 1, and message_validator::compute_quorum_size() already does that in this repo. Defining the shared ssv_types::quorum_size() helper as N - f makes the canonical helper disagree with both upstream and existing Anchor validation logic.

I think this should be:

pub fn quorum_size(committee_size: usize) -> usize {
    get_f(committee_size) * 2 + 1
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied verbatim. quorum_size is now get_f(committee_size) * 2 + 1, with a doc comment citing ssv-spec's CommitteeMember.GetQuorum and Go-SSV's ComputeQuorumAndPartialQuorum.

Comment thread anchor/message_validator/src/lib.rs Outdated
@@ -1059,8 +1060,7 @@ pub fn sync_committee_period(
}

pub(crate) fn compute_quorum_size(committee_size: usize) -> usize {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once ssv_types::quorum_size() is corrected to match Go/ssv-spec, I think this local helper should go away and the call sites here should use the shared helper directly.

Right now the actual quorum formula is still duplicated in message_validator, which leaves us with two places to keep in sync for the same protocol rule.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. compute_quorum_size and its test are gone. Its one call site in consensus_message::validate_consensus_message_semantics now calls ssv_types::quorum_size directly. Single authoritative quorum formula in the workspace.

@shane-moore shane-moore changed the title refactor: get_f and quorum_size centralized refactor: align QBFT quorum with ssv-spec (2f + 1) Apr 24, 2026
@shane-moore
Copy link
Copy Markdown
Member Author

@claude, please take a look

@claude-code-actions-sigp
Copy link
Copy Markdown

claude-code-actions-sigp Bot commented Apr 24, 2026

Claude finished @shane-moore's task in 1m 54s —— View job


Review of PR #966 (final state, all 5 commits)

  • Read all changed files and gather context
  • Analyze correctness against ssv-spec
  • Check for issues (logic, safety, tests)
  • Post review feedback

Reviewed the final state across all 10 changed files. The refactor is clean and correct.

Correctness verified:

  • quorum_size is 2f + 1 — matches ssv-spec CommitteeMember.GetQuorum and Go-SSV ComputeQuorumAndPartialQuorum.
  • is_valid_committee_size correctly accepts {4, 7, 10, 13} and rejects everything else (verified edge cases: 0, 1, 3, 16 by hand).
  • get_f unchanged with saturating_sub guard — safe for empty inputs.
  • ConfigBuilder::build now validates canonical committee size before constructing Config, with quorum_size() derived on each call from committee_members.len() — can't drift.
  • message_validator::compute_quorum_size and message_validator::get_f properly removed; the single call site in consensus_message.rs:81 now uses ssv_types::quorum_size directly.
  • eth::util::validate_operators consolidated onto is_valid_committee_size — canonical-size policy in one place.
  • Cluster::get_f correctly delegates to the centralized crate::get_f.
  • qbft integration tests migrated from N=5/N=3 to N=4 (smallest canonical). Consensus count assertions updated accordingly (4 for full, 3 for f-faulty).
  • qbft_manager tests replaced CommitteeSize::get_f() with centralized get_f/quorum_size — no behavioral change at canonical sizes.

Tests are thorough:

  • get_f_matches_byzantine_formula — catches N/3 vs (N-1)/3 regression at N=3.
  • quorum_size_is_two_f_plus_one — catches N-f substitution at non-canonical sizes (e.g. N=3: 2f+1=1 vs N-f=3, N=6: 2f+1=3 vs N-f=5).
  • quorum_size_composes_with_get_f — property test across full range.
  • Config builder tests exercise both acceptance and rejection paths for canonical/non-canonical sizes.

No issues found. LGTM.


@diegomrsantos
Copy link
Copy Markdown
Member

Thanks for all the updates here.

One meta point: I don’t think refactor is the right framing for this PR anymore. It now changes the shared quorum rule, enforces canonical committee sizes in QBFT config, and removes the old range-based quorum validation. This looks like a fix, not just a restructuring, so I think the title/description should reflect that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants