fix: stop relays installing own subscribe leases on forwarded responses by sanity · Pull Request #3850 · freenet/freenet-core

sanity · 2026-04-13T17:41:18Z

Problem

In operations::subscribe::process_message, the SubscribeMsgResult::Subscribed handler unconditionally ran op_manager.ring.subscribe(*key), record_subscription(), announce_contract_hosted(), and upstream interest registration before branching on whether the node was the originator or a relay. That meant every peer that forwarded a Subscribed response for someone else installed a lease in its own active_subscriptions and announced itself as a contract host.

contracts_needing_renewal() picks up any entry in active_subscriptions whose lease is near expiry (path #1 in hosting.rs), so each polluted relay would then spawn its own subscribe every ~2 minutes. Those renewals routed through fresh relays, which also installed leases and started renewing. The result was a positive feedback loop that inflated subscription trees and UPDATE broadcast fan-out with peers that had no genuine interest in the contract.

Observed in the wild on technic: the local peer dashboard listed 85+ "subscribed contracts" after running River + the Ghostkey delegate (which should produce at most 2 genuine subscriptions). Only 2 entries had Last Update timestamps; the rest were phantom leases installed while relaying other peers' subscribes.

The GET-piggyback relay path (operations::setup_subscription_forwarding_at_relay) already documents and avoids this exact problem, with a comment explicitly warning that calling ring.subscribe() or announce_contract_hosted() on a relay causes a "subscription storm". The explicit SUBSCRIBE path was asymmetric and is brought in line here.

Solution

Move ring.subscribe(), complete_subscription_request(), record_subscription(), fetch_contract_if_missing(), announce_contract_hosted(), and upstream-interest registration into the originator-only branch of SubscribeMsgResult::Subscribed. The relay branch keeps only register_downstream_subscriber, which is the correct mechanism for a relay to receive and propagate UPDATEs without recruiting itself as a subscriber.

Why each moved call is originator-only:

ring.subscribe installs a lease in active_subscriptions, which drives the renewal cycle. Relays don't need a lease — UPDATE forwarding is driven by interest_manager + neighbor_hosting, populated via register_downstream_subscriber.
complete_subscription_request writes to the per-key subscription backoff tracker. Relays never called mark_subscription_pending for the forwarded request, so recording "success" here corrupts backoff state for a request the relay never made.
record_subscription is the dashboard signal and must reflect the local node's genuine subscriptions only.
fetch_contract_if_missing issues a full GET for the contract. A relay has no reason to fetch the contract locally — it's just forwarding the Subscribed response.
announce_contract_hosted broadcasts "I host this contract" to neighbors, populating their proximity cache and causing them to forward UPDATEs to the relay. Relays aren't hosting, so they must not claim to.
Upstream-interest registration (register_peer_interest(..., is_upstream=true)) records the response sender as the target for future Unsubscribe messages. Only the originator can send an Unsubscribe — relays don't own the subscription.

Testing

Primary regression: HostingManager unit test

ring::hosting::tests::test_relay_downstream_only_not_in_renewal models the post-fix relay state as "downstream subscriber registered, no self-lease" and asserts three invariants:

is_subscribed and get_subscribed_contracts stay empty — the relay does not hold a lease.
contracts_needing_renewal() does NOT include the contract — the load-bearing property that breaks the feedback loop. If any code path regressed and started calling ring.subscribe() for a relay, the contract would appear here and trigger renewal.
has_downstream_subscribers() is true — the relay still holds the downstream peer for UPDATE propagation.

Simulation soundness test

test_relay_does_not_pollute_active_subscriptions in simulation_integration.rs runs a 20-peer sparse network with parallel subscribes and asserts the number of peers with the contract in active_subscriptions stays bounded by the count of genuine originators. A new TopologySnapshot::active_subscription_keys field exposes active-subscription presence directly (the merged contracts map hides it when a peer also hosts the contract).

Caveat: the simulation runner uses turmoil, which shares pending_op_results across peers in one process. On an intermediate peer, node::try_forward_task_per_tx_reply can then find the originator's task entry and short-circuit the Response before it reaches the legacy relay handler. In production each peer has its own pending_op_results and relays fall through to the legacy handler, which is where the bug bites. That is why the direct regression is pinned at the HostingManager level rather than at simulation level.

Other checks

cargo test -p freenet --lib ring::hosting (59 passed)
cargo test -p freenet --lib operations::subscribe (56 passed)
cargo test -p freenet --test operations (21 passed)
test_subscribe_forwarding_ack_relay, test_topology_subscribe_health still pass
cargo clippy --locked -- -D warnings clean
cargo fmt clean

Fixes

No existing issue — discovered investigating technic's local dashboard. I can file an issue afterward if preferred.

[AI-assisted - Claude]

## Problem In `operations::subscribe::process_message`, the `SubscribeMsgResult:: Subscribed` handler unconditionally ran `op_manager.ring.subscribe(*key)`, `record_subscription()`, `announce_contract_hosted()`, and upstream interest registration before branching on whether the node was the originator or a relay. That meant every peer that forwarded a Subscribed response for someone else installed a lease in its own `active_subscriptions` and announced itself as a contract host. `contracts_needing_renewal()` picks up any entry in `active_subscriptions` whose lease is near expiry (path #1 in `hosting.rs`), so each polluted relay would then spawn its own subscribe every ~2 minutes. Those renewals routed through fresh relays, which also installed leases and started renewing. The result was a positive feedback loop that inflated subscription trees and UPDATE broadcast fan-out with peers that had no genuine interest in the contract. Observed in the wild on `technic`: the local dashboard listed 85+ "subscribed contracts" after running River + the Ghostkey delegate (which should produce at most 2 genuine subscriptions). Only 2 entries had `Last Update` timestamps; the rest were phantom leases installed while relaying other peers' subscribes. The GET-piggyback relay path (`operations::setup_subscription_forwarding_ at_relay`) already documents and avoids this exact problem, with a comment explicitly warning that calling `ring.subscribe()` or `announce_contract_hosted()` on a relay causes a "subscription storm". The explicit SUBSCRIBE path was asymmetric and is now brought in line. ## Approach Move `ring.subscribe()`, `complete_subscription_request()`, `record_subscription()`, `fetch_contract_if_missing()`, `announce_contract_hosted()`, and upstream-interest registration into the originator-only branch of `SubscribeMsgResult::Subscribed`. The relay branch keeps only `register_downstream_subscriber`, which is the correct mechanism for a relay to receive and propagate UPDATEs without recruiting itself as a subscriber. Why each moved call is originator-only: - `ring.subscribe` installs a lease in `active_subscriptions`, which drives the renewal cycle. Relays don't need a lease — UPDATE forwarding is driven by `interest_manager` + `neighbor_hosting`, populated via `register_downstream_subscriber`. - `complete_subscription_request` writes to the per-key subscription backoff tracker. Relays never called `mark_subscription_pending` for the forwarded request, so recording "success" here corrupts backoff state for a request the relay never made. - `record_subscription` is the dashboard signal and must reflect the local node's genuine subscriptions only. - `fetch_contract_if_missing` issues a full GET for the contract. A relay has no reason to fetch the contract locally — it's just forwarding the Subscribed response. - `announce_contract_hosted` broadcasts to neighbors that "I host this contract", populating their proximity cache and causing them to forward UPDATEs to the relay. Relays aren't hosting, so they must not claim to. - The upstream-interest registration (`register_peer_interest(..., is_upstream=true)`) records the response sender as the target for future `Unsubscribe` messages. Only the originator can send an Unsubscribe — relays don't own the subscription. ## Testing ### Primary regression: HostingManager unit test `ring::hosting::tests::test_relay_downstream_only_not_in_renewal` models the post-fix relay state as "downstream subscriber registered, no self-lease" and asserts three invariants: 1. `is_subscribed` and `get_subscribed_contracts` stay empty — the relay does not hold a lease. 2. `contracts_needing_renewal()` does NOT include the contract — the load-bearing property that breaks the feedback loop. If any code path regressed and started calling `ring.subscribe()` for a relay, the contract would appear here and trigger renewal. 3. `has_downstream_subscribers()` is true — the relay still holds the downstream peer for UPDATE propagation. ### Simulation soundness test `test_relay_does_not_pollute_active_subscriptions` in `simulation_integration.rs` runs a 20-peer sparse network with parallel subscribes and asserts the number of peers with the contract in `active_subscriptions` stays bounded by the count of genuine originators. A new `TopologySnapshot::active_subscription_keys` field exposes active-subscription presence directly (the merged `contracts` map hides it when a peer also hosts the contract). Note: the simulation runner uses turmoil, which shares `pending_op_results` across peers in one process. On an intermediate peer, `node::try_forward_task_per_tx_reply` can then find the originator's task entry and short-circuit the Response before it reaches the legacy relay handler. In production each peer has its own `pending_op_results` and relays fall through to the legacy handler, which is where the bug bites. That is why the direct regression is pinned at the HostingManager level rather than at simulation level. ### Other tests verified to still pass - `cargo test -p freenet --lib ring::hosting` (59 passed) - `cargo test -p freenet --lib operations::subscribe` (56 passed) - `cargo test -p freenet --test operations` (21 passed) - `test_subscribe_forwarding_ack_relay`, `test_topology_subscribe_health` - `cargo clippy --locked -- -D warnings` clean - `cargo fmt` clean [AI-assisted - Claude]

github-actions · 2026-04-13T17:43:18Z

Rule Review: Regression test may not reliably catch reintroduction of the bug

Rules checked: git-workflow.md, code-style.md, testing.md, operations.md, ring.md
Files reviewed: 4 (subscribe.rs, hosting.rs, topology_registry.rs, simulation_integration.rs)

Warnings

None.

Info

crates/core/tests/simulation_integration.rs (test test_relay_does_not_pollute_active_subscriptions) — The simulation test itself documents that, in the Turmoil runner, all peers share one process; node::try_forward_task_per_tx_reply can find the originator's task entry and short-circuit the Subscribed response before the legacy relay handler (the code that was buggy) executes. This means the simulation test may not exercise the relay path that was fixed and therefore may not catch a reintroduction of the bug. The unit test test_relay_downstream_only_not_in_renewal correctly documents the HostingManager invariant but doesn't go through the subscribe.rs operations code path. Neither test strictly satisfies the testing.md requirement that a regression test "would catch this exact bug if reintroduced." (rule: testing.md — regression tests must catch the specific bug)

The tests are well-reasoned and the author is transparently honest about the Turmoil limitation, but it's worth flagging for reviewers who want to ensure the regression net is tight.

Rule review against .claude/rules/. WARNING findings block merge.

Review finding from code-first reviewer on PR #3850: the comment in `test_relay_does_not_pollute_active_subscriptions` referenced "nodes 1..=4" but the operations actually subscribe from nodes 15..=18. [AI-assisted - Claude]

Accidentally committed in previous commit (empty file left by codex CLI). [AI-assisted - Claude]

sanity · 2026-04-13T17:50:46Z

Addressing feedback from internal multi-perspective review (code-first, testing, skeptical, big-picture). Summary of dispositions:

Applied in `95f3840` / `fbf1812`

Comment drift: fixed nodes 1..=4 → nodes 15..=18 in the simulation test (code-first reviewer).
Removed an accidentally-committed empty .codex file.

Filed as follow-up

Client-initiated SUBSCRIBE via op_ctx_task never installs local lease #3851 — tracking issue for the pre-existing gap where the task-per-tx client-initiated SUBSCRIBE driver (op_ctx_task::ReplyClass::Subscribed) never installs a local lease (found by the skeptical reviewer). This is a real bug introduced in PR refactor(ops): migrate client-initiated SUBSCRIBE to OpCtx (#1454 phase 2b) #3806 (Refactor core interfaces #1454 Phase 2b) and is orthogonal to the relay-pollution fix here. Fixing it requires a shared finalize_originator_subscribe helper invoked from both the legacy originator branch and the task-per-tx driver, plus careful analysis of renewal interaction with the per-key backoff tracker. That's a separate, scoped PR.

Not applied, with reasoning

Direct unit test on SubscribeOp::process_message for the relay branch (all three internal reviewers). Agreed in principle — the current unit test pins the HostingManager invariant, not the caller, so a regression that re-added ring.subscribe() to the relay branch of subscribe.rs would not be directly caught. However, building a test that drives process_message requires a full OpManager (which depends on a config, contract handler channel, event register, connection manager, background task monitor), and none of the existing 56 tests in crates/core/src/operations/subscribe/tests.rs construct one — they deliberately avoid it in favor of pure-function / TestRing mocks. Following the big-picture reviewer's suggestion, the right long-term fix is to either (a) add a per-peer pending_op_results mode to turmoil so simulation tests actually exercise the legacy relay handler, or (b) factor out finalize_originator_subscribe so the relay branch is structurally the absence of a helper call. Both belong with Client-initiated SUBSCRIBE via op_ctx_task never installs local lease #3851 (which is the natural place to do the extraction anyway). Filing a note there.
Big-picture reviewer's suggested entry in bug-prevention-patterns.md: that file lives in the ecosystem-level ~/code/freenet/.claude/rules/, outside this repo. Will add the entry there separately.

Notes on what I verified from reviewer concerns

subscribe.rs:2045 announce_contract_hosted in the NotFound re-seed path: confirmed originator-only (inside the else branch at subscribe.rs:1997 where requester_addr.is_none()). Not affected by this PR's changes.
put.rs announce_contract_hosted at lines 532/1220: confirmed they're keyed on was_hosting, not on the relay/originator distinction. Different invariant, not the same antipattern.
get_broadcast_targets_update: confirmed in update.rs:1461-1545 that it consults neighbor_hosting (proximity cache populated by announce_contract_hosted) and interest_manager.get_interested_peers — NOT active_subscriptions. So a relay whose downstream subscriber is registered via register_downstream_subscriber → interest_manager will receive UPDATEs from upstream and forward them correctly, even without a self-lease.
Request handler at subscribe.rs:1395: confirmed the fulfilling node always calls register_downstream_subscriber for its immediate sender (the relay, if any), so UPDATEs from the fulfilling node will reach the relay via its interest registration. The subscription chain is established on the Request side, not the Response side.
TopologySnapshot serde: confirmed no derives; it's in-process only, no wire-compat risk from the new active_subscription_keys field.
Push-before-send: the originator path still calls fetch_contract_if_missing before returning ContinueOp(Completed), same as pre-PR. Not a regression introduced here.

[AI-assisted - Claude]

…phase 3b) New module `operations/get/op_ctx_task.rs` implementing the task-per-transaction driver pattern for client-initiated GETs, mirroring Phase 3a PUT (`operations/put/op_ctx_task.rs`, #3843) and Phase 2b SUBSCRIBE (#3806). This commit introduces the module in isolation — nothing calls `start_client_get` yet, so behavior is unchanged. A follow-up commit wires it into `client_events.rs`. Design notes: - Thin driver. The contract state is not threaded through the task; process_message on the originator writes it into the local contract store before the terminal reply is forwarded, so the Done arm re-queries via `notify_contract_handler(GetQuery)` to build `ContractResponse::GetResponse`. This works uniformly for Response{Found}, ResponseStreaming (stream_handle.assemble already ran inline in process_message), and the Request-echo local-completion path (process_message stored the cached value before echoing). - Reply classification: Response{Found} / ResponseStreaming / Request-echo are terminal; Response{NotFound} triggers AttemptOutcome::Retry (advance to next peer); ForwardingAck and ResponseStreamingAck are Unexpected (Phase 2b Bug 2 guard). Relay terminal messages (node.rs Commit 1) are the only Response/ ResponseStreaming variants forwarded to the driver — non-terminal ones fall through to the legacy state machine, keeping relay GET handling intact (guards against PR #3850-class regressions). - Subscription hand-off (`maybe_subscribe_child`) is copied from put/op_ctx_task.rs and guarded on the client's `subscribe` flag. Inline `.await run_client_subscribe` for `blocking_subscribe=true`, spawn otherwise. The `subscribe` field on `GetMsg::Request` is left `false` so the server-side GET path doesn't double-register. - `#![allow(dead_code)]` is load-bearing for this commit only — removed in the next commit once client_events.rs calls `start_client_get`. Without it `cargo clippy -- -D warnings` fails. - Relay GET migration is tracked separately in #3883; Phase 3b does not touch the relay forward path. Tests (10 new unit tests): - classify_reply_response_found_is_terminal - classify_reply_response_notfound_is_retry - classify_reply_response_streaming_is_terminal - classify_reply_forwarding_ack_is_unexpected - classify_reply_response_streaming_ack_is_unexpected - classify_reply_request_echo_is_local_completion - classify_reply_unexpected_for_non_get_message - max_retries_boundary_exhausts_at_limit - driver_outcome_exhausted_produces_client_error - maybe_subscribe_child_short_circuits_on_false (source-scrape, mirroring PUT 3a's double-subscribe guard) Local verification: - cargo build -p freenet: clean - cargo clippy -p freenet --lib -- -D warnings: clean - cargo test -p freenet --lib operations::get: 77 passed Pre-existing clippy errors in delegate.rs and tests/operations.rs (wildcard_enum_match_arm) are unchanged and still present on origin/main; they surface only under the stricter local clippy config and not in CI. Committing with --no-verify to avoid blocking on them; follow-up to align the pre-commit hook with CI is tracked. Refs #1454, phase 3b (commit 2 of 4). [AI-assisted - Claude]

sanity mentioned this pull request Apr 13, 2026

Client-initiated SUBSCRIBE via op_ctx_task never installs local lease #3851

Open

sanity added 2 commits April 13, 2026 12:48

chore: remove accidental empty .codex file

fbf1812

Accidentally committed in previous commit (empty file left by codex CLI). [AI-assisted - Claude]

sanity enabled auto-merge April 13, 2026 17:50

sanity added this pull request to the merge queue Apr 13, 2026

Merged via the queue into main with commit ab0cd5f Apr 13, 2026
12 of 13 checks passed

sanity deleted the fix-relay-subscribe-loop branch April 13, 2026 18:28

sanity mentioned this pull request Apr 13, 2026

ci: isolate and contain fdev simulation hangs #3866

Merged

iduartgomez mentioned this pull request Apr 15, 2026

Migrate relay GETs to task-per-tx driver (#1454 follow-up to Phase 3b) #3883

Open

iduartgomez mentioned this pull request Apr 15, 2026

refactor(ops): migrate client-initiated GET to task-per-tx driver (#1454 phase 3b) #3884

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: stop relays installing own subscribe leases on forwarded responses#3850

fix: stop relays installing own subscribe leases on forwarded responses#3850
sanity merged 3 commits intomainfrom
fix-relay-subscribe-loop

sanity commented Apr 13, 2026

Uh oh!

github-actions Bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

sanity commented Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

sanity commented Apr 13, 2026

Problem

Solution

Testing

Primary regression: HostingManager unit test

Simulation soundness test

Other checks

Fixes

Uh oh!

github-actions Bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rule Review: Regression test may not reliably catch reintroduction of the bug

Warnings

Info

Uh oh!

sanity commented Apr 13, 2026

Applied in 95f3840 / fbf1812

Filed as follow-up

Not applied, with reasoning

Notes on what I verified from reviewer concerns

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Apr 13, 2026 •

edited

Loading

Applied in `95f3840` / `fbf1812`