Refactor core interfaces #1454

@iduartgomez

Description

Tracking issue for some internal refactoring we need to do. The goal is to:

  • improve debugging capabilities over the kernel
  • simplify and clean up some of the core code, cleaner abstractions
  • improve certain error handling, ws api client/server comms, etc.
  • add more testing around the boundaries of some components (like we did with the handshake handler, transport, etc.)

Changes:

  • move towards transactions executed end-to-end asynchronously, replacing the current execution model
    • await sub-transaction results, notify ws client subscribers as network messages arrive, and simplify the overall execution flow and reasoning (we essentially no longer need to keep op state internally in the OpManager)

Design: end-to-end async transactions

This section replaces the previous [TODO: add more detail] placeholder. Line numbers are verified against main at the time of writing. Follow-up PRs implementing each phase should re-verify before making changes.

Revision history:

  • Rev 1 (2026-04-08): Initial draft. Over-promised the reach of existing scaffolding and mis-scoped Phase 1. See Errata at the end of this document.
  • Rev 2 (2026-04-08): Corrected Phase 1 after reading request_subscribe and the pure-network message dispatch path. Rev 1's Phase 1 has been split into a real prerequisite (Phase 1) and deferred work (Phase 2.5). Line numbers corrected.
  • Rev 3 (2026-04-09): After Phase 1 shipped (PR refactor(ops): extend pending_op_result callback to all op variants (#1454 phase 1) #3802), investigated SUBSCRIBE's full surface area before starting Phase 2 implementation. Found that Rev 2's Phase 2 bundles too much work in one slice. Split Phase 2 into 2a (introduce OpCtx as dormant scaffolding — no op migrated), 2b (migrate SUBSCRIBE client-initiated path only), and 2c (migrate CONNECT — deferred, schedule TBD). Phases 2.5 and 3–6 are unchanged. See the Errata (Rev 2 → Rev 3) section at the end for the specific findings that motivated the split.
  • Rev 4 (2026-04-11): After Phase 2b shipped (PR refactor(ops): migrate client-initiated SUBSCRIBE to OpCtx (#1454 phase 2b) #3806), drafted the Phase 2.5 implementation and verified it against HEAD. Found that the intended structural win is narrow (only the track_parent=true branch of start_subscription_request_internal moves; SubOperationTracker, root_ops_awaiting_sub_ops, and the async non-blocking subscribe path all stay on the legacy re-entry loop) and that the scaffolding it would introduce (a DriverRole::SubOp variant, a register_local_subscription helper to bypass NodeEvent::LocalSubscribeComplete) becomes dead weight the moment Phase 3's task-per-tx PUT/GET parent can await the child future inline. Phase 2.5 is folded into Phase 3. The blocking-subscription sub-op child migrates together with PUT/GET. The async non-blocking subscribe path (track_parent=false) also stays on the legacy loop until Phase 3. See the Errata (Rev 3 → Rev 4) section at the end.
  • Rev 5 (2026-04-11): Pre-implementation investigation of GET, PUT, and the subscribe sub-op hand-off (posted as issue comment) confirmed that PUT and GET are each large enough to merit their own slice, and that SubOperationTracker can only be deleted after BOTH callers (operations.rs:755 blocking path, invoked from PUT's start_subscription_after_put AND GET's auto_subscribe_on_get_response) are gone. Phase 3 is split into 3a / 3b / 3c: 3a migrates PUT + PUT's blocking-subscribe child (tracker stays alive for GET); 3b migrates GET + GET's blocking-subscribe child + streaming assembly inside send_and_await; 3c is a mechanical deletion PR that retires SubOperationTracker, root_ops_awaiting_sub_ops, parent_of, failed_parents, expected_sub_operations, and their 10 unit tests once both callers are gone. PUT-first is chosen because PUT's state machine is simpler (no streaming on the originator completion path), and its duplicate-send collapse (put.rs:575–594 + :841–917) is the single largest structural change in Phase 3 — validating it in isolation de-risks GET. The attempt_tx terminology referenced in earlier Rev 3/4 notes does not exist on HEAD and has been dropped. See the Errata (Rev 4 → Rev 5) section at the end.

1. Overview

Problem

Today the core runs each transaction through a centralized re-entry loop:

  1. A client request calls op_manager.push(tx, OpEnum) and returns.
  2. A NetMessage for tx arrives at p2p_protoc.
  3. handle_op_result pops the OpEnum out of an OpManager DashMap, calls Operation::process_message, and gets back an OperationResult.
  4. Depending on the variant (Completed / ContinueOp / SendAndContinue / SendAndComplete), it either marks the op completed, re-inserts it into the same DashMap (push-before-send), or forwards a message and re-inserts.
  5. Sub-transactions are tracked side-band in a separate SubOperationTracker with its own DashMaps (sub_operations, root_ops_awaiting_sub_ops, parent_of, expected_sub_operations, failed_parents). A parent op only proceeds when all_sub_operations_completed returns true.
  6. Client notifications are fan-out post-hoc via a ResultRouterSessionActor.
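Condensed, steps 3–4 above behave like the following toy model. This is an illustrative sketch only (simplified types, no real messages); the actual loop lives in handle_op_result in operations.rs.

```rust
use std::collections::HashMap;

// Toy stand-ins for the real types; names and fields are illustrative only.
#[derive(Debug, PartialEq)]
enum OperationResult {
    Completed(u64),
    ContinueOp(OpEnum),
    SendAndContinue(OpEnum, &'static str), // re-insert state, then forward a message
}

#[derive(Debug, PartialEq)]
struct OpEnum {
    hops_left: u32,
}

impl OpEnum {
    // Each inbound network message advances the state machine by one step.
    fn process_message(mut self, _msg: &str) -> OperationResult {
        if self.hops_left == 0 {
            OperationResult::Completed(42)
        } else {
            self.hops_left -= 1;
            OperationResult::SendAndContinue(self, "forward")
        }
    }
}

// The centralized re-entry loop: pop the op, process one message, re-insert.
fn handle_op_result(ops: &mut HashMap<u64, OpEnum>, tx: u64, msg: &str) -> Option<u64> {
    let op = ops.remove(&tx)?;
    match op.process_message(msg) {
        OperationResult::Completed(value) => Some(value),
        OperationResult::ContinueOp(next) | OperationResult::SendAndContinue(next, _) => {
            // Push-before-send: state must be back in the map before any send.
            ops.insert(tx, next);
            None
        }
    }
}

fn drive_to_completion(ops: &mut HashMap<u64, OpEnum>, tx: u64) -> u64 {
    loop {
        if let Some(v) = handle_op_result(ops, tx, "inbound") {
            return v;
        }
    }
}
```

Note that the transaction's progress is only visible as successive map writes, which is exactly the debuggability problem described below.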

Pain points:

  • Transaction state lives in a global OpManager rather than in the task that logically owns it. Stack traces are useless for reasoning about "what is this transaction doing right now?"
  • Sub-operation orchestration is a state machine driven by DashMap writes and GC, not Rust's async runtime. There is no await between "parent wants a value from a child" and "child produces it."
  • Bugs in this area tend to be of the form "the parent op's DashMap entry was cleaned up before the child's callback fired", or "the parent's expected_sub_operations count disagreed with reality." These are not expressible as compile errors.
  • The client-notification path (ResultRouterSessionActor) is decoupled from the operation that produced the result, so streaming updates (e.g. incremental GET responses) are harder to reason about than they should be.

Goals (from the tracking issue)

  • Improve debugging of the kernel / transaction flow
  • Cleaner operation abstractions (operations own their own state)
  • Better error handling and WS client/server communication
  • Expand tests at component boundaries
  • End-to-end async transactions: each transaction runs as a task that owns its state in locals, awaits sub-transaction futures, and streams updates to WS subscribers directly as network messages arrive

Non-goals

  • No wire-format changes. NetMessage variants stay the same.
  • No contract/WASM API changes.
  • No changes to the transport layer or handshake handler.
  • No changes to the SessionActor semantics for 1→N client fanout; only the producer side changes.

2. Current architecture (baseline)

All line numbers verified against main at the time of writing.

| Concern | File | Lines |
| --- | --- | --- |
| OpManager.ops DashMaps for connect/put/get/subscribe/update, plus completed and under_progress DashSets | crates/core/src/node/op_state_manager.rs | 204–213 |
| OpManager struct | crates/core/src/node/op_state_manager.rs | 217–281 |
| SubOperationTracker | crates/core/src/node/op_state_manager.rs | 73–202 |
| all_sub_operations_completed | crates/core/src/node/op_state_manager.rs | 161–176 |
| Operation trait | crates/core/src/operations.rs | 32–56 |
| OperationResult enum (Completed, ContinueOp, SendAndContinue, SendAndComplete) | crates/core/src/operations.rs | 69–99 |
| handle_op_result re-entry loop (push-before-send at line 238) | crates/core/src/operations.rs | 131–287 |
| PrioritySelectStream with op_execution receiver | crates/core/src/node/network_bridge/priority_select.rs | 27, 55–93 |
| p2p_protoc::handle_op_execution inserts into pending_op_results | crates/core/src/node/network_bridge/p2p_protoc.rs | 3861–3879 |
| handle_pure_network_message_v1 entry | crates/core/src/node.rs | 775 |
| handle_pure_network_message_v1 PUT branch (callback send at 871) | crates/core/src/node.rs | 849–906 |
| handle_pure_network_message_v1 GET branch (callback send at 920) | crates/core/src/node.rs | 907–990 |
| handle_pure_network_message_v1 SUBSCRIBE path (no pending_op_result callback) | crates/core/src/node.rs | 992–1027 |
| start_subscription_request_internal (sub-op launcher) | crates/core/src/operations.rs | 745–811 |
| request_subscribe (routing + op construction + push) | crates/core/src/operations/subscribe.rs | 303–487 |
| ResultRouter (network → SessionActor routing) | crates/core/src/client_events/result_router.rs | |
| SessionActor (1→N client fanout, TTL pending cache) | crates/core/src/client_events/session_actor.rs | |

GET request flow (current)

sequenceDiagram
    participant C as WS Client
    participant SA as SessionActor
    participant OM as OpManager<br/>(DashMaps)
    participant EL as Event loop /<br/>priority_select
    participant PP as p2p_protoc
    participant N as Network
    participant RR as ResultRouter

    C->>SA: GET(tx)
    SA->>OM: push(tx, GetOp::init)
    OM-->>SA: ok
    SA-->>C: accepted
    Note over OM: GetOp lives in<br/>DashMap<Tx, GetOp>

    N-->>PP: NetMessage(tx)
    PP->>EL: dispatch
    EL->>OM: pop(tx) -> GetOp
    OM-->>EL: GetOp
    EL->>EL: process_message -><br/>OperationResult
    alt SendAndContinue
        EL->>OM: push(tx, GetOp') (push-before-send)
        EL->>N: NetMessage
    else Completed
        EL->>OM: completed(tx)
        EL->>RR: HostResult(tx, value)
        RR->>SA: DeliverHostResponse
        SA->>C: GET result
    end

3. Target architecture

Key idea

Each transaction runs as a dedicated async task whose locals hold the op state. The task drives itself by awaiting on a small per-tx inbox and on child-transaction futures. The OpManager shrinks to a bookkeeping table mapping Transaction → JoinHandle (plus the existing completed / under_progress de-dup sets). The op-specific DashMaps go away.

A new primitive — OpCtx::send_and_await(msg) — lets an operation fire a NetMessage and await the reply that arrives keyed by the same Transaction. Sub-transactions become child futures: a parent op calls ctx.spawn_sub(child_op).await and gets back the child's result directly, with no side-band tracker.

Phase 2a minimum surface (what ships first)

Phase 2a introduces OpCtx as scaffolding with exactly one method — send_and_await — built directly on top of Phase 1's forward_pending_op_result_if_completed round-trip. No ops are migrated in 2a; its only caller is a unit test shim. The wider API (sub-op spawning, WS fanout, per-tx inbox, a new Operation trait) is deferred until a concrete consumer needs it.

/// Per-transaction execution context for the task-per-tx model.
/// Phase 2a ships only `send_and_await`; the rest of the target API
/// below is deferred.
pub struct OpCtx {
    tx: Transaction,
    op_execution_sender: mpsc::Sender<(mpsc::Sender<NetMessage>, NetMessage)>,
}

impl OpCtx {
    /// Fire `msg` and await the terminal reply keyed by this ctx's tx.
    ///
    /// Single-use per `Transaction`: `forward_pending_op_result_if_completed`
    /// fires exactly once because the `completed` / `under_progress` dedup
    /// sets suppress subsequent dispatches. Callers that need to retry a
    /// multi-attempt protocol must use a fresh `Transaction` per attempt.
    ///
    /// Must be called from an op task, not from the event loop or a
    /// `priority_select` arm — the inner `.send().await` on
    /// `op_execution_sender` is allowed only because the caller is a
    /// short-lived spawned task (see `.claude/rules/channel-safety.md`).
    pub async fn send_and_await(&mut self, msg: NetMessage) -> Result<NetMessage, OpError>;

    pub fn tx(&self) -> Transaction;
}

OpManager grows a thin op_ctx(&self, tx: Transaction) -> OpCtx factory that clones the op_execution_sender out of its inner EventLoopNotificationsSender. The old OpManager::notify_op_execution helper is deleted — its body becomes OpCtx::send_and_await. This keeps the primitive in exactly one place.
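As a rough analogue of the register-then-await shape of send_and_await, here is a std-only synchronous sketch. The real primitive is async over the bounded op_execution channel and pending_op_results lives inside p2p_protoc; these names and types are stand-ins, not the actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Tx = u64;
type NetMessage = String;

// Stand-in for p2p_protoc's pending_op_results: reply senders keyed by tx.
type PendingOpResults = Arc<Mutex<HashMap<Tx, mpsc::Sender<NetMessage>>>>;

/// Register a reply slot for `tx`, "send" the message, then block on the reply.
/// Mirrors the notify_op_execution round-trip in spirit only.
fn send_and_await(
    pending: &PendingOpResults,
    network: &mpsc::Sender<(Tx, NetMessage)>,
    tx: Tx,
    msg: NetMessage,
) -> NetMessage {
    let (reply_tx, reply_rx) = mpsc::channel();
    pending.lock().unwrap().insert(tx, reply_tx); // register BEFORE sending
    network.send((tx, msg)).unwrap();
    reply_rx.recv().unwrap() // hangs forever if no branch ever fires the callback
}

/// Stand-in for the op pipeline: produce a terminal reply keyed by the same tx.
fn spawn_echo_network(pending: PendingOpResults) -> mpsc::Sender<(Tx, NetMessage)> {
    let (net_tx, net_rx) = mpsc::channel::<(Tx, NetMessage)>();
    thread::spawn(move || {
        for (tx, msg) in net_rx {
            if let Some(cb) = pending.lock().unwrap().remove(&tx) {
                let _ = cb.send(format!("reply:{msg}"));
            }
        }
    });
    net_tx
}
```

If the "pipeline" side never removes and fires the callback for a given op kind, recv() blocks indefinitely, which is the SUBSCRIBE/CONNECT/UPDATE hang described in section 4.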

Target architecture (eventual vision, for reference)

Phases 2b, 2.5, and 3–6 will grow OpCtx toward the full shape below. These fields and methods are not in Phase 2a; they are added as each phase's concrete consumer arrives, so the scaffolding stays minimal until it has a real caller.

// Eventual shape — NOT Phase 2a. Deferred until later phases need them.
#[async_trait]
pub trait Operation: Sized + Send + 'static {
    type Output: Send;
    async fn run(self, ctx: OpCtx) -> Result<Self::Output, OpError>;
}

pub struct OpCtx {
    tx: Transaction,
    op_execution_sender: /* … */,                  // Phase 2a
    inbox: mpsc::Receiver<NetMessage>,              // deferred: non-terminal msgs
    outbound: mpsc::Sender<NetMessage>,             // deferred: direct bridge send
    updates: broadcast::Sender<HostResponse>,       // deferred: WS fanout
    op_registry: Arc<OpRegistry>,                   // Phase 5
}

impl OpCtx {
    pub async fn send_and_await(&mut self, msg: NetMessage) -> Result<NetMessage, OpError>; // Phase 2a
    pub async fn spawn_sub<O: Operation>(&self, child: O) -> Result<O::Output, OpError>;     // Phase 2.5
    pub fn notify(&self, update: HostResponse);                                               // Phase 3
}

Contrast diagram

flowchart LR
    subgraph Current
        A1[op_manager.push] --> B1[DashMap&lt;Tx,OpEnum&gt;]
        B1 --> C1[handle_op_result<br/>re-entry loop]
        C1 -->|Completed| D1[ResultRouter]
        C1 -->|ContinueOp| B1
        C1 -->|SendAndContinue| B1
        E1[SubOperationTracker<br/>DashMaps] -.polls.-> C1
    end

    subgraph Target
        A2[spawn tx task] --> B2[task locals<br/>own OpState]
        B2 --> C2[loop:<br/>ctx.send_and_await<br/>ctx.spawn_sub]
        C2 -->|return| D2[Ok/Err -> notify]
        C2 -->|child future| C2
    end

What goes away

  • OpManager.ops.{connect,put,get,subscribe,update} — the five DashMap<Transaction, *Op> fields. Replaced by a single DashMap<Transaction, JoinHandle<Result<(), OpError>>> (or equivalent) used only for cancellation and cleanup.
  • SubOperationTracker entirely. Parent/child relationships become parent-awaits-child futures.
  • handle_op_result and OperationResult. The re-entry loop's state transitions become ordinary Rust control flow inside each op's run.
  • The Operation::process_message seam. Ops express themselves as async fn run.

What stays

  • Transaction IDs and the wire format.
  • completed / under_progress de-dup sets.
  • pending_op_results: HashMap<Transaction, Sender<NetMessage>> inside p2p_protoc — this is already the "send message, await reply keyed by tx" primitive we need. The tx task registers its inbox sender here via notify_op_execution, then awaits.
  • SessionActor and its TTL pending cache. The producer side changes (tx task pushes to broadcast::Sender), but the 1→N fanout and the session lifecycle stay.
  • ResultRouter in a reduced role: it still exists as the bridge between "network layer produced a final result" and "SessionActor delivers it", but in the target design the tx task itself publishes via OpCtx::notify and ResultRouter handles only legacy / non-per-tx paths during the transition.

4. Existing scaffolding to reuse

Some of the infrastructure already exists in the tree but is currently gated behind #[allow(dead_code)] and marked FIXME: enable async sub-transactions. Important: the reach of this scaffolding is narrower than the original draft of this design suggested. See Scaffolding reach below.

| Primitive | File | Lines | Status |
| --- | --- | --- | --- |
| OpManager::notify_op_execution — creates (Sender<NetMessage>, Receiver<NetMessage>), sends the pair to op_execution_sender, awaits the callback reply | crates/core/src/node/op_state_manager.rs | 701–717 | dead, FIXME: enable async sub-transactions |
| EventLoopNotificationsSender::op_execution_sender: Sender<(Sender<NetMessage>, NetMessage)> + getter | crates/core/src/node/network_bridge.rs | 203, 218–221 | dead, FIXME: enable async sub-transactions |
| PrioritySelectStream.op_execution receiver wiring + SelectResult::OpExecution variant | crates/core/src/node/network_bridge/priority_select.rs | 27, 63–64 | live (drains into handle_op_execution) |
| p2p_protoc::handle_op_execution — inserts callback into state.pending_op_results and forwards the message into the op pipeline as ConnEvent::InboundMessage | crates/core/src/node/network_bridge/p2p_protoc.rs | 3861–3879 | live |
| PUT reply forwarding via pending_op_result callback on completion | crates/core/src/node.rs | 868–878 (send at 871) | live (PUT only) |
| GET reply forwarding via pending_op_result callback on completion | crates/core/src/node.rs | 917–927 (send at 920) | live (GET only) |
| SUBSCRIBE, CONNECT, UPDATE callback forwarding on completion | crates/core/src/node.rs | | missing |

Scaffolding reach

The end-to-end plumbing for "send a message tagged with a Transaction and await a single reply keyed by the same Transaction" is currently live only for PUT and GET. Specifically:

  • handle_op_execution (p2p_protoc.rs:3861-3879) inserts a callback sender into state.pending_op_results keyed by tx, then forwards the wrapped message into the main dispatch path as a ConnEvent::InboundMessage. This is wired end to end.
  • Inside the op pipeline, handle_pure_network_message_v1 (node.rs:775 onwards) calls handle_op_request::<OpT, _>(...) and then — only for PUT and GET — pulls the callback sender out of pending_op_result and calls op_execution_callback.send(completed_msg).await (node.rs:871, 920).
  • The SUBSCRIBE, CONNECT, and UPDATE branches of handle_pure_network_message_v1 do not have this callback block. If a caller today were to invoke notify_op_execution(subscribe_msg).await, the sender would be inserted into pending_op_results and the forwarded message would drive the SUBSCRIBE op, but notify_op_execution would then hang forever on response_receiver.recv() because no code path fires the reply.

This means the scaffolding is not "almost all wired up" for all ops. It is wired for PUT and GET because those ops have a wire-level terminal message that maps cleanly onto a single-reply model. For the other ops, the callback hook is missing and must be added before any of them can participate in the notify_op_execution round-trip.

The SUBSCRIBE complication

Even once the SUBSCRIBE callback hook exists, SUBSCRIBE initiation is not "construct a NetMessage and send it". subscribe::request_subscribe (operations/subscribe.rs:303-487) does all of the following before any wire message exists:

  1. Detects "standalone node with contract available locally" and routes to complete_local_subscription without any network round-trip.
  2. Runs k_closest_potentially_hosting against the ring, then falls back to any connected peer if that returns empty.
  3. Builds the AwaitingResponse routing state (next_hop, tried_peers, visited, alternatives, speculative_paths) that future retries consult.
  4. Constructs the SubscribeMsg::Request { id, instance_id, htl, visited, is_renewal } wire message only after the target peer is chosen.
  5. Calls op_manager.notify_op_change(msg, OpEnum::Subscribe(op)) — a push-before-send primitive distinct from notify_op_execution.

None of this fits the notify_op_execution(msg).await shape, because:

  • The local-completion branch has no network round-trip at all → nothing to reply with.
  • The routing state needs to exist in the op's DashMap<Tx, SubscribeOp> entry before the wire message is sent, because retry/alt-peer logic reads it. notify_op_execution's current contract is "send a NetMessage, get a reply"; it has no op-construction hook.
  • Retries and alternative-peer selection live inside the re-entry loop, not inside the caller of notify_op_execution.

So a "first slice" that ports the client-initiated SUBSCRIBE path onto notify_op_execution is genuinely Phase-2-shaped work: it drags in task-per-tx ownership of the routing state, a prepare_initial_request helper that knows the local-completion fallback, and either a synthetic reply for local completion or a change to the notify_op_execution contract to allow "no network round-trip, completed immediately".
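The local-completion-vs-network split could be made explicit with a small sum type. The sketch below is illustrative only: prepare_initial_request, InitialRequest, and its variants are hypothetical names proposed by this design, not existing code, and real routing involves the ring rather than a peer slice.

```rust
type NetMessage = String;
type HostResult = String;

/// What SUBSCRIBE initiation can decide before any wire message exists.
#[derive(Debug)]
enum InitialRequest {
    /// Contract available locally: completed with no network round-trip.
    LocalCompletion(HostResult),
    /// A target peer was chosen; this message would go through send_and_await.
    Network(NetMessage),
}

/// Hypothetical split of request_subscribe's pre-wire logic: decide the
/// local-completion fallback and routing target up front, so the caller
/// knows whether there will be a network reply to await at all.
fn prepare_initial_request(
    have_locally: bool,
    candidate_peers: &[&str],
) -> Result<InitialRequest, String> {
    if have_locally {
        // complete_local_subscription branch: nothing to await on the network.
        return Ok(InitialRequest::LocalCompletion("subscribed:local".into()));
    }
    match candidate_peers.first() {
        Some(peer) => Ok(InitialRequest::Network(format!("SubscribeMsg::Request -> {peer}"))),
        None => Err("no peers to route subscribe request".into()),
    }
}
```

The point of the shape is that "nothing to reply with" becomes a value the caller handles explicitly, instead of a code path that silently bypasses the awaiting channel.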

5. Phased rollout (revised)

Each phase should be a separate sub-issue / PR. Phase boundaries are chosen so that after each phase the tree compiles, CI passes, and the existing integration tests stay green.

| Phase | Scope | Entry tests | Exit criteria |
| --- | --- | --- | --- |
| 0 | This design (the issue body). Also: remove the S-needs-design label from #1454 once Rev 2 is published. | n/a | Design published here; #1454 unblocked. |
| 1 | Extend the pending_op_result callback hook in handle_pure_network_message_v1 to fire for SUBSCRIBE, CONNECT, and UPDATE completions. This matches the existing PUT/GET pattern at node.rs:869-877 / 918-926. The #[allow(dead_code)] attributes on notify_op_execution and op_execution_sender stay for now — this phase does not add a caller. Adds a test that fires a round-trip through the scaffolding for each op (via a test-only caller or #[cfg(test)] shim). | existing integration suite | All five op kinds (CONNECT/PUT/GET/SUBSCRIBE/UPDATE) can terminate a notify_op_execution round-trip without hanging. No behavioral change to any op. |
| 2a | Introduce OpCtx as a new module with no production callers. Define the OpCtx struct, the send_and_await primitive (built on the Phase 1 notify_op_execution round-trip), and the tx → JoinHandle bookkeeping that Phase 5 will eventually replace the op-specific DashMaps with. No ops migrated. The #[allow(dead_code)] attributes on notify_op_execution and op_execution_sender from Phase 1 are lifted here — Phase 2a becomes the first production caller, but only from a dormant test shim, not from any real op path. Pattern mirrors Phase 1's "widen the primitive without adding a live consumer". | new unit tests on OpCtx::send_and_await round-trip via each of the five op kinds | OpCtx exists, tests exercise the round-trip end-to-end, nothing in operations/ changes. |
| 2b | Migrate SUBSCRIBE's client-initiated path only to OpCtx. Scope excludes renewals (live on legacy path because their jitter/spam-prevention in ring.rs:944-1051 is load-bearing) and PUT sub-ops (live on legacy path until Phase 2.5). Factor prepare_initial_request out of request_subscribe to handle the local-completion branch explicitly. SubscribeOp DashMap entry disappears for client-initiated subscribes; renewal and sub-op paths still populate it. Route via a per-tx switch (not a global feature flag). | existing subscribe unit + integration tests, new task-per-tx tests for the client path | Client-initiated SUBSCRIBE runs via OpCtx; renewal and sub-op SUBSCRIBE unchanged. No regressions in either path. |
| 2c | Migrate CONNECT to OpCtx. Deferred from Rev 2's Phase 2 because CONNECT is 5,025 lines (2.3× SUBSCRIBE) with a genuinely different shape: no client-facing origin (the joiner is the local node's connection_maintenance loop, not a WS client), no sub-op semantics, hole-punch decoupled from op lifetime (op Completed ≠ connection established), and most of the file's bulk is in RelayContext / ConnectForwardEstimator / visited-bloom-filter machinery rather than the state machine itself. Schedule TBD — not a strict blocker for Phases 3–6, which can proceed on the op kinds they target. | CONNECT unit + integration tests | Internal CONNECT transactions run via OpCtx. |
| 2.5 | Folded into Phase 3 as of Rev 4. Original scope: port the PUT-completion → SUBSCRIBE sub-op hand-off onto task-per-tx. Reason for fold: the scaffolding (DriverRole::SubOp, a registration helper bypassing NodeEvent::LocalSubscribeComplete, extra delivery plumbing through op_manager.completed(child_tx)) becomes redundant as soon as Phase 3's task-per-tx PUT/GET parent can await the child future directly. See Errata (Rev 3 → Rev 4). | | |
| 3a | Migrate PUT (client-initiated originator path) + PUT's blocking-subscribe child to task-per-tx. Collapses the duplicate-send pattern in put.rs:575–594 (early result_router_tx.try_send) + :841–917 (response-leg second send) into a single client-result delivery from the driver task. The blocking-subscribe child (when blocking_subscribe=true) is awaited inline by the PUT task — PUT stops calling expect_and_register_sub_operation. The async non-blocking child path migrates with it. SubOperationTracker stays alive (GET still calls it in 3b). PUT relays stay on the legacy re-entry loop. Speculative paths stay GC-controlled on the DashMap. Install try_forward_task_per_tx_reply for PUT Response / ResponseStreaming variants at the TOP of the PUT branch in handle_pure_network_message_v1; filter ForwardingAck. Synthesize a PutMsg::Response where PUT currently completes locally without a network reply. | test_put_with_blocking_subscribe (tests/operations.rs:1971), test_put_contract_three_hop_returns_response (:2551), test_put_then_immediate_subscribe_succeeds_locally_regression_2326 (:3154), test_put_triggers_update_for_subscribers (:3407), test_put_merge_persists_state (:468), test_put_with_subscribe_flag (:1504), all test_put_retry_* unit tests (put.rs:2672+), existing PUT unit tests (put.rs:2102–2920) | Client-initiated PUT runs via OpCtx; PUT originator stops touching OpManager.ops.put; blocking PUT parent awaits child subscribe future directly (no SubOperationTracker call from PUT path); client result delivered exactly once. Relay, speculative, and GET paths unchanged. |
| 3b | Migrate GET (client-initiated originator path) + GET's blocking-subscribe child to task-per-tx. Reuses the patterns validated in 3a. Adds streaming assembly handling: send_and_await internally blocks on stream_handle.assemble().await for the ResponseStreaming reply variant, preserving the single-reply shape of OpCtx. Four local-completion paths to handle (originator has contract with no peers, relay has contract, originator receives network Response, originator auto-subscribe) — only the first needs synthesis; the network Response path reuses the arriving message directly. Same bypass + filter install as 3a for GET. GET relays stay on the legacy loop. | test_get_with_blocking_subscribe (tests/operations.rs:2071), test_get_with_subscribe_flag (:1244), test_get_notfound_no_forwarding_targets (:3285), test_delegate_contract_get (:4009), test_repeated_get_with_unchanged_state_succeeds (:4380), streaming integration tests, GET unit tests (get.rs:3824–4980) including test_blocking_subscribe_propagation and test_retry_and_fallback | Client-initiated GET runs via OpCtx; GET originator stops touching OpManager.ops.get; blocking GET parent awaits child subscribe future directly; streaming assembly flows through send_and_await. |
| 3c | Delete SubOperationTracker and its machinery. Mechanical deletion PR once 3a and 3b have removed both callers. Delete: the struct (op_state_manager.rs:74–86), sub_operations / root_ops_awaiting_sub_ops / parent_of / expected_sub_operations / failed_parents fields, expect_and_register_sub_operation, mark_sub_op_completed, remove_child_link, cleanup_parent_tracking, get_parent, is_sub_operation, all_sub_operations_completed, count_pending_sub_operations, sub_operation_failed, and the finalized-parent-parking branch in operations.rs:177–208. Delete the 10 sub_operation_tracker_* unit tests (op_state_manager.rs:2462–3090). No Phase 5 dependency — this work slots in immediately after 3b. | existing PUT/GET regression tests (no new tests — deletion is mechanical) | SubOperationTracker and root_ops_awaiting_sub_ops gone; OpManager.ops.put and .get DashMaps remain but are no longer touched by originator paths (their deletion is still Phase 5). |
| 4 | Migrate UPDATE broadcast — the trickiest variant because it fans out to many peers. Verify broadcast semantics match the pre-refactor behaviour (see existing regression test at f28e8e6). | UPDATE integration + broadcast regression tests | UPDATE live phase stops touching OpManager.ops. |
| 5 | Delete OpManager.ops.{connect,put,get,subscribe,update} DashMaps. (SubOperationTracker was already deleted in Phase 3c.) OpManager keeps completed + under_progress + a new tx → JoinHandle table for cancellation. Delete Operation::process_message / OperationResult / handle_op_result. | full suite incl. DST replay | Ops struct reduced to the bookkeeping fields. |
| 6 | Boundary tests: component-level tests for OpCtx, send_and_await, spawn_sub. Documentation sweep: update .claude/rules/operations.md (the push-before-send invariant is now structurally enforced and the rule wording should reflect that). Update docs/architecture/operations/README.md. | new boundary tests | Rules / docs aligned with new code. |

Phase 1 — concrete first slice (revised)

This is the smallest self-contained PR. Unlike Rev 1's Phase 1, this one does not touch any op's routing logic. It only extends an existing pattern to three more op variants.

  1. In crates/core/src/node.rs, add a pending_op_result callback block after handle_op_request::<subscribe::SubscribeOp, _> (around line 999), mirroring the PUT pattern at node.rs:868-878. The "completed message" is the response SubscribeMsg::ReturnSub (or equivalent terminal variant — verify in subscribe.rs).
  2. Repeat for CONNECT (NetMessageV1::Connect) and UPDATE (NetMessageV1::Update) branches of handle_pure_network_message_v1.
  3. Decide how to handle SUBSCRIBE's complete_local_subscription branch: either (a) fire a synthetic "locally completed" reply on the callback so notify_op_execution returns, or (b) declare that notify_op_execution may also observe completion via the op's push-to-completed-set transition and add a fallback wakeup. Option (a) is simpler; option (b) is more general. Recommend (a) for Phase 1.
  4. Add a test-only caller of notify_op_execution (e.g., a #[cfg(test)] module) that fires a round-trip through each of the five op kinds and asserts the call returns. Keep the #[allow(dead_code)] on the production getters — Phase 2 will activate them for real.
  5. Run:
    cargo fmt && cargo clippy -- -D warnings
    cargo test -p freenet
    cargo test -p freenet --features "simulation_tests,testing" --test simulation_integration
    

No production caller is added; no behavior changes for any existing code path. Phase 1 is purely "widen the callback hook so the round-trip primitive can terminate for all op kinds."
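Option (a) from step 3 can be pictured with a toy callback table. Std-only sketch under loud assumptions: complete_locally is a hypothetical name, and the real callback sender lives in pending_op_results inside p2p_protoc, keyed by Transaction rather than u64.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

type Tx = u64;
type NetMessage = String;

/// Fire a synthetic "locally completed" reply so the awaiting caller's
/// recv() returns instead of hanging forever on a branch that never
/// produced a network reply. Returns false if nobody was awaiting this tx.
fn complete_locally(pending: &mut HashMap<Tx, mpsc::Sender<NetMessage>>, tx: Tx) -> bool {
    match pending.remove(&tx) {
        Some(cb) => cb.send(format!("synthetic-local-completion:{tx}")).is_ok(),
        None => false, // no registered callback; nothing to wake
    }
}
```

The second call for the same tx returning false also illustrates the single-use contract of the round-trip: the callback is consumed when it fires.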

6. Risks and mitigations

  • Push-before-send invariant (.claude/rules/operations.md, currently enforced in operations.rs:238): preserved by definition in the target design — op state never leaves the owning task, so the concept of "forgetting to re-push" goes away. The rule text should be updated in Phase 6 to describe the new invariant (state lives in locals, send_and_await is atomic).

  • Bounded-channel rules (.claude/rules/channel-safety.md): the notify_op_execution / send_and_await primitive already uses a bounded Sender<(Sender<NetMessage>, NetMessage)>. notify_op_execution calls .send().await (op_state_manager.rs:710), which violates the rule "never use .send().await in event loops" — however, the caller of notify_op_execution is not an event loop (it's an op task), so this is currently acceptable. Phase 2 must re-audit once there are production callers. The callback-reply side (node.rs:871, 920) also uses .send().await on the bounded pending_op_result channel (capacity 1), which blocks the pure-network-message handler; this is acceptable because the caller is pending_op_results[tx] consumed by the awaiting op task, but must be re-audited once every op kind uses it.

  • Determinism (DST): task spawning order has to be stable under the deterministic runtime. tokio::spawn is not deterministic by itself; use deterministic_select! and the TimeSource / GlobalRng abstractions from crates/core/src/simulation/. Verify via the existing test_deterministic_replay_* tests at every phase. If task spawn order proves to be a blocker, Phase 2 may need to run ops cooperatively on a single-threaded runtime rather than via tokio::spawn.

  • Backpressure: the per-tx inbox mpsc and the per-tx broadcast for WS updates both need explicit bounds. Drop-oldest vs fail-fast policy must be chosen and documented. For WS updates, dropping is likely acceptable (clients will reconcile); for the inbox, dropping is not (it would wedge the transaction).
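The two policies differ as in this toy sketch over a fixed-capacity VecDeque (the real channels would be bounded tokio mpsc/broadcast; function names are illustrative):

```rust
use std::collections::VecDeque;

/// Fail-fast: refuse new items when full. The right choice for the per-tx
/// inbox, where silently dropping a message would wedge the transaction.
fn push_fail_fast<T>(q: &mut VecDeque<T>, cap: usize, item: T) -> Result<(), T> {
    if q.len() >= cap {
        Err(item) // caller sees the rejection and can surface an error
    } else {
        q.push_back(item);
        Ok(())
    }
}

/// Drop-oldest: evict the oldest item to make room. Tolerable for WS update
/// fanout, where clients reconcile from the latest state anyway.
fn push_drop_oldest<T>(q: &mut VecDeque<T>, cap: usize, item: T) -> Option<T> {
    let evicted = if q.len() >= cap { q.pop_front() } else { None };
    q.push_back(item);
    evicted // Some(old) means data was lost and should be counted/logged
}
```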

  • Cancellation: the tx → JoinHandle table gives us clean abort() on timeout, which today is handled implicitly by under_progress expiry. The timeout enforcement logic needs to move from the GC path to each tx task's own timeout arm.

  • Interaction with GC / cleanup exemptions (see AGENTS.md on time-bounded exemptions): the new tx → JoinHandle table must not create unbounded GC blind spots. Any exemption must have a TTL or an absolute age threshold. Phase 5 should include a test that asserts orphaned tx tasks are torn down within a bounded interval.

  • Rollout safety: phases 2–4 leave the tree in a mixed state (some ops via task-per-tx, others via the old re-entry loop). Both paths must coexist cleanly. The OpRegistry introduced in Phase 2 needs to be additive, not a replacement, until Phase 5.

7. Out of scope / future work

  • WS API surface redesign (separate design issue).
  • Error-handling surface redesign across the WS client/server boundary. This design assumes OpError stays roughly as-is; a follow-up can reshape it once the task-per-tx model is in place.
  • Transport resiliency work that depends on this refactor (mentioned in the prioritizer comment) can begin once Phase 3 is merged.

Errata (Rev 1 → Rev 2)

The initial revision of this design document (published 2026-04-08, earlier today) contained two material errors that were caught during Phase 1 implementation planning and are corrected in Rev 2:

  1. Phase 1 was mis-scoped. Rev 1 claimed Phase 1 could be "activate notify_op_execution and convert start_subscription_request_internal to await notify_op_execution(...)" and described this as a drop-in change. In fact:

    • The pending_op_result callback that makes notify_op_execution return is only wired for PUT and GET in handle_pure_network_message_v1 (node.rs:871, 920). SUBSCRIBE, CONNECT, and UPDATE branches have no such hook. A SUBSCRIBE-targeted notify_op_execution call would hang forever on response_receiver.recv().
    • start_subscription_request_internal does not just send a wire message. It spawns a task that calls subscribe::request_subscribe, which runs ring routing, builds pre-populated AwaitingResponse state, has a local-completion fallback path that never touches the network, and uses notify_op_change (not notify_op_execution). Porting this path onto notify_op_execution requires pulling in task-per-tx ownership of the routing state — i.e., it is Phase-2-shaped work, not Phase-1-shaped work.

    Rev 2 reassigns the original Phase 1 description to a new Phase 2.5 (the PUT-completion → SUBSCRIBE sub-op hand-off, done properly after task-per-tx exists) and redefines Phase 1 as the genuine structural prerequisite: extending the callback hook to cover all five op variants.

  2. start_subscription_request_internal line numbers were wrong. Rev 1 cited operations.rs:699-811. Verified against main, the function is at operations.rs:745-811; 699-742 covers the public wrappers (start_subscription_request_async, start_subscription_request_blocking, start_subscription_request).

Rev 2 also adds a "Scaffolding reach" subsection to Section 4 making the PUT/GET-only reality of the callback hook explicit, and clarifies in Section 2 that SUBSCRIBE / CONNECT / UPDATE branches of handle_pure_network_message_v1 lack the callback block.

Errata (Rev 2 → Rev 3)

After Phase 1 shipped (PR #3802), a pre-implementation investigation of SUBSCRIBE's surface area surfaced three material issues with Rev 2's Phase 2 scope:

  1. Phase 2 bundled two unrelated op migrations. SUBSCRIBE is 2,225 lines; CONNECT is 5,025 lines (2.3× larger). Rev 2's parenthetical "(simplest ops)" is accurate relative to GET/PUT/UPDATE but not relative to each other. The two ops also have fundamentally different shapes: SUBSCRIBE has a client-facing origin, blocking sub-op semantics with PUT, and inline retry loops that serialize state through notify_op_change(); CONNECT has no client-facing origin at all (the "joiner" is the local node's connection_maintenance loop, not a WS client), no sub-op semantics, decouples hole-punch from op lifetime (op Completed does not mean the connection is established), and most of its 5,025 lines are in peer-selection machinery (RelayContext, ConnectForwardEstimator, visited-bloom-filter) rather than in the state machine itself. Rev 3 splits these into separate phases: 2b for SUBSCRIBE, 2c for CONNECT. Phase 2c's schedule is TBD and does not block Phases 3–6. (An earlier draft of this Errata claimed CONNECT ops have "hour-long connection attempts"; the subsequent CONNECT investigation [posted as an issue comment] refuted that — CONNECT ops time out at OPERATION_TTL = 60s; it's the resulting peer connections that live for hours, managed by the transport layer independently of the op.)

  2. SUBSCRIBE has three distinct entry points with different failure semantics, not one:

    • Client-initiated (from a WS Subscribe request) — errors surface to the WS client.
    • Renewal-initiated (ring.rs:944-1051) — fire-and-forget, protected by can_request_subscription() spam prevention and jitter scheduling.
    • Sub-op of PUT (operations.rs:745-811, called from put.rs:1589-1608, blocking mode) — parent PUT blocks on subscribe completion via expect_and_register_sub_operation / all_sub_operations_completed.

    Rev 2's Phase 2 implied all three would migrate together, but doing so would require the new task-per-tx SUBSCRIBE to cooperate with the legacy re-entry-loop PUT via the SubOperationTracker machinery that Phase 2.5 is supposed to start retiring — circular. Rev 3 scopes Phase 2b to the client-initiated path only. Renewals stay on the legacy path (their jitter/spam-prevention timing is load-bearing and decoupling it is a separate rabbit hole); PUT sub-ops stay on the legacy path until Phase 2.5 ports them properly.

  3. SUBSCRIBE's retry logic is not localized. Rev 2's description "routing logic moves into the spawned task's locals" handwaves a significant rewrite. Today every retry path (NotFound → next alternative, alternatives exhausted → fresh k_closest round, peer timeout → abort-and-replace) reconstructs the full SubscribeOp and re-inserts it via notify_op_change(). Moving to task locals means rewriting all three retry paths, not just the happy path. This work belongs in Phase 2b, but Rev 3 introduces Phase 2a as a strict prerequisite: land OpCtx itself (struct definition, send_and_await primitive on top of Phase 1's round-trip, tx → JoinHandle bookkeeping) as dormant scaffolding with no production callers, analogous to how Phase 1 landed the callback hook without activating notify_op_execution. Phase 2b then activates OpCtx for exactly one entry point (client-initiated SUBSCRIBE) and reuses 2a's primitives rather than inventing them mid-migration.

Rev 3 also lifts the #[allow(dead_code)] attributes on notify_op_execution and op_execution_sender in Phase 2a (via the test shim that exercises OpCtx::send_and_await), satisfying the TODO left in Phase 1's doc comments.

Pre-implementation investigation reports for each op surface will be posted as comments on this issue rather than folded into the design body, so the design stays the plan-of-record and the investigations stay the research trail.

Errata (Rev 3 → Rev 4)

After Phase 2b shipped (PR #3806, 2026-04-10), the Phase 2.5 implementation was drafted and verified against HEAD. Rev 3's Phase 2.5 description suggested "retiring SubOperationTracker usage for subscribe children" and "awaiting the child future in the parent task" — the latter is incompatible with the current PUT/GET parent, which is not a task but a re-entry-loop DashMap entry. Two corrections followed from that verification:

  1. SubOperationTracker is not a polling mechanism; it is event-driven. PUT/GET stash their terminal state in root_ops_awaiting_sub_ops (operations.rs:205); op_state_manager.rs:926-973 finalizes the parent when the last child calls op_manager.completed(child_tx). Replacing this with a oneshot while the parent still lives in a DashMap entry is a lateral move — it introduces new delivery plumbing (a DriverRole::SubOp variant of Phase 2b's drive_subscribe_inner, a register_local_subscription helper so the sub-op path skips the NodeEvent::LocalSubscribeComplete publish that the client path needs) without actually enabling the parent to await anything.

  2. start_subscription_request_internal has legitimate async-path users. track_parent=false (non-blocking PUT/GET with subscribe flag) does not touch expect_and_register_sub_operation at all — it just spawns a best-effort subscribe. That path also must stay on the legacy re-entry loop until its parent migrates.

The honest structural win — the parent task owning its child future directly — only exists once PUT/GET themselves are task-per-tx, which is Phase 3. Doing Phase 2.5 first means landing DriverRole::SubOp + the registration helper + extra completed(child_tx) / sub_operation_failed wiring as scaffolding that gets immediately removed when Phase 3 rewrites the parent side.

Rev 4 therefore folds Phase 2.5 into Phase 3. The blocking-subscription sub-op child (track_parent=true) and the async non-blocking child (track_parent=false) both migrate as part of Phase 3's PUT/GET task-per-tx work, which is when the parent task can hold the child future as a local and await it without any of the SubOperationTracker round-trip. The existing blocking-subscribe regression tests (tests/operations.rs:1971 and :2071) continue to guard the legacy path until Phase 3 lands.

Facts verified during the Rev 4 investigation that Phase 3 should reuse as inputs (the exploration itself is not preserved as a separate document — Phase 3 will re-verify against HEAD per the Phase N rule):

  • The call-site inventory for every subscribe entry point (public wrappers, start_subscription_after_put, auto_subscribe_on_get_response, the renewal path in ring.rs:1046, and the Phase 2b SUBSCRIBE bypass in node.rs:1161) distinguishes which branches a Phase 3 PUT/GET migration must touch.
  • complete_local_subscription's NodeEvent::LocalSubscribeComplete publish is client-path-only; any sub-op pathway that calls into the local-completion helper must skip that NodeEvent because the parent PUT/GET owns the client reply.
  • The parent_tx / child_tx / per-attempt attempt_tx bookkeeping from Phase 2b is already safe for sub-op use: expect_and_register_sub_operation is called only with child_tx, and the per-retry attempt_tx never enters parent_of.
  • The three Phase 2b bugs (bypass gap in forward_pending_op_result_if_completed, ForwardingAck / non-terminal message filter, local-completion reply synthesis) must be re-checked for every op path Phase 3 adds.

Errata (Rev 4 → Rev 5)

After Rev 4 folded Phase 2.5 into Phase 3, a pre-implementation investigation of GET, PUT, and the subscribe sub-op hand-off was run against HEAD (posted as an issue comment on 2026-04-11). Three findings motivated a further split:

  1. PUT and GET are independently large. PUT's originator path has a duplicate-send pattern (put.rs:575–594 early client result via result_router_tx.try_send + :841–917 second send on the response leg, relying on client-side SessionActor dedup) that is the single biggest structural change in Phase 3 and is orthogonal to anything GET does. GET adds streaming assembly inside send_and_await and four local-completion paths to handle. Bundling them in one PR makes debugging either change harder. Rev 5 splits them: 3a is PUT, 3b is GET.

  2. SubOperationTracker can only be deleted after BOTH callers are removed. Grep of HEAD shows exactly one non-test caller of expect_and_register_sub_operation (operations.rs:755, inside start_subscription_request_internal), which is invoked from PUT's start_subscription_after_put and GET's auto_subscribe_on_get_response. 3a can stop PUT from calling the tracker, but the tracker itself must stay alive until GET also stops calling it in 3b. The tracker deletion becomes a mechanical third PR — 3c — which slots in immediately after 3b rather than waiting until Phase 5. Phase 5's scope is now just the OpManager.ops DashMap deletion (the tracker is gone by then).

  3. The attempt_tx terminology from Rev 3/4 notes does not exist on HEAD. Grep finds zero occurrences. The actual bookkeeping distinguishes only parent_tx and child_tx; there is no per-retry intermediate transaction kind. The Rev 4 bullet that claimed "per-attempt attempt_tx never enters parent_of" was verifying something that isn't there. The bullet is preserved above as historical context but should not influence Phase 3 design — drop the term.

Additional facts verified during the Rev 5 investigation that 3a/3b/3c should reuse:

  • PUT's duplicate-send is intentional and load-bearing until the task-per-tx driver replaces it. The early send at put.rs:575–594 is what makes PUT feel fast from a client's perspective (local storage → immediate response), and the second send at :841–917 is what guarantees delivery when the originator is not the final hop. The task-per-tx driver must collapse these to a single delivery by routing the synthesized local-completion PutMsg::Response through the pending_op_result callback instead of result_router_tx.try_send.
  • PUT relay ops with state: None are normal, not an error state. finalized() at put.rs:91–93 correctly treats None as done. A 3a driver must not assume state: Some(_) for every PUT in flight.
  • PUT speculative paths stay GC-controlled on the legacy DashMap path. They are spawned from the garbage cleanup task (put.rs:51–54, :251–264), not from the originator driver. 3a must not replicate speculative spawning in the task-per-tx driver.
  • GET fetch_contract is always true after the fix for #3757 (node should always cache contract WASM on GET, regardless of return_contract_code). The client's return_contract_code flag only controls the client-facing payload, not the wire request. 3b must preserve that distinction.
  • GET streaming assembly blocks the originator's state transition: stream_handle.assemble().await at get.rs:2892 is the boundary between AwaitingResponse and Finished. 3b's send_and_await should block on assembly internally for the ResponseStreaming variant.
  • Relay stale-cache deferral logic (get.rs:1406–1431): relays with only a stale LRU entry defer to the network and keep the stale copy as fallback. 3b must preserve this — relays stay on the legacy path, but any helper shared between originator and relay must not break the interest check.
  • complete_local_subscription's NodeEvent::LocalSubscribeComplete publish stays. It is still needed for the renewal and intermediate-peer paths. 3a/3b's inline-await pattern bypasses the event in the blocking-subscribe case (the parent task's own completion is the notification), but the helper itself is not deleted.

Design written for #1454. Line numbers verified against main at the time of writing; re-verify before implementing each phase.

Metadata

Labels: A-developer-xp (Area: developer experience), E-hard (Experience needed to fix/implement: Hard / a lot), T-tracking (Type: Meta-issue tracking multiple related issues)


Status

In Progress
