Tracking issue for some internal refactoring we need to do. The goal is to:
improve debugging capabilities over the kernel
simplify and clean up some of the core code, cleaner abstractions
improve certain error handling, WS API client/server comms, etc.
add more testing around the boundaries of some components (like we did with the handshake handler, transport, etc.)
Changes:
move towards transactions executed end-to-end asynchronously, instead of the current model of execution
await sub-transaction results, notify WS client subscribers as network messages arrive, and simplify the overall execution flow and reasoning (we essentially no longer need to keep op state internally in the OpManager)
Design: end-to-end async transactions
This section replaces the previous [TODO: add more detail] placeholder. Line numbers are verified against main at the time of writing. Follow-up PRs implementing each phase should re-verify before making changes.
Revision history:
Rev 1 (2026-04-08): Initial draft. Over-promised the reach of existing scaffolding and mis-scoped Phase 1. See Errata at the end of this document.
Rev 2 (2026-04-08): Corrected Phase 1 after reading request_subscribe and the pure-network message dispatch path. Rev 1's Phase 1 has been split into a real prerequisite (Phase 1) and deferred work (Phase 2.5). Line numbers corrected.
Rev 3 (2026-04-09): After Phase 1 shipped (PR refactor(ops): extend pending_op_result callback to all op variants (#1454 phase 1) #3802), investigated SUBSCRIBE's full surface area before starting Phase 2 implementation. Found that Rev 2's Phase 2 bundles too much work in one slice. Split Phase 2 into 2a (introduce OpCtx as dormant scaffolding — no op migrated), 2b (migrate SUBSCRIBE client-initiated path only), and 2c (migrate CONNECT — deferred, schedule TBD). Phases 2.5 and 3–6 are unchanged. See the Errata (Rev 2 → Rev 3) section at the end for the specific findings that motivated the split.
Rev 4 (2026-04-11): After Phase 2b shipped (PR refactor(ops): migrate client-initiated SUBSCRIBE to OpCtx (#1454 phase 2b) #3806), drafted the Phase 2.5 implementation and verified it against HEAD. Found that the intended structural win is narrow (only the track_parent=true branch of start_subscription_request_internal moves; SubOperationTracker, root_ops_awaiting_sub_ops, and the async non-blocking subscribe path all stay on the legacy re-entry loop) and that the scaffolding it would introduce (a DriverRole::SubOp variant, a register_local_subscription helper to bypass NodeEvent::LocalSubscribeComplete) becomes dead weight the moment Phase 3's task-per-tx PUT/GET parent can await the child future inline. Phase 2.5 is folded into Phase 3. The blocking-subscription sub-op child migrates together with PUT/GET. The async non-blocking subscribe path (track_parent=false) also stays on the legacy loop until Phase 3. See the Errata (Rev 3 → Rev 4) section at the end.
Rev 5 (2026-04-11): Pre-implementation investigation of GET, PUT, and the subscribe sub-op hand-off (posted as issue comment) confirmed that PUT and GET are each large enough to merit their own slice, and that SubOperationTracker can only be deleted after BOTH callers (operations.rs:755 blocking path, invoked from PUT's start_subscription_after_put AND GET's auto_subscribe_on_get_response) are gone. Phase 3 is split into 3a / 3b / 3c: 3a migrates PUT + PUT's blocking-subscribe child (tracker stays alive for GET); 3b migrates GET + GET's blocking-subscribe child + streaming assembly inside send_and_await; 3c is a mechanical deletion PR that retires SubOperationTracker, root_ops_awaiting_sub_ops, parent_of, failed_parents, expected_sub_operations, and their 10 unit tests once both callers are gone. PUT-first is chosen because PUT's state machine is simpler (no streaming on the originator completion path), and its duplicate-send collapse (put.rs:575–594 + :841–917) is the single largest structural change in Phase 3 — validating it in isolation de-risks GET. The attempt_tx terminology referenced in earlier Rev 3/4 notes does not exist on HEAD and has been dropped. See the Errata (Rev 4 → Rev 5) section at the end.
1. Overview
Problem
Today the core runs each transaction through a centralized re-entry loop:
A client request calls op_manager.push(tx, OpEnum) and returns.
A NetMessage for tx arrives at p2p_protoc.
handle_op_result pops the OpEnum out of an OpManagerDashMap, calls Operation::process_message, and gets back an OperationResult.
Depending on the variant (Completed / ContinueOp / SendAndContinue / SendAndComplete), it either marks the op completed, re-inserts it into the same DashMap (push-before-send), or forwards a message and re-inserts.
Sub-transactions are tracked side-band in a separate SubOperationTracker with its own DashMaps (sub_operations, root_ops_awaiting_sub_ops, parent_of, expected_sub_operations, failed_parents). A parent op only proceeds when all_sub_operations_completed returns true.
Client notifications fan out post-hoc via a ResultRouter → SessionActor.
Pain points:
Transaction state lives in a global OpManager rather than in the task that logically owns it. Stack traces are useless for reasoning about "what is this transaction doing right now?"
Sub-operation orchestration is a state machine driven by DashMap writes and GC, not Rust's async runtime. There is no await between "parent wants a value from a child" and "child produces it."
Bugs in this area tend to be of the form "the parent op's DashMap entry was cleaned up before the child's callback fired", or "the parent's expected_sub_operations count disagreed with reality." These are not expressible as compile errors.
The client-notification path (ResultRouter → SessionActor) is decoupled from the operation that produced the result, so streaming updates (e.g. incremental GET responses) are harder to reason about than they should be.
Goals (from the tracking issue)
Improve debugging of the kernel / transaction flow
Cleaner operation abstractions (operations own their own state)
Better error handling and WS client/server communication
Expand tests at component boundaries
End-to-end async transactions: each transaction runs as a task that owns its state in locals, awaits sub-transaction futures, and streams updates to WS subscribers directly as network messages arrive
Non-goals
No wire-format changes. NetMessage variants stay the same.
No contract/WASM API changes.
No changes to the transport layer or handshake handler.
No changes to the SessionActor semantics for 1→N client fanout; only the producer side changes.
2. Current architecture (baseline)
All line numbers verified against main at the time of writing.
| Concern | File | Lines |
| --- | --- | --- |
| `OpManager.ops` — DashMaps for connect/put/get/subscribe/update, plus `completed` and `under_progress` DashSets | | |
```mermaid
sequenceDiagram
    participant C as WS Client
    participant SA as SessionActor
    participant OM as OpManager<br/>(DashMaps)
    participant EL as Event loop /<br/>priority_select
    participant PP as p2p_protoc
    participant N as Network
    participant RR as ResultRouter
    C->>SA: GET(tx)
    SA->>OM: push(tx, GetOp::init)
    OM-->>SA: ok
    SA-->>C: accepted
    Note over OM: GetOp lives in<br/>DashMap<Tx, GetOp>
    N-->>PP: NetMessage(tx)
    PP->>EL: dispatch
    EL->>OM: pop(tx) -> GetOp
    OM-->>EL: GetOp
    EL->>EL: process_message -><br/>OperationResult
    alt SendAndContinue
        EL->>OM: push(tx, GetOp') (push-before-send)
        EL->>N: NetMessage
    else Completed
        EL->>OM: completed(tx)
        EL->>RR: HostResult(tx, value)
        RR->>SA: DeliverHostResponse
        SA->>C: GET result
    end
```
3. Target architecture
Key idea
Each transaction runs as a dedicated async task whose locals hold the op state. The task drives itself by awaiting on a small per-tx inbox and on child-transaction futures. The OpManager shrinks to a bookkeeping table mapping Transaction → JoinHandle (plus the existing completed / under_progress de-dup sets). The op-specific DashMaps go away.
A new primitive — OpCtx::send_and_await(msg) — lets an operation fire a NetMessage and await the reply that arrives keyed by the same Transaction. Sub-transactions become child futures: a parent op calls ctx.spawn_sub(child_op).await and gets back the child's result directly, with no side-band tracker.
Phase 2a minimum surface (what ships first)
Phase 2a introduces OpCtx as scaffolding with exactly one method — send_and_await — built directly on top of Phase 1's forward_pending_op_result_if_completed round-trip. No ops are migrated in 2a; its only caller is a unit test shim. The wider API (sub-op spawning, WS fanout, per-tx inbox, a new Operation trait) is deferred until a concrete consumer needs it.
```rust
/// Per-transaction execution context for the task-per-tx model.
/// Phase 2a ships only `send_and_await`; the rest of the target API
/// below is deferred.
pub struct OpCtx {
    tx: Transaction,
    op_execution_sender: mpsc::Sender<(mpsc::Sender<NetMessage>, NetMessage)>,
}

impl OpCtx {
    /// Fire `msg` and await the terminal reply keyed by this ctx's tx.
    ///
    /// Single-use per `Transaction`: `forward_pending_op_result_if_completed`
    /// fires exactly once because the `completed` / `under_progress` dedup
    /// sets suppress subsequent dispatches. Callers that need to retry a
    /// multi-attempt protocol must use a fresh `Transaction` per attempt.
    ///
    /// Must be called from an op task, not from the event loop or a
    /// `priority_select` arm — the inner `.send().await` on
    /// `op_execution_sender` is allowed only because the caller is a
    /// short-lived spawned task (see `.claude/rules/channel-safety.md`).
    pub async fn send_and_await(&mut self, msg: NetMessage) -> Result<NetMessage, OpError>;

    pub fn tx(&self) -> Transaction;
}
```
OpManager grows a thin op_ctx(&self, tx: Transaction) -> OpCtx factory that clones the op_execution_sender out of its inner EventLoopNotificationsSender. The old OpManager::notify_op_execution helper is deleted — its body becomes OpCtx::send_and_await. This keeps the primitive in exactly one place.
Target architecture (eventual vision, for reference)
Phases 2b, 2.5, and 3–6 will grow OpCtx toward the full shape below. These fields and methods are not in Phase 2a; they are added as each phase's concrete consumer arrives, so the scaffolding stays minimal until it has a real caller.
```rust
// Eventual shape — NOT Phase 2a. Deferred until later phases need them.
#[async_trait]
pub trait Operation: Sized + Send + 'static {
    type Output: Send;
    async fn run(self, ctx: OpCtx) -> Result<Self::Output, OpError>;
}

pub struct OpCtx {
    tx: Transaction,
    op_execution_sender: /* … */,             // Phase 2a
    inbox: mpsc::Receiver<NetMessage>,        // deferred: non-terminal msgs
    outbound: mpsc::Sender<NetMessage>,       // deferred: direct bridge send
    updates: broadcast::Sender<HostResponse>, // deferred: WS fanout
    op_registry: Arc<OpRegistry>,             // Phase 5
}

impl OpCtx {
    pub async fn send_and_await(&mut self, msg: NetMessage) -> Result<NetMessage, OpError>; // Phase 2a
    pub async fn spawn_sub<O: Operation>(&self, child: O) -> Result<O::Output, OpError>;    // Phase 2.5
    pub fn notify(&self, update: HostResponse);                                             // Phase 3
}
```
What goes away
OpManager.ops.{connect,put,get,subscribe,update} — the five DashMap<Transaction, *Op> fields. Replaced by a single DashMap<Transaction, JoinHandle<Result<(), OpError>>> (or equivalent) used only for cancellation and cleanup.
SubOperationTracker entirely. Parent/child relationships become parent-awaits-child futures.
handle_op_result and OperationResult. The re-entry loop's state transitions become ordinary Rust control flow inside each op's run.
The Operation::process_message seam. Ops express themselves as async fn run.
What stays
Transaction IDs and the wire format.
completed / under_progress de-dup sets.
pending_op_results: HashMap<Transaction, Sender<NetMessage>> inside p2p_protoc — this is already the "send message, await reply keyed by tx" primitive we need. The tx task registers its inbox sender here via notify_op_execution, then awaits.
SessionActor and its TTL pending cache. The producer side changes (tx task pushes to broadcast::Sender), but the 1→N fanout and the session lifecycle stay.
ResultRouter in a reduced role: it still exists as the bridge between "network layer produced a final result" and "SessionActor delivers it", but in the target design the tx task itself publishes via OpCtx::notify and ResultRouter handles only legacy / non-per-tx paths during the transition.
4. Existing scaffolding to reuse
Some of the infrastructure already exists in the tree but is currently #[allow(dead_code)], marked FIXME: enable async sub-transactions. Important: the reach of this scaffolding is narrower than the original draft of this design suggested. See Scaffolding reach below.
| Primitive | File | Lines | Status |
| --- | --- | --- | --- |
| `OpManager::notify_op_execution` — creates `(Sender<NetMessage>, Receiver<NetMessage>)`, sends the pair to `op_execution_sender`, awaits the callback reply | | | |
| `p2p_protoc::handle_op_execution` — inserts callback into `state.pending_op_results` and forwards the message into the op pipeline as `ConnEvent::InboundMessage` | crates/core/src/node/network_bridge/p2p_protoc.rs | 3861–3879 | live |
| PUT reply forwarding via `pending_op_result` callback on completion | crates/core/src/node.rs | 868–878 (send at 871) | live (PUT only) |
| GET reply forwarding via `pending_op_result` callback on completion | crates/core/src/node.rs | 917–927 (send at 920) | live (GET only) |
| SUBSCRIBE, CONNECT, UPDATE callback forwarding on completion | crates/core/src/node.rs | — | missing |
Scaffolding reach
The end-to-end plumbing for "send a message tagged with a Transaction and await a single reply keyed by the same Transaction" is currently live only for PUT and GET. Specifically:
handle_op_execution (p2p_protoc.rs:3861-3879) inserts a callback sender into state.pending_op_results keyed by tx, then forwards the wrapped message into the main dispatch path as a ConnEvent::InboundMessage. This is wired end to end.
Inside the op pipeline, handle_pure_network_message_v1 (node.rs:775 onwards) calls handle_op_request::<OpT, _>(...) and then — only for PUT and GET — pulls the callback sender out of pending_op_result and calls op_execution_callback.send(completed_msg).await (node.rs:871, 920).
The SUBSCRIBE, CONNECT, and UPDATE branches of handle_pure_network_message_v1 do not have this callback block. If a caller today were to invoke notify_op_execution(subscribe_msg).await, the sender would be inserted into pending_op_results and the forwarded message would drive the SUBSCRIBE op, but notify_op_execution would then hang forever on response_receiver.recv() because no code path fires the reply.
This means the scaffolding is not "almost all wired up" for all ops. It is wired for PUT and GET because those ops have a wire-level terminal message that maps cleanly onto a single-reply model. For the other ops, the callback hook is missing and must be added before any of them can participate in the notify_op_execution round-trip.
The SUBSCRIBE complication
Even once the SUBSCRIBE callback hook exists, SUBSCRIBE initiation is not "construct a NetMessage and send it". subscribe::request_subscribe (operations/subscribe.rs:303-487) does all of the following before any wire message exists:
Detects "standalone node with contract available locally" and routes to complete_local_subscription without any network round-trip.
Runs k_closest_potentially_hosting against the ring, then falls back to any connected peer if that returns empty.
Builds the AwaitingResponse routing state (next_hop, tried_peers, visited, alternatives, speculative_paths) that future retries consult.
Constructs the SubscribeMsg::Request { id, instance_id, htl, visited, is_renewal } wire message only after the target peer is chosen.
Calls op_manager.notify_op_change(msg, OpEnum::Subscribe(op)) — a push-before-send primitive distinct from notify_op_execution.
None of this fits the notify_op_execution(msg).await shape, because:
The local-completion branch has no network round-trip at all → nothing to reply with.
The routing state needs to exist in the op's DashMap<Tx, SubscribeOp> entry before the wire message is sent, because retry/alt-peer logic reads it. notify_op_execution's current contract is "send a NetMessage, get a reply"; it has no op-construction hook.
Retries and alternative-peer selection live inside the re-entry loop, not inside the caller of notify_op_execution.
So a "first slice" that ports the client-initiated SUBSCRIBE path onto notify_op_execution is genuinely Phase-2-shaped work: it drags in task-per-tx ownership of the routing state, a prepare_initial_request helper that knows the local-completion fallback, and either a synthetic reply for local completion or a change to the notify_op_execution contract to allow "no network round-trip, completed immediately".
5. Phased rollout (revised)
Each phase should be a separate sub-issue / PR. Phase boundaries are chosen so that after each phase the tree compiles, CI passes, and the existing integration tests stay green.
| Phase | Scope | Entry tests | Exit criteria |
| --- | --- | --- | --- |
| 0 | This design (the issue body). Also: remove the S-needs-design label from #1454 once Rev 2 is published. | — | — |
| 1 | Extend the pending_op_result callback hook in handle_pure_network_message_v1 to fire for SUBSCRIBE, CONNECT, and UPDATE completions. This matches the existing PUT/GET pattern at node.rs:869-877 / 918-926. The `#[allow(dead_code)]` attributes on notify_op_execution and op_execution_sender stay for now — this phase does not add a caller. Adds a test that fires a round-trip through the scaffolding for each op (via a test-only caller or `#[cfg(test)]` shim). | existing integration suite | All five op kinds (CONNECT/PUT/GET/SUBSCRIBE/UPDATE) can terminate a notify_op_execution round-trip without hanging. No behavioral change to any op. |
| 2a | Introduce OpCtx as a new module with no production callers. Define the OpCtx struct, the send_and_await primitive (built on the Phase 1 notify_op_execution round-trip), and the tx → JoinHandle bookkeeping that Phase 5 will eventually replace the op-specific DashMaps with. No ops migrated. The `#[allow(dead_code)]` attributes on notify_op_execution and op_execution_sender from Phase 1 are lifted here — Phase 2a becomes the first production caller, but only from a dormant test shim, not from any real op path. Pattern mirrors Phase 1's "widen the primitive without adding a live consumer". | new unit tests on OpCtx::send_and_await round-trip via each of the five op kinds | OpCtx exists, tests exercise the round-trip end-to-end, nothing in operations/ changes. |
| 2b | Migrate SUBSCRIBE's client-initiated path only to OpCtx. Scope excludes renewals (live on legacy path because their jitter/spam-prevention in ring.rs:944-1051 is load-bearing) and PUT sub-ops (live on legacy path until Phase 2.5). Factor prepare_initial_request out of request_subscribe to handle the local-completion branch explicitly. SubscribeOp DashMap entry disappears for client-initiated subscribes; renewal and sub-op paths still populate it. Route via a per-tx switch (not a global feature flag). | existing subscribe unit + integration tests, new task-per-tx tests for the client path | Client-initiated SUBSCRIBE runs via OpCtx; renewal and sub-op SUBSCRIBE unchanged. No regressions in either path. |
| 2c | Migrate CONNECT to OpCtx. Deferred from Rev 2's Phase 2 because CONNECT is 5,025 lines (2.3× SUBSCRIBE) with a genuinely different shape: no client-facing origin (joiner is the local node's connection_maintenance loop, not a WS client), no sub-op semantics, hole-punch decoupled from op lifetime (op Completed ≠ connection established), and most of the file's bulk is in RelayContext / ConnectForwardEstimator / visited-bloom-filter machinery rather than the state machine itself. Schedule TBD — not a strict blocker for Phases 3–6, which can proceed on the op kinds they target. | CONNECT unit + integration tests | Internal CONNECT transactions run via OpCtx. |
| 2.5 | Folded into Phase 3 as of Rev 4. Original scope: port the PUT-completion → SUBSCRIBE sub-op hand-off onto task-per-tx. Reason for fold: the scaffolding (DriverRole::SubOp, a registration helper bypassing NodeEvent::LocalSubscribeComplete, extra delivery plumbing through op_manager.completed(child_tx)) becomes redundant as soon as Phase 3's task-per-tx PUT/GET parent can await the child future directly. See Errata (Rev 3 → Rev 4). | — | — |
| 3a | Migrate PUT (client-initiated originator path) + PUT's blocking-subscribe child to task-per-tx. Collapses the duplicate-send pattern in put.rs:575–594 (early result_router_tx.try_send) + :841–917 (response-leg second send) into a single client-result delivery from the driver task. The blocking-subscribe child (when blocking_subscribe=true) is awaited inline by the PUT task — PUT stops calling expect_and_register_sub_operation. The async non-blocking child path migrates with it. SubOperationTracker stays alive (GET still calls it in 3b). PUT relays stay on the legacy re-entry loop. Speculative paths stay GC-controlled on the DashMap. Install try_forward_task_per_tx_reply for PUT Response / ResponseStreaming variants at the TOP of the PUT branch in handle_pure_network_message_v1; filter ForwardingAck. Synthesize a PutMsg::Response where PUT currently completes locally without a network reply. | test_put_with_blocking_subscribe (tests/operations.rs:1971), test_put_contract_three_hop_returns_response (:2551), test_put_then_immediate_subscribe_succeeds_locally_regression_2326 (:3154), test_put_triggers_update_for_subscribers (:3407), test_put_merge_persists_state (:468), test_put_with_subscribe_flag (:1504), all test_put_retry_* unit tests (put.rs:2672+), existing PUT unit tests (put.rs:2102–2920) | Client-initiated PUT runs via OpCtx; PUT originator stops touching OpManager.ops.put; blocking PUT parent awaits child subscribe future directly (no SubOperationTracker call from PUT path); client result delivered exactly once. Relay, speculative, and GET paths unchanged. |
| 3b | Migrate GET (client-initiated originator path) + GET's blocking-subscribe child to task-per-tx. Reuses the patterns validated in 3a. Adds streaming assembly handling: send_and_await internally blocks on stream_handle.assemble().await for the ResponseStreaming reply variant, preserving the single-reply shape of OpCtx. Four local-completion paths to handle (originator has contract with no peers, relay has contract, originator receives network Response, originator auto-subscribe) — only the first needs synthesis; the network Response path reuses the arriving message directly. Same bypass + filter install as 3a for GET. GET relays stay on the legacy loop. | test_get_with_blocking_subscribe (tests/operations.rs:2071), test_get_with_subscribe_flag (:1244), test_get_notfound_no_forwarding_targets (:3285), test_delegate_contract_get (:4009), test_repeated_get_with_unchanged_state_succeeds (:4380), streaming integration tests, GET unit tests (get.rs:3824–4980) including test_blocking_subscribe_propagation and test_retry_and_fallback | Client-initiated GET runs via OpCtx; GET originator stops touching OpManager.ops.get; blocking GET parent awaits child subscribe future directly; streaming assembly flows through send_and_await. |
| 3c | Delete SubOperationTracker and its machinery. Mechanical deletion PR once 3a and 3b have removed both callers. Delete: the struct (op_state_manager.rs:74–86), sub_operations / root_ops_awaiting_sub_ops / parent_of / expected_sub_operations / failed_parents fields, expect_and_register_sub_operation, mark_sub_op_completed, remove_child_link, cleanup_parent_tracking, get_parent, is_sub_operation, all_sub_operations_completed, count_pending_sub_operations, sub_operation_failed, and the finalized-parent-parking branch in operations.rs:177–208. Delete the 10 sub_operation_tracker_* unit tests (op_state_manager.rs:2462–3090). No Phase 5 dependency — this work slots in immediately after 3b. | existing PUT/GET regression tests (no new tests — deletion is mechanical) | SubOperationTracker and root_ops_awaiting_sub_ops gone; OpManager.ops.put and .get DashMaps remain but are no longer touched by originator paths (their deletion is still Phase 5). |
| 4 | Migrate UPDATE broadcast — the trickiest variant because it fans out to many peers. Verify broadcast semantics match the pre-refactor behaviour (see existing regression test at f28e8e6). | UPDATE integration + broadcast regression tests | UPDATE live phase stops touching OpManager.ops. |
| 5 | Delete OpManager.ops.{connect,put,get,subscribe,update} DashMaps. (SubOperationTracker was already deleted in Phase 3c.) OpManager keeps completed + under_progress + a new tx → JoinHandle table for cancellation. Delete Operation::process_message / OperationResult / handle_op_result. | full suite incl. DST replay | Ops struct reduced to the bookkeeping fields. |
| 6 | Boundary tests: component-level tests for OpCtx, send_and_await, spawn_sub. Documentation sweep: update .claude/rules/operations.md (the push-before-send invariant is now structurally enforced and the rule wording should reflect that). Update docs/architecture/operations/README.md. | new boundary tests | Rules / docs aligned with new code. |
Phase 1 — concrete first slice (revised)
This is the smallest self-contained PR. Unlike Rev 1's Phase 1, this one does not touch any op's routing logic. It only extends an existing pattern to three more op variants.
In crates/core/src/node.rs, add a pending_op_result callback block after handle_op_request::<subscribe::SubscribeOp, _> (around line 999), mirroring the PUT pattern at node.rs:868-878. The "completed message" is the response SubscribeMsg::ReturnSub (or equivalent terminal variant — verify in subscribe.rs).
Repeat for CONNECT (NetMessageV1::Connect) and UPDATE (NetMessageV1::Update) branches of handle_pure_network_message_v1.
Decide how to handle SUBSCRIBE's complete_local_subscription branch: either (a) fire a synthetic "locally completed" reply on the callback so notify_op_execution returns, or (b) declare that notify_op_execution may also observe completion via the op's push-to-completed-set transition and add a fallback wakeup. Option (a) is simpler; option (b) is more general. Recommend (a) for Phase 1.
Add a test-only caller of notify_op_execution (e.g., a #[cfg(test)] module) that fires a round-trip through each of the five op kinds and asserts the call returns. Keep the #[allow(dead_code)] on the production getters — Phase 2 will activate them for real.
Run:

```shell
cargo fmt && cargo clippy -- -D warnings
cargo test -p freenet
cargo test -p freenet --features "simulation_tests,testing" --test simulation_integration
```
No production caller is added; no behavior changes for any existing code path. Phase 1 is purely "widen the callback hook so the round-trip primitive can terminate for all op kinds."
6. Risks and mitigations
Push-before-send invariant (.claude/rules/operations.md, currently enforced in operations.rs:238): preserved by definition in the target design — op state never leaves the owning task, so the concept of "forgetting to re-push" goes away. The rule text should be updated in Phase 6 to describe the new invariant (state lives in locals, send_and_await is atomic).
Bounded-channel rules (.claude/rules/channel-safety.md): the notify_op_execution / send_and_await primitive already uses a bounded Sender<(Sender<NetMessage>, NetMessage)>. notify_op_execution calls .send().await (op_state_manager.rs:710), which violates the rule "never use .send().await in event loops" — however, the caller of notify_op_execution is not an event loop (it's an op task), so this is currently acceptable. Phase 2 must re-audit once there are production callers. The callback-reply side (node.rs:871, 920) also uses .send().await on the bounded pending_op_result channel (capacity 1), which blocks the pure-network-message handler; this is acceptable because the receiver is the op task awaiting on pending_op_results[tx], but it must be re-audited once every op kind uses it.
Determinism (DST): task spawning order has to be stable under the deterministic runtime. tokio::spawn is not deterministic by itself; use deterministic_select! and the TimeSource / GlobalRng abstractions from crates/core/src/simulation/. Verify via the existing test_deterministic_replay_* tests at every phase. If task spawn order proves to be a blocker, Phase 2 may need to run ops cooperatively on a single-threaded runtime rather than via tokio::spawn.
Backpressure: the per-tx inbox mpsc and the per-tx broadcast for WS updates both need explicit bounds. Drop-oldest vs fail-fast policy must be chosen and documented. For WS updates, dropping is likely acceptable (clients will reconcile); for the inbox, dropping is not (it would wedge the transaction).
Cancellation: the tx → JoinHandle table gives us clean abort() on timeout, which today is handled implicitly by under_progress expiry. The timeout enforcement logic needs to move from the GC path to each tx task's own timeout arm.
Interaction with GC / cleanup exemptions (see AGENTS.md on time-bounded exemptions): the new tx → JoinHandle table must not create unbounded GC blind spots. Any exemption must have a TTL or an absolute age threshold. Phase 5 should include a test that asserts orphaned tx tasks are torn down within a bounded interval.
Rollout safety: phases 2–4 leave the tree in a mixed state (some ops via task-per-tx, others via the old re-entry loop). Both paths must coexist cleanly. The OpRegistry introduced in Phase 2 needs to be additive, not a replacement, until Phase 5.
7. Out of scope / future work
WS API surface redesign (separate design issue).
Error-handling surface redesign across the WS client/server boundary. This design assumes OpError stays roughly as-is; a follow-up can reshape it once the task-per-tx model is in place.
Transport resiliency work that depends on this refactor (mentioned in the prioritizer comment) can begin once Phase 3 is merged.
Errata (Rev 1 → Rev 2)
The initial revision of this design document (published 2026-04-08, earlier today) contained two material errors that were caught during Phase 1 implementation planning and are corrected in Rev 2:
Phase 1 was mis-scoped. Rev 1 claimed Phase 1 could be "activate notify_op_execution and convert start_subscription_request_internal to await notify_op_execution(...)" and described this as a drop-in change. In fact:
The pending_op_result callback that makes notify_op_execution return is only wired for PUT and GET in handle_pure_network_message_v1 (node.rs:871, 920). SUBSCRIBE, CONNECT, and UPDATE branches have no such hook. A SUBSCRIBE-targeted notify_op_execution call would hang forever on response_receiver.recv().
start_subscription_request_internal does not just send a wire message. It spawns a task that calls subscribe::request_subscribe, which runs ring routing, builds pre-populated AwaitingResponse state, has a local-completion fallback path that never touches the network, and uses notify_op_change (not notify_op_execution). Porting this path onto notify_op_execution requires pulling in task-per-tx ownership of the routing state — i.e., it is Phase-2-shaped work, not Phase-1-shaped work.
Rev 2 reassigns the original Phase 1 description to a new Phase 2.5 (the PUT-completion → SUBSCRIBE sub-op hand-off, done properly after task-per-tx exists) and redefines Phase 1 as the genuine structural prerequisite: extending the callback hook to cover all five op variants.
start_subscription_request_internal line numbers were wrong. Rev 1 cited operations.rs:699-811. Verified against main, the function is at operations.rs:745-811; 699-742 covers the public wrappers (start_subscription_request_async, start_subscription_request_blocking, start_subscription_request).
Rev 2 also adds a "Scaffolding reach" subsection to Section 4 making the PUT/GET-only reality of the callback hook explicit, and clarifies in Section 2 that SUBSCRIBE / CONNECT / UPDATE branches of handle_pure_network_message_v1 lack the callback block.
Errata (Rev 2 → Rev 3)
After Phase 1 shipped (PR #3802), a pre-implementation investigation of SUBSCRIBE's surface area surfaced three material issues with Rev 2's Phase 2 scope:
Phase 2 bundled two unrelated op migrations. SUBSCRIBE is 2,225 lines; CONNECT is 5,025 lines (2.3× larger). Rev 2's parenthetical "(simplest ops)" is accurate relative to GET/PUT/UPDATE but not relative to each other. The two ops also have fundamentally different shapes: SUBSCRIBE has a client-facing origin, blocking sub-op semantics with PUT, and inline retry loops that serialize state through notify_op_change(); CONNECT has no client-facing origin at all (the "joiner" is the local node's connection_maintenance loop, not a WS client), no sub-op semantics, decouples hole-punch from op lifetime (op Completed does not mean the connection is established), and most of its 5,025 lines are in peer-selection machinery (RelayContext, ConnectForwardEstimator, visited-bloom-filter) rather than in the state machine itself. Rev 3 splits these into separate phases: 2b for SUBSCRIBE, 2c for CONNECT. Phase 2c's schedule is TBD and does not block Phases 3–6. (An earlier draft of this Errata claimed CONNECT ops have "hour-long connection attempts"; the subsequent CONNECT investigation [posted as an issue comment] refuted that — CONNECT ops time out at OPERATION_TTL = 60s; it's the resulting peer connections that live for hours, managed by the transport layer independently of the op.)
SUBSCRIBE has three distinct entry points with different failure semantics, not one:
Client-initiated (from a WS Subscribe request) — errors surface to the WS client.
Renewal-initiated (ring.rs:944-1051) — fire-and-forget, protected by can_request_subscription() spam prevention and jitter scheduling.
Sub-op of PUT (operations.rs:745-811, called from put.rs:1589-1608, blocking mode) — parent PUT blocks on subscribe completion via expect_and_register_sub_operation / all_sub_operations_completed.
Rev 2's Phase 2 implied all three would migrate together, but doing so would require the new task-per-tx SUBSCRIBE to cooperate with the legacy re-entry-loop PUT via the SubOperationTracker machinery that Phase 2.5 is supposed to start retiring — circular. Rev 3 scopes Phase 2b to the client-initiated path only. Renewals stay on the legacy path (their jitter/spam-prevention timing is load-bearing and decoupling it is a separate rabbit hole); PUT sub-ops stay on the legacy path until Phase 2.5 ports them properly.
SUBSCRIBE's retry logic is not localized. Rev 2's description "routing logic moves into the spawned task's locals" handwaves a significant rewrite. Today every retry path (NotFound → next alternative, alternatives exhausted → fresh k_closest round, peer timeout → abort-and-replace) reconstructs the full SubscribeOp and re-inserts it via notify_op_change(). Moving to task locals means rewriting all three retry paths, not just the happy path. This work belongs in Phase 2b, but Rev 3 introduces Phase 2a as a strict prerequisite: land OpCtx itself (struct definition, send_and_await primitive on top of Phase 1's round-trip, tx → JoinHandle bookkeeping) as dormant scaffolding with no production callers, analogous to how Phase 1 landed the callback hook without activating notify_op_execution. Phase 2b then activates OpCtx for exactly one entry point (client-initiated SUBSCRIBE) and reuses 2a's primitives rather than inventing them mid-migration.
Rev 3 also lifts the #[allow(dead_code)] attributes on notify_op_execution and op_execution_sender in Phase 2a (via the test shim that exercises OpCtx::send_and_await), satisfying the TODO left in Phase 1's doc comments.
Pre-implementation investigation reports for each op surface will be posted as comments on this issue rather than folded into the design body, so the design stays the plan-of-record and the investigations stay the research trail.
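A minimal sketch of the Phase 2a surface as this revision scopes it, using std channels and a thread in place of tokio and the event loop. The struct and method names follow the design text; the wiring is invented for illustration and is not the real freenet-core API:

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

type Tx = u64;

// Dormant scaffolding: a per-tx handle whose only method is the
// send-and-await round trip built on the Phase 1 callback hook.
struct OpCtx {
    #[allow(dead_code)] // mirrors the dormant state described above
    tx: Tx,
    // Stand-in for op_execution_sender: (reply sender, outbound message).
    op_execution: Sender<(Sender<String>, String)>,
}

impl OpCtx {
    // Fire a message and await the reply keyed to this transaction.
    fn send_and_await(&self, msg: String) -> String {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.op_execution.send((reply_tx, msg)).expect("event loop gone");
        reply_rx.recv().expect("reply dropped")
    }
}

// Stand-in event loop that echoes a completion for every message,
// playing the role of the per-op callback block.
fn spawn_echo_loop(rx: Receiver<(Sender<String>, String)>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for (reply, msg) in rx {
            let _ = reply.send(format!("completed: {msg}"));
        }
    })
}

// The kind of round trip a Phase 2a test shim would exercise.
fn demo_round_trip() -> String {
    let (sender, rx) = mpsc::channel();
    let handle = spawn_echo_loop(rx);
    let ctx = OpCtx { tx: 1, op_execution: sender };
    let out = ctx.send_and_await("subscribe-request".into());
    drop(ctx); // closes the channel so the loop thread exits
    handle.join().expect("loop panicked");
    out
}
```

The point of the sketch is the shape, not the concurrency: `send_and_await` owns a fresh reply channel per call, which is what lets the primitive live in exactly one place.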
Errata (Rev 3 → Rev 4)
After Phase 2b shipped (PR #3806, 2026-04-10), the Phase 2.5 implementation was drafted and verified against HEAD. Rev 3's Phase 2.5 description suggested "retiring SubOperationTracker usage for subscribe children" and "awaiting the child future in the parent task" — the latter is incompatible with the current PUT/GET parent, which is not a task but a re-entry-loop DashMap entry. Two corrections followed from that verification:
SubOperationTracker is not a polling mechanism; it is event-driven. PUT/GET stash their terminal state in root_ops_awaiting_sub_ops (operations.rs:205); op_state_manager.rs:926-973 finalizes the parent when the last child calls op_manager.completed(child_tx). Replacing this with a oneshot while the parent still lives in a DashMap entry is a lateral move — it introduces new delivery plumbing (a DriverRole::SubOp variant of Phase 2b's drive_subscribe_inner, a register_local_subscription helper so the sub-op path skips the NodeEvent::LocalSubscribeComplete publish that the client path needs) without actually enabling the parent to await anything.
start_subscription_request_internal has legitimate async-path users. track_parent=false (non-blocking PUT/GET with subscribe flag) does not touch expect_and_register_sub_operation at all — it just spawns a best-effort subscribe. That path must also stay on the legacy re-entry loop until its parent migrates.
The honest structural win — the parent task owning its child future directly — only exists once PUT/GET themselves are task-per-tx, which is Phase 3. Doing Phase 2.5 first means landing DriverRole::SubOp + the registration helper + extra completed(child_tx) / sub_operation_failed wiring as scaffolding that gets immediately removed when Phase 3 rewrites the parent side.
Rev 4 therefore folds Phase 2.5 into Phase 3. The blocking-subscription sub-op child (track_parent=true) and the async non-blocking child (track_parent=false) both migrate as part of Phase 3's PUT/GET task-per-tx work, which is when the parent task can hold the child future as a local and await it without any of the SubOperationTracker round-trip. The existing blocking-subscribe regression tests (tests/operations.rs:1971 and :2071) continue to guard the legacy path until Phase 3 lands.
Facts verified during the Rev 4 investigation that Phase 3 should reuse as inputs (the exploration itself is not preserved as a separate document — Phase 3 will re-verify against HEAD per the Phase N rule):
The call-site inventory for every subscribe entry point (public wrappers, start_subscription_after_put, auto_subscribe_on_get_response, the renewal path in ring.rs:1046, and the Phase 2b SUBSCRIBE bypass in node.rs:1161) distinguishes which branches a Phase 3 PUT/GET migration must touch.
The complete_local_subscription → NodeEvent::LocalSubscribeComplete publish is client-path-only; any sub-op pathway that calls into the local-completion helper must skip that NodeEvent because the parent PUT/GET owns the client reply.
The parent_tx / child_tx / per-attempt attempt_tx bookkeeping from Phase 2b is already safe for sub-op use: expect_and_register_sub_operation is called only with child_tx, and the per-retry attempt_tx never enters parent_of.
The three Phase 2b bugs (bypass gap in forward_pending_op_result_if_completed, ForwardingAck / non-terminal message filter, local-completion reply synthesis) must be re-checked for every op path Phase 3 adds.
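The structural win these notes point at, a parent that holds its child's future as a local, can be sketched with OS threads standing in for async tasks (illustrative names only, not the real PUT/SUBSCRIBE code):

```rust
use std::thread;

// Child subscribe work runs in its own task; the parent keeps the
// JoinHandle as a local, so no side-band tracker counts children.
fn run_child_subscribe(contract: &'static str) -> thread::JoinHandle<Result<&'static str, ()>> {
    thread::spawn(move || {
        // ... child subscribe work would happen here ...
        Ok(contract)
    })
}

fn run_parent_put(contract: &'static str) -> Result<(&'static str, &'static str), ()> {
    // ... parent PUT work (store locally, forward) would happen here ...
    let child = run_child_subscribe(contract); // hand-off, no tracker registration
    let subscribed = child.join().map_err(|_| ())??; // parent awaits the child directly
    Ok(("put-completed", subscribed))
}
```

Contrast with the current shape: the parent parks its terminal state in a DashMap and is finalized by whichever child completes last, with no stack frame connecting the two.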
Errata (Rev 4 → Rev 5)
After Rev 4 folded Phase 2.5 into Phase 3, a pre-implementation investigation of GET, PUT, and the subscribe sub-op hand-off was run against HEAD (posted as an issue comment on 2026-04-11). Three findings motivated a further split:
PUT and GET are independently large. PUT's originator path has a duplicate-send pattern (put.rs:575–594 early client result via result_router_tx.try_send + :841–917 second send on the response leg, relying on client-side SessionActor dedup) that is the single biggest structural change in Phase 3 and is orthogonal to anything GET does. GET adds streaming assembly inside send_and_await and four local-completion paths to handle. Bundling them in one PR makes debugging either change harder. Rev 5 splits them: 3a is PUT, 3b is GET.
SubOperationTracker can only be deleted after BOTH callers are removed. Grep of HEAD shows exactly one non-test caller of expect_and_register_sub_operation — operations.rs:755 inside start_subscription_request_internal, which is invoked from PUT's start_subscription_after_put and GET's auto_subscribe_on_get_response. 3a can stop PUT from calling the tracker, but the tracker itself must stay alive until GET also stops calling it in 3b. The tracker deletion becomes a mechanical third PR — 3c — which slots in immediately after 3b rather than waiting until Phase 5. Phase 5's scope is now just the OpManager.ops DashMap deletion (the tracker is gone by then).
The attempt_tx terminology from Rev 3/4 notes does not exist on HEAD. Grep finds zero occurrences. The actual bookkeeping distinguishes only parent_tx and child_tx; there is no per-retry intermediate transaction kind. The Rev 4 bullet that claimed "per-attempt attempt_tx never enters parent_of" was verifying something that isn't there. The bullet is preserved above as historical context but should not influence Phase 3 design — drop the term.
Additional facts verified during the Rev 5 investigation that 3a/3b/3c should reuse:
PUT's duplicate-send is intentional and load-bearing until the task-per-tx driver replaces it. The early send at put.rs:575–594 is what makes PUT feel fast from a client's perspective (local storage → immediate response), and the second send at :841–917 is what guarantees delivery when the originator is not the final hop. The task-per-tx driver must collapse these to a single delivery by routing the synthesized local-completion PutMsg::Response through the pending_op_result callback instead of result_router_tx.try_send.
PUT relay ops with state: None are normal, not an error. state.finalized() at put.rs:91–93 correctly treats None as done. A 3a driver must not assume state: Some(_) for every PUT in flight.
PUT speculative paths stay GC-controlled on the legacy DashMap path. They are spawned from the garbage cleanup task (put.rs:51–54, :251–264), not from the originator driver. 3a must not replicate speculative spawning in the task-per-tx driver.
GET streaming assembly blocks the originator's state transition — stream_handle.assemble().await at get.rs:2892 is the boundary between AwaitingResponse and Finished. 3b's send_and_await should block on assembly internally for the ResponseStreaming variant.
Relay stale-cache deferral logic (get.rs:1406–1431): relays with only a stale LRU entry defer to the network and keep the stale copy as fallback. 3b must preserve this — relays stay on the legacy path, but any helper shared between originator and relay must not break the interest check.
complete_local_subscription's NodeEvent::LocalSubscribeComplete publish stays. It is still needed for the renewal and intermediate-peer paths. 3a/3b's inline-await pattern bypasses the event in the blocking-subscribe case (the parent task's own completion is the notification), but the helper itself is not deleted.
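As a toy illustration of the client-side dedup referenced above (assumed semantics only: results are keyed by transaction id and the first delivery per tx wins; the real SessionActor also handles fanout, TTLs, and reconnects):

```rust
use std::collections::HashSet;

// Delivers results to a client at most once per transaction id.
struct Session {
    delivered: HashSet<u64>,
}

impl Session {
    fn new() -> Self {
        Session { delivered: HashSet::new() }
    }

    // Returns true when the result is forwarded to the client,
    // false when it is a duplicate for an already-delivered tx.
    fn deliver(&mut self, tx: u64) -> bool {
        self.delivered.insert(tx)
    }
}

// Models PUT's current duplicate-send: an early send plus a response-leg
// resend for the same tx; the session forwards the first, drops the second.
fn duplicate_send_outcome(tx: u64) -> (bool, bool) {
    let mut session = Session::new();
    (session.deliver(tx), session.deliver(tx))
}
```

This is the safety net the task-per-tx driver stops leaning on: once delivery is collapsed at the producer, the dedup becomes belt-and-suspenders rather than load-bearing.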
Design written for #1454. Line numbers verified against main at the time of writing; re-verify before implementing each phase.
Tracking issue for some internal refactoring we need to do. The goal is to:

- improve debugging capabilities over the kernel
- simplify and clean up some of the core code, cleaner abstractions
- improve certain error handling, WS API client/server comms, etc.
- add more testing around the boundaries of some components (like we did with the handshake handler, transport, etc.)

Changes:

- move towards transactions executed end-to-end asynchronously instead of the current model of execution
- wait for subtransaction results, communicate to WS client subscribers as network messages come inbound, etc., and simplify the overall execution flow and reasoning (we essentially don't need to keep the state of the ops internally in the OpManager anymore)
Design: end-to-end async transactions
1. Overview
Problem
Today the core runs each transaction through a centralized re-entry loop:
1. The client path calls `op_manager.push(tx, OpEnum)` and returns.
2. A `NetMessage` for `tx` arrives at `p2p_protoc`.
3. `handle_op_result` pops the `OpEnum` out of an `OpManager` `DashMap`, calls `Operation::process_message`, and gets back an `OperationResult`.
4. Depending on the variant (`Completed`/`ContinueOp`/`SendAndContinue`/`SendAndComplete`), it either marks the op `completed`, re-inserts it into the same `DashMap` (push-before-send), or forwards a message and re-inserts.
5. Sub-operations go through `SubOperationTracker` with its own `DashMap`s (`sub_operations`, `root_ops_awaiting_sub_ops`, `parent_of`, `expected_sub_operations`, `failed_parents`). A parent op only proceeds when `all_sub_operations_completed` returns true.
6. Results reach clients via `ResultRouter` → `SessionActor`.

Pain points:
- Op state lives in the `OpManager` rather than in the task that logically owns it. Stack traces are useless for reasoning about "what is this transaction doing right now?"
- There is no `await` between "parent wants a value from a child" and "child produces it."
- Bookkeeping bugs take the form "the `expected_sub_operations` count disagreed with reality." These are not expressible as compile errors.
- Result delivery (`ResultRouter` → `SessionActor`) is decoupled from the operation that produced the result, so streaming updates (e.g. incremental `GET` responses) are harder to reason about than they should be.

Goals (from the tracking issue)

- Improve debugging capabilities over the kernel.
- Simplify and clean up some of the core code; cleaner abstractions.
- Improve certain error handling, WS API client/server comms, etc.
- Add more testing around the boundaries of some components (as was done with the handshake handler, transport, etc.).
Non-goals
- The wire protocol: `NetMessage` variants stay the same.
- `SessionActor` semantics for 1→N client fanout; only the producer side changes.

2. Current architecture (baseline)
All line numbers verified against `main` at the time of writing.

| Component | File |
| --- | --- |
| `OpManager.ops` — `DashMap`s for `connect`/`put`/`get`/`subscribe`/`update`, plus `completed` and `under_progress` `DashSet`s | `crates/core/src/node/op_state_manager.rs` |
| `OpManager` struct | `crates/core/src/node/op_state_manager.rs` |
| `SubOperationTracker` | `crates/core/src/node/op_state_manager.rs` |
| `all_sub_operations_completed` | `crates/core/src/node/op_state_manager.rs` |
| `Operation` trait | `crates/core/src/operations.rs` |
| `OperationResult` enum (`Completed`, `ContinueOp`, `SendAndContinue`, `SendAndComplete`) | `crates/core/src/operations.rs` |
| `handle_op_result` re-entry loop (push-before-send at line 238) | `crates/core/src/operations.rs` |
| `PrioritySelectStream` with `op_execution` receiver | `crates/core/src/node/network_bridge/priority_select.rs` |
| `p2p_protoc::handle_op_execution` — inserts into `pending_op_results` | `crates/core/src/node/network_bridge/p2p_protoc.rs` |
| `handle_pure_network_message_v1` entry | `crates/core/src/node.rs` |
| `handle_pure_network_message_v1` PUT branch (callback send at 871) | `crates/core/src/node.rs` |
| `handle_pure_network_message_v1` GET branch (callback send at 920) | `crates/core/src/node.rs` |
| `handle_pure_network_message_v1` SUBSCRIBE path (no `pending_op_result` callback) | `crates/core/src/node.rs` |
| `start_subscription_request_internal` (sub-op launcher) | `crates/core/src/operations.rs` |
| `request_subscribe` (routing + op construction + push) | `crates/core/src/operations/subscribe.rs` |
| `ResultRouter` (network → `SessionActor` routing) | `crates/core/src/client_events/result_router.rs` |
| `SessionActor` (1→N client fanout, TTL pending cache) | `crates/core/src/client_events/session_actor.rs` |

GET request flow (current)
```mermaid
sequenceDiagram
    participant C as WS Client
    participant SA as SessionActor
    participant OM as OpManager<br/>(DashMaps)
    participant EL as Event loop /<br/>priority_select
    participant PP as p2p_protoc
    participant N as Network
    participant RR as ResultRouter
    C->>SA: GET(tx)
    SA->>OM: push(tx, GetOp::init)
    OM-->>SA: ok
    SA-->>C: accepted
    Note over OM: GetOp lives in<br/>DashMap<Tx, GetOp>
    N-->>PP: NetMessage(tx)
    PP->>EL: dispatch
    EL->>OM: pop(tx) -> GetOp
    OM-->>EL: GetOp
    EL->>EL: process_message -><br/>OperationResult
    alt SendAndContinue
        EL->>OM: push(tx, GetOp') (push-before-send)
        EL->>N: NetMessage
    else Completed
        EL->>OM: completed(tx)
        EL->>RR: HostResult(tx, value)
        RR->>SA: DeliverHostResponse
        SA->>C: GET result
    end
```

3. Target architecture
Key idea
Each transaction runs as a dedicated async task whose locals hold the op state. The task drives itself by `await`ing on a small per-tx inbox and on child-transaction futures. The `OpManager` shrinks to a bookkeeping table mapping `Transaction → JoinHandle` (plus the existing `completed`/`under_progress` de-dup sets). The op-specific `DashMap`s go away.

A new primitive — `OpCtx::send_and_await(msg)` — lets an operation fire a `NetMessage` and `await` the reply that arrives keyed by the same `Transaction`. Sub-transactions become child futures: a parent op calls `ctx.spawn_sub(child_op).await` and gets back the child's result directly, with no side-band tracker.

Phase 2a minimum surface (what ships first)
Phase 2a introduces `OpCtx` as scaffolding with exactly one method — `send_and_await` — built directly on top of Phase 1's `forward_pending_op_result_if_completed` round-trip. No ops are migrated in 2a; its only caller is a unit test shim. The wider API (sub-op spawning, WS fanout, per-tx inbox, a new `Operation` trait) is deferred until a concrete consumer needs it.

`OpManager` grows a thin `op_ctx(&self, tx: Transaction) -> OpCtx` factory that clones the `op_execution_sender` out of its inner `EventLoopNotificationsSender`. The old `OpManager::notify_op_execution` helper is deleted — its body becomes `OpCtx::send_and_await`. This keeps the primitive in exactly one place.

Target architecture (eventual vision, for reference)
Phases 2b, 2.5, and 3–6 will grow `OpCtx` toward the full shape below. These fields and methods are not in Phase 2a; they are added as each phase's concrete consumer arrives, so the scaffolding stays minimal until it has a real caller.

Contrast diagram
```mermaid
flowchart LR
    subgraph Current
        A1[op_manager.push] --> B1[DashMap<Tx,OpEnum>]
        B1 --> C1[handle_op_result<br/>re-entry loop]
        C1 -->|Completed| D1[ResultRouter]
        C1 -->|ContinueOp| B1
        C1 -->|SendAndContinue| B1
        E1[SubOperationTracker<br/>DashMaps] -.polls.-> C1
    end
    subgraph Target
        A2[spawn tx task] --> B2[task locals<br/>own OpState]
        B2 --> C2[loop:<br/>ctx.send_and_await<br/>ctx.spawn_sub]
        C2 -->|return| D2[Ok/Err -> notify]
        C2 -->|child future| C2
    end
```

What goes away
- `OpManager.ops.{connect,put,get,subscribe,update}` — the five `DashMap<Transaction, *Op>` fields. Replaced by a single `DashMap<Transaction, JoinHandle<Result<(), OpError>>>` (or equivalent) used only for cancellation and cleanup.
- `SubOperationTracker` entirely. Parent/child relationships become parent-awaits-child futures.
- `handle_op_result` and `OperationResult`. The re-entry loop's state transitions become ordinary Rust control flow inside each op's `run`.
- The `Operation::process_message` seam. Ops express themselves as `async fn run`.

What stays
- `Transaction` IDs and the wire format.
- The `completed`/`under_progress` de-dup sets.
- `pending_op_results: HashMap<Transaction, Sender<NetMessage>>` inside `p2p_protoc` — this is already the "send message, await reply keyed by tx" primitive we need. The tx task registers its inbox sender here via `notify_op_execution`, then `await`s.
- `SessionActor` and its TTL pending cache. The producer side changes (tx task pushes to `broadcast::Sender`), but the 1→N fanout and the session lifecycle stay.
- `ResultRouter` in a reduced role: it still exists as the bridge between "network layer produced a final result" and "`SessionActor` delivers it", but in the target design the tx task itself publishes via `OpCtx::notify` and `ResultRouter` handles only legacy / non-per-tx paths during the transition.

4. Existing scaffolding to reuse
Some of the infrastructure already exists in the tree but is currently `#[allow(dead_code)]`, marked `FIXME: enable async sub-transactions`. Important: the reach of this scaffolding is narrower than the original draft of this design suggested. See Scaffolding reach below.

| Piece | File |
| --- | --- |
| `OpManager::notify_op_execution` — creates `(Sender<NetMessage>, Receiver<NetMessage>)`, sends the pair to `op_execution_sender`, awaits the callback reply (`FIXME: enable async sub-transactions`) | `crates/core/src/node/op_state_manager.rs` |
| `EventLoopNotificationsSender::op_execution_sender: Sender<(Sender<NetMessage>, NetMessage)>` + getter (`FIXME: enable async sub-transactions`) | `crates/core/src/node/network_bridge.rs` |
| `PrioritySelectStream.op_execution` receiver wiring + `SelectResult::OpExecution` variant (feeds `handle_op_execution`) | `crates/core/src/node/network_bridge/priority_select.rs` |
| `p2p_protoc::handle_op_execution` — inserts callback into `state.pending_op_results` and forwards the message into the op pipeline as `ConnEvent::InboundMessage` | `crates/core/src/node/network_bridge/p2p_protoc.rs` |
| PUT branch — fires the `pending_op_result` callback on completion | `crates/core/src/node.rs` |
| GET branch — fires the `pending_op_result` callback on completion | `crates/core/src/node.rs` |
| SUBSCRIBE / CONNECT / UPDATE branches — no callback hook | `crates/core/src/node.rs` |

Scaffolding reach
The end-to-end plumbing for "send a message tagged with a `Transaction` and await a single reply keyed by the same `Transaction`" is currently live only for PUT and GET. Specifically:

- `handle_op_execution` (p2p_protoc.rs:3861-3879) inserts a callback sender into `state.pending_op_results` keyed by tx, then forwards the wrapped message into the main dispatch path as a `ConnEvent::InboundMessage`. This is wired end to end.
- `handle_pure_network_message_v1` (node.rs:775 onwards) calls `handle_op_request::<OpT, _>(...)` and then — only for PUT and GET — pulls the callback sender out of `pending_op_result` and calls `op_execution_callback.send(completed_msg).await` (node.rs:871, 920).
- The SUBSCRIBE, CONNECT, and UPDATE branches of `handle_pure_network_message_v1` do not have this callback block. If a caller today were to invoke `notify_op_execution(subscribe_msg).await`, the sender would be inserted into `pending_op_results` and the forwarded message would drive the SUBSCRIBE op, but `notify_op_execution` would then hang forever on `response_receiver.recv()` because no code path fires the reply.

This means the scaffolding is not "almost all wired up" for all ops. It is wired for PUT and GET because those ops have a wire-level terminal message that maps cleanly onto a single-reply model. For the other ops, the callback hook is missing and must be added before any of them can participate in the `notify_op_execution` round-trip.

The SUBSCRIBE complication
Even once the SUBSCRIBE callback hook exists, SUBSCRIBE initiation is not "construct a `NetMessage` and send it". `subscribe::request_subscribe` (operations/subscribe.rs:303-487) does all of the following before any wire message exists:

- Handles local completion via `complete_local_subscription` without any network round-trip.
- Runs `k_closest_potentially_hosting` against the ring, then falls back to any connected peer if that returns empty.
- Builds pre-populated `AwaitingResponse` routing state (`next_hop`, `tried_peers`, `visited`, `alternatives`, `speculative_paths`) that future retries consult.
- Constructs the `SubscribeMsg::Request { id, instance_id, htl, visited, is_renewal }` wire message only after the target peer is chosen.
- Pushes via `op_manager.notify_op_change(msg, OpEnum::Subscribe(op))` — a push-before-send primitive distinct from `notify_op_execution`.

None of this fits the `notify_op_execution(msg).await` shape, because:

- Retries need the `DashMap<Tx, SubscribeOp>` entry before the wire message is sent, because retry/alt-peer logic reads it.
- `notify_op_execution`'s current contract is "send a `NetMessage`, get a reply"; it has no op-construction hook.
- The local-completion fallback never produces a reply for `notify_op_execution`.

So a "first slice" that ports the client-initiated SUBSCRIBE path onto `notify_op_execution` is genuinely Phase-2-shaped work: it drags in task-per-tx ownership of the routing state, a `prepare_initial_request` helper that knows the local-completion fallback, and either a synthetic reply for local completion or a change to the `notify_op_execution` contract to allow "no network round-trip, completed immediately".

5. Phased rollout (revised)
Each phase should be a separate sub-issue / PR. Phase boundaries are chosen so that after each phase the tree compiles, CI passes, and the existing integration tests stay green.
Phase 0 — design sign-off: remove the `S-needs-design` label from #1454 once Rev 2 is published.

Phase 1 — extend the callback hook to all op variants. Extend the `pending_op_result` callback hook in `handle_pure_network_message_v1` to fire for SUBSCRIBE, CONNECT, and UPDATE completions. This matches the existing PUT/GET pattern at `node.rs:869-877` / `918-926`. The `#[allow(dead_code)]` attributes on `notify_op_execution` and `op_execution_sender` stay for now — this phase does not add a caller. Adds a test that fires a round-trip through the scaffolding for each op (via a test-only caller or `#[cfg(test)]` shim). Exit: every op kind can complete a `notify_op_execution` round-trip without hanging. No behavioral change to any op.

Phase 2a — land `OpCtx` as dormant scaffolding. Introduce `OpCtx` as a new module with no production callers. Define the `OpCtx` struct, the `send_and_await` primitive (built on the Phase 1 `notify_op_execution` round-trip), and the `tx → JoinHandle` bookkeeping that Phase 5 will eventually replace the op-specific DashMaps with. No ops migrated. The `#[allow(dead_code)]` attributes on `notify_op_execution` and `op_execution_sender` from Phase 1 are lifted here — Phase 2a becomes the first production caller, but only from a dormant test shim, not from any real op path. Pattern mirrors Phase 1's "widen the primitive without adding a live consumer". Tests: drive the `OpCtx::send_and_await` round-trip via each of the five op kinds. Exit: `OpCtx` exists, tests exercise the round-trip end-to-end, nothing in `operations/` changes.

Phase 2b — migrate the client-initiated SUBSCRIBE path onto `OpCtx`. Scope excludes renewals (live on legacy path because their jitter/spam-prevention in `ring.rs:944-1051` is load-bearing) and PUT sub-ops (live on legacy path until Phase 2.5). Factor `prepare_initial_request` out of `request_subscribe` to handle the local-completion branch explicitly. The `SubscribeOp` DashMap entry disappears for client-initiated subscribes; renewal and sub-op paths still populate it. Route via a per-tx switch (not a global feature flag). Exit: client-initiated SUBSCRIBE runs on `OpCtx`; renewal and sub-op SUBSCRIBE unchanged. No regressions in either path.

Phase 2c — migrate CONNECT onto `OpCtx`.
Deferred from Rev 2's Phase 2 because CONNECT is 5,025 lines (2.3× SUBSCRIBE) with a genuinely different shape: no client-facing origin (joiner is the local node's `connection_maintenance` loop, not a WS client), no sub-op semantics, hole-punch decoupled from op lifetime (op `Completed` ≠ connection established), and most of the file's bulk is in `RelayContext` / `ConnectForwardEstimator` / visited-bloom-filter machinery rather than the state machine itself. Schedule TBD — not a strict blocker for Phases 3–6, which can proceed on the op kinds they target. Exit: CONNECT runs on `OpCtx`.

Phase 2.5 — folded into Phase 3. Everything it would have landed (`DriverRole::SubOp`, a registration helper bypassing `NodeEvent::LocalSubscribeComplete`, extra delivery plumbing through `op_manager.completed(child_tx)`) becomes redundant as soon as Phase 3's task-per-tx PUT/GET parent can `await` the child future directly. See Errata (Rev 3 → Rev 4).

Phase 3a — migrate the PUT originator onto task-per-tx. Collapse `put.rs:575–594` (early `result_router_tx.try_send`) + `:841–917` (response-leg second send) into a single client-result delivery from the driver task. The blocking-subscribe child (when `blocking_subscribe=true`) is awaited inline by the PUT task — PUT stops calling `expect_and_register_sub_operation`. The async non-blocking child path migrates with it. `SubOperationTracker` stays alive (GET still calls it in 3b). PUT relays stay on the legacy re-entry loop. Speculative paths stay GC-controlled on the DashMap. Install `try_forward_task_per_tx_reply` for PUT `Response`/`ResponseStreaming` variants at the TOP of the PUT branch in `handle_pure_network_message_v1`; filter `ForwardingAck`.
Synthesize a `PutMsg::Response` where PUT currently completes locally without a network reply. Tests: `test_put_with_blocking_subscribe` (tests/operations.rs:1971), `test_put_contract_three_hop_returns_response` (:2551), `test_put_then_immediate_subscribe_succeeds_locally_regression_2326` (:3154), `test_put_triggers_update_for_subscribers` (:3407), `test_put_merge_persists_state` (:468), `test_put_with_subscribe_flag` (:1504), all `test_put_retry_*` unit tests (put.rs:2672+), existing PUT unit tests (put.rs:2102–2920). Exit: client-initiated PUT runs on `OpCtx`; PUT originator stops touching `OpManager.ops.put`; blocking PUT parent awaits child subscribe future directly (no `SubOperationTracker` call from PUT path); client result delivered exactly once. Relay, speculative, and GET paths unchanged.

Phase 3b — migrate the GET originator onto task-per-tx. `send_and_await` internally blocks on `stream_handle.assemble().await` for the `ResponseStreaming` reply variant, preserving the single-reply shape of `OpCtx`. Four local-completion paths to handle (originator has contract with no peers, relay has contract, originator receives network Response, originator auto-subscribe) — only the first needs synthesis; the network Response path reuses the arriving message directly. Same bypass + filter install as 3a for GET. GET relays stay on the legacy loop. Tests: `test_get_with_blocking_subscribe` (tests/operations.rs:2071), `test_get_with_subscribe_flag` (:1244), `test_get_notfound_no_forwarding_targets` (:3285), `test_delegate_contract_get` (:4009), `test_repeated_get_with_unchanged_state_succeeds` (:4380), streaming integration tests, GET unit tests (get.rs:3824–4980) including `test_blocking_subscribe_propagation` and `test_retry_and_fallback`. Exit: client-initiated GET runs on `OpCtx`; GET originator stops touching `OpManager.ops.get`; blocking GET parent awaits child subscribe future directly; streaming assembly flows through `send_and_await`.

Phase 3c — delete `SubOperationTracker` and its machinery. Mechanical deletion PR once 3a and 3b have removed both callers.
Delete: the struct (op_state_manager.rs:74–86), the `sub_operations`/`root_ops_awaiting_sub_ops`/`parent_of`/`expected_sub_operations`/`failed_parents` fields, `expect_and_register_sub_operation`, `mark_sub_op_completed`, `remove_child_link`, `cleanup_parent_tracking`, `get_parent`, `is_sub_operation`, `all_sub_operations_completed`, `count_pending_sub_operations`, `sub_operation_failed`, and the finalized-parent-parking branch in `operations.rs:177–208`. Delete the 10 `sub_operation_tracker_*` unit tests (op_state_manager.rs:2462–3090). No Phase 5 dependency — this work slots in immediately after 3b. Exit: `SubOperationTracker` and `root_ops_awaiting_sub_ops` gone; `OpManager.ops.put` and `.get` DashMaps remain but are no longer touched by originator paths (their deletion is still Phase 5).

Phase 5 — delete `OpManager.ops`. Delete the `OpManager.ops.{connect,put,get,subscribe,update}` DashMaps. (`SubOperationTracker` was already deleted in Phase 3c.) `OpManager` keeps `completed` + `under_progress` + a new `tx → JoinHandle` table for cancellation. Delete `Operation::process_message` / `OperationResult` / `handle_op_result`. Exit: `Ops` struct reduced to the bookkeeping fields.

Phase 6 — polish: finalize `OpCtx`, `send_and_await`, `spawn_sub`. Documentation sweep: update `.claude/rules/operations.md` (the push-before-send invariant is now structurally enforced and the rule wording should reflect that). Update `docs/architecture/operations/README.md`.

Phase 1 — concrete first slice (revised)
This is the smallest self-contained PR. Unlike Rev 1's Phase 1, this one does not touch any op's routing logic. It only extends an existing pattern to three more op variants.
1. In `crates/core/src/node.rs`, add a `pending_op_result` callback block after `handle_op_request::<subscribe::SubscribeOp, _>` (around line 999), mirroring the PUT pattern at `node.rs:868-878`. The "completed message" is the response `SubscribeMsg::ReturnSub` (or equivalent terminal variant — verify in `subscribe.rs`).
2. Do the same for the CONNECT (`NetMessageV1::Connect`) and UPDATE (`NetMessageV1::Update`) branches of `handle_pure_network_message_v1`.
3. Decide how to handle the `complete_local_subscription` branch: either (a) fire a synthetic "locally completed" reply on the callback so `notify_op_execution` returns, or (b) declare that `notify_op_execution` may also observe completion via the op's push-to-completed-set transition and add a fallback wakeup. Option (a) is simpler; option (b) is more general. Recommend (a) for Phase 1.
4. Add a test-only caller of `notify_op_execution` (e.g., a `#[cfg(test)]` module) that fires a round-trip through each of the five op kinds and asserts the call returns. Keep the `#[allow(dead_code)]` on the production getters — Phase 2 will activate them for real.
6. Risks and mitigations
Push-before-send invariant (
.claude/rules/operations.md, currently enforced inoperations.rs:238): preserved by definition in the target design — op state never leaves the owning task, so the concept of "forgetting to re-push" goes away. The rule text should be updated in Phase 6 to describe the new invariant (state lives in locals,send_and_awaitis atomic).Bounded-channel rules (
.claude/rules/channel-safety.md): thenotify_op_execution/send_and_awaitprimitive already uses a boundedSender<(Sender<NetMessage>, NetMessage)>.notify_op_executioncalls.send().await(op_state_manager.rs:710), which violates the rule "never use.send().awaitin event loops" — however, the caller ofnotify_op_executionis not an event loop (it's an op task), so this is currently acceptable. Phase 2 must re-audit once there are production callers. The callback-reply side (node.rs:871, 920) also uses.send().awaiton the boundedpending_op_resultchannel (capacity 1), which blocks the pure-network-message handler; this is acceptable because the caller ispending_op_results[tx]consumed by the awaiting op task, but must be re-audited once every op kind uses it.Determinism (DST): task spawning order has to be stable under the deterministic runtime.
tokio::spawnis not deterministic by itself; usedeterministic_select!and theTimeSource/GlobalRngabstractions fromcrates/core/src/simulation/. Verify via the existingtest_deterministic_replay_*tests at every phase. If task spawn order proves to be a blocker, Phase 2 may need to run ops cooperatively on a single-threaded runtime rather than viatokio::spawn.Backpressure: the per-tx inbox mpsc and the per-tx broadcast for WS updates both need explicit bounds. Drop-oldest vs fail-fast policy must be chosen and documented. For WS updates, dropping is likely acceptable (clients will reconcile); for the inbox, dropping is not (it would wedge the transaction).
- Cancellation: the `tx → JoinHandle` table gives us clean `abort()` on timeout, which today is handled implicitly by `under_progress` expiry. The timeout enforcement logic needs to move from the GC path to each tx task's own timeout arm.
- Interaction with GC / cleanup exemptions (see `AGENTS.md` on time-bounded exemptions): the new `tx → JoinHandle` table must not create unbounded GC blind spots. Any exemption must have a TTL or an absolute age threshold. Phase 5 should include a test that asserts orphaned tx tasks are torn down within a bounded interval.
- Rollout safety: phases 2–4 leave the tree in a mixed state (some ops via task-per-tx, others via the old re-entry loop). Both paths must coexist cleanly. The `OpRegistry` introduced in Phase 2 needs to be additive, not a replacement, until Phase 5.

7. Out of scope / future work
- `OpError` stays roughly as-is; a follow-up can reshape it once the task-per-tx model is in place.

Errata (Rev 1 → Rev 2)
The initial revision of this design document (published 2026-04-08, earlier today) contained two material errors that were caught during Phase 1 implementation planning and are corrected in Rev 2:
1. Phase 1 was mis-scoped. Rev 1 claimed Phase 1 could be "activate `notify_op_execution` and convert `start_subscription_request_internal` to `await notify_op_execution(...)`" and described this as a drop-in change. In fact:
   - The `pending_op_result` callback that makes `notify_op_execution` return is only wired for PUT and GET in `handle_pure_network_message_v1` (`node.rs:871, 920`). SUBSCRIBE, CONNECT, and UPDATE branches have no such hook. A SUBSCRIBE-targeted `notify_op_execution` call would hang forever on `response_receiver.recv()`.
   - `start_subscription_request_internal` does not just send a wire message. It spawns a task that calls `subscribe::request_subscribe`, which runs ring routing, builds pre-populated `AwaitingResponse` state, has a local-completion fallback path that never touches the network, and uses `notify_op_change` (not `notify_op_execution`). Porting this path onto `notify_op_execution` requires pulling in task-per-tx ownership of the routing state — i.e., it is Phase-2-shaped work, not Phase-1-shaped work.

   Rev 2 reassigns the original Phase 1 description to a new Phase 2.5 (the PUT-completion → SUBSCRIBE sub-op hand-off, done properly after task-per-tx exists) and redefines Phase 1 as the genuine structural prerequisite: extending the callback hook to cover all five op variants.
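The hang described above can be modeled in a few lines of std Rust. Everything here is a hypothetical stand-in (a toy handler that replies only on its PUT/GET branches, a 50 ms timeout in place of the real unbounded `recv()`); it shows why a missing callback branch leaves the awaiting caller parked indefinitely.

```rust
use std::sync::mpsc::{sync_channel, RecvTimeoutError, SyncSender};
use std::thread;
use std::time::Duration;

#[derive(Clone, Copy)]
enum OpKind { Put, Get, Subscribe }

// Toy handler: replies only on the PUT/GET branches, mirroring the Rev 1
// state where the SUBSCRIBE branch had no pending_op_result callback block.
fn spawn_handler() -> SyncSender<(SyncSender<&'static str>, OpKind)> {
    let (tx, rx) = sync_channel::<(SyncSender<&'static str>, OpKind)>(8);
    thread::spawn(move || {
        let mut parked = Vec::new(); // replies never fired; senders stay alive
        for (reply, op) in rx {
            match op {
                OpKind::Put | OpKind::Get => {
                    let _ = reply.send("completed");
                }
                OpKind::Subscribe => parked.push(reply),
            }
        }
    });
    tx
}

fn await_result(
    dispatch: &SyncSender<(SyncSender<&'static str>, OpKind)>,
    op: OpKind,
) -> Result<&'static str, RecvTimeoutError> {
    let (reply_tx, reply_rx) = sync_channel(1);
    dispatch.send((reply_tx, op)).unwrap();
    // 50 ms stands in for the real code's unbounded recv(), which would park forever.
    reply_rx.recv_timeout(Duration::from_millis(50))
}

fn main() {
    let dispatch = spawn_handler();
    assert_eq!(await_result(&dispatch, OpKind::Put), Ok("completed"));
    assert_eq!(await_result(&dispatch, OpKind::Get), Ok("completed"));
    // No callback wired for SUBSCRIBE: the await never resolves.
    assert_eq!(
        await_result(&dispatch, OpKind::Subscribe),
        Err(RecvTimeoutError::Timeout)
    );
}
```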
2. The `start_subscription_request_internal` line numbers were wrong. Rev 1 cited `operations.rs:699-811`. Verified against `main`, the function is at `operations.rs:745-811`; `699-742` covers the public wrappers (`start_subscription_request_async`, `start_subscription_request_blocking`, `start_subscription_request`).

Rev 2 also adds a "Scaffolding reach" subsection to Section 4 making the PUT/GET-only reality of the callback hook explicit, and clarifies in Section 2 that the SUBSCRIBE / CONNECT / UPDATE branches of `handle_pure_network_message_v1` lack the callback block.

Errata (Rev 2 → Rev 3)
After Phase 1 shipped (PR #3802), a pre-implementation investigation of SUBSCRIBE's surface area surfaced three material issues with Rev 2's Phase 2 scope:
1. Phase 2 bundled two unrelated op migrations. SUBSCRIBE is 2,225 lines; CONNECT is 5,025 lines (2.3× larger). Rev 2's parenthetical "(simplest ops)" is accurate relative to GET/PUT/UPDATE but not relative to each other. The two ops also have fundamentally different shapes: SUBSCRIBE has a client-facing origin, blocking sub-op semantics with PUT, and inline retry loops that serialize state through `notify_op_change()`; CONNECT has no client-facing origin at all (the "joiner" is the local node's `connection_maintenance` loop, not a WS client), no sub-op semantics, decouples hole-punch from op lifetime (op `Completed` does not mean the connection is established), and most of its 5,025 lines are in peer-selection machinery (`RelayContext`, `ConnectForwardEstimator`, visited-bloom-filter) rather than in the state machine itself. Rev 3 splits these into separate phases: 2b for SUBSCRIBE, 2c for CONNECT. Phase 2c's schedule is TBD and does not block Phases 3–6. (An earlier draft of this Errata claimed CONNECT ops have "hour-long connection attempts"; the subsequent CONNECT investigation [posted as an issue comment] refuted that — CONNECT ops time out at `OPERATION_TTL = 60s`; it's the resulting peer connections that live for hours, managed by the transport layer independently of the op.)

2. SUBSCRIBE has three distinct entry points with different failure semantics, not one:
   - Client-initiated (WS `Subscribe` request) — errors surface to the WS client.
   - Renewals (`ring.rs:944-1051`) — fire-and-forget, protected by `can_request_subscription()` spam prevention and jitter scheduling.
   - PUT sub-ops (`operations.rs:745-811`, called from `put.rs:1589-1608`, blocking mode) — parent PUT blocks on subscribe completion via `expect_and_register_sub_operation`/`all_sub_operations_completed`.

   Rev 2's Phase 2 implied all three would migrate together, but doing so would require the new task-per-tx SUBSCRIBE to cooperate with the legacy re-entry-loop PUT via the `SubOperationTracker` machinery that Phase 2.5 is supposed to start retiring — circular. Rev 3 scopes Phase 2b to the client-initiated path only. Renewals stay on the legacy path (their jitter/spam-prevention timing is load-bearing and decoupling it is a separate rabbit hole); PUT sub-ops stay on the legacy path until Phase 2.5 ports them properly.

3. SUBSCRIBE's retry logic is not localized. Rev 2's description "routing logic moves into the spawned task's locals" handwaves a significant rewrite. Today every retry path (NotFound → next alternative, alternatives exhausted → fresh `k_closest` round, peer timeout → abort-and-replace) reconstructs the full `SubscribeOp` and re-inserts it via `notify_op_change()`. Moving to task locals means rewriting all three retry paths, not just the happy path. This work belongs in Phase 2b, but Rev 3 introduces Phase 2a as a strict prerequisite: land `OpCtx` itself (struct definition, `send_and_await` primitive on top of Phase 1's round-trip, `tx → JoinHandle` bookkeeping) as dormant scaffolding with no production callers, analogous to how Phase 1 landed the callback hook without activating `notify_op_execution`. Phase 2b then activates `OpCtx` for exactly one entry point (client-initiated SUBSCRIBE) and reuses 2a's primitives rather than inventing them mid-migration.

Rev 3 also lifts the `#[allow(dead_code)]` attributes on `notify_op_execution` and `op_execution_sender` in Phase 2a (via the test shim that exercises `OpCtx::send_and_await`), satisfying the TODO left in Phase 1's doc comments.

Pre-implementation investigation reports for each op surface will be posted as comments on this issue rather than folded into the design body, so the design stays the plan-of-record and the investigations stay the research trail.
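As a rough model of what Phase 2a's dormant `OpCtx` scaffolding could look like: std threads and channels stand in for tokio tasks, and `OpCtx`, `Registry`, and all their fields are hypothetical names. A real implementation would use `tokio::task::JoinHandle` with `abort()`, and `try_send` fail-fast on the bounded inbox (dropping inbox messages would wedge the transaction).

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError, SyncSender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical per-transaction context: op state lives in the task's locals,
// and the inbox is the only way messages reach it.
struct OpCtx {
    #[allow(dead_code)] // dormant field, mirroring the design's dead-code scaffolding approach
    tx: u64,
    inbox: Receiver<String>,
}

// Hypothetical tx → JoinHandle table.
struct Registry {
    tasks: HashMap<u64, (SyncSender<String>, JoinHandle<&'static str>)>,
}

impl Registry {
    fn spawn_tx(&mut self, tx: u64, ttl: Duration) {
        let (inbox_tx, inbox) = sync_channel(8); // bounded per-tx inbox
        let ctx = OpCtx { tx, inbox };
        // The task owns its own timeout arm, replacing GC-driven expiry.
        let handle = thread::spawn(move || match ctx.inbox.recv_timeout(ttl) {
            Ok(_msg) => "completed",
            Err(RecvTimeoutError::Timeout) => "timed out",
            Err(RecvTimeoutError::Disconnected) => "aborted",
        });
        self.tasks.insert(tx, (inbox_tx, handle));
    }

    fn finish(&mut self, tx: u64, msg: &str) -> &'static str {
        let (inbox_tx, handle) = self.tasks.remove(&tx).expect("unknown tx");
        let _ = inbox_tx.send(msg.to_string());
        handle.join().unwrap()
    }
}

fn main() {
    let mut reg = Registry { tasks: HashMap::new() };

    // Happy path: the tx task completes on an inbox message.
    reg.spawn_tx(1, Duration::from_secs(5));
    assert_eq!(reg.finish(1, "terminal message"), "completed");

    // Timeout path: the task's own timeout arm fires.
    reg.spawn_tx(2, Duration::from_millis(10));
    let (_inbox_tx, handle) = reg.tasks.remove(&2).unwrap();
    assert_eq!(handle.join().unwrap(), "timed out");
}
```

The design choice the sketch illustrates: because the state never leaves the task, "forgetting to re-push" cannot happen, and cancellation reduces to removing the table entry and aborting (here, letting the timeout arm fire).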
Errata (Rev 3 → Rev 4)
After Phase 2b shipped (PR #3806, 2026-04-10), the Phase 2.5 implementation was drafted and verified against HEAD. Rev 3's Phase 2.5 description suggested "retiring `SubOperationTracker` usage for subscribe children" and "awaiting the child future in the parent task" — the latter is incompatible with the current PUT/GET parent, which is not a task but a re-entry-loop DashMap entry. Two corrections followed from that verification:

1. `SubOperationTracker` is not a polling mechanism; it is event-driven. PUT/GET stash their terminal state in `root_ops_awaiting_sub_ops` (`operations.rs:205`); `op_state_manager.rs:926-973` finalizes the parent when the last child calls `op_manager.completed(child_tx)`. Replacing this with a oneshot while the parent still lives in a DashMap entry is a lateral move — it introduces new delivery plumbing (a `DriverRole::SubOp` variant of Phase 2b's `drive_subscribe_inner`, a `register_local_subscription` helper so the sub-op path skips the `NodeEvent::LocalSubscribeComplete` publish that the client path needs) without actually enabling the parent to `await` anything.

2. `start_subscription_request_internal` has legitimate async-path users. `track_parent=false` (non-blocking PUT/GET with subscribe flag) does not touch `expect_and_register_sub_operation` at all — it just spawns a best-effort subscribe. That path also must stay on the legacy re-entry loop until its parent migrates.

The honest structural win — the parent task owning its child future directly — only exists once PUT/GET themselves are task-per-tx, which is Phase 3. Doing Phase 2.5 first means landing `DriverRole::SubOp` + the registration helper + extra `completed(child_tx)`/`sub_operation_failed` wiring as scaffolding that gets immediately removed when Phase 3 rewrites the parent side.

Rev 4 therefore folds Phase 2.5 into Phase 3. The blocking-subscription sub-op child (`track_parent=true`) and the async non-blocking child (`track_parent=false`) both migrate as part of Phase 3's PUT/GET task-per-tx work, which is when the parent task can hold the child future as a local and await it without any of the `SubOperationTracker` round-trip. The existing blocking-subscribe regression tests (`tests/operations.rs:1971` and `:2071`) continue to guard the legacy path until Phase 3 lands.

Facts verified during the Rev 4 investigation that Phase 3 should reuse as inputs (the exploration itself is not preserved as a separate document — Phase 3 will re-verify against HEAD per the Phase N rule):
- The enumeration of subscribe entry points (`start_subscription_after_put`, `auto_subscribe_on_get_response`, the renewal path in `ring.rs:1046`, and the Phase 2b SUBSCRIBE bypass in `node.rs:1161`) distinguishes which branches a Phase 3 PUT/GET migration must touch.
- The `complete_local_subscription` → `NodeEvent::LocalSubscribeComplete` publish is client-path-only; any sub-op pathway that calls into the local-completion helper must skip that NodeEvent because the parent PUT/GET owns the client reply.
- The `parent_tx` / `child_tx` / per-attempt `attempt_tx` bookkeeping from Phase 2b is already safe for sub-op use: `expect_and_register_sub_operation` is called only with `child_tx`, and the per-retry `attempt_tx` never enters `parent_of`.
- The callback-delivery details (`forward_pending_op_result_if_completed`, the ForwardingAck / non-terminal message filter, local-completion reply synthesis) must be re-checked for every op path Phase 3 adds.

Errata (Rev 4 → Rev 5)
After Rev 4 folded Phase 2.5 into Phase 3, a pre-implementation investigation of GET, PUT, and the subscribe sub-op hand-off was run against HEAD (posted as an issue comment on 2026-04-11). Three findings motivated a further split:
1. PUT and GET are independently large. PUT's originator path has a duplicate-send pattern (`put.rs:575–594` early client result via `result_router_tx.try_send` + `:841–917` second send on the response leg, relying on client-side `SessionActor` dedup) that is the single biggest structural change in Phase 3 and is orthogonal to anything GET does. GET adds streaming assembly inside `send_and_await` and four local-completion paths to handle. Bundling them in one PR makes debugging either change harder. Rev 5 splits them: 3a is PUT, 3b is GET.

2. `SubOperationTracker` can only be deleted after BOTH callers are removed. Grep of HEAD shows exactly one non-test caller of `expect_and_register_sub_operation` — `operations.rs:755` inside `start_subscription_request_internal`, which is invoked from PUT's `start_subscription_after_put` and GET's `auto_subscribe_on_get_response`. 3a can stop PUT from calling the tracker, but the tracker itself must stay alive until GET also stops calling it in 3b. The tracker deletion becomes a mechanical third PR — 3c — which slots in immediately after 3b rather than waiting until Phase 5. Phase 5's scope is now just the `OpManager.ops` DashMap deletion (the tracker is gone by then).

3. The `attempt_tx` terminology from Rev 3/4 notes does not exist on HEAD. Grep finds zero occurrences. The actual bookkeeping distinguishes only `parent_tx` and `child_tx`; there is no per-retry intermediate transaction kind. The Rev 4 bullet that claimed "per-attempt `attempt_tx` never enters `parent_of`" was verifying something that isn't there. The bullet is preserved above as historical context but should not influence Phase 3 design — drop the term.

Additional facts verified during the Rev 5 investigation that 3a/3b/3c should reuse:
- The early send at `put.rs:575–594` is what makes PUT feel fast from a client's perspective (local storage → immediate response), and the second send at `:841–917` is what guarantees delivery when the originator is not the final hop. The task-per-tx driver must collapse these to a single delivery by routing the synthesized local-completion `PutMsg::Response` through the `pending_op_result` callback instead of `result_router_tx.try_send`.
- PUT ops with `state: None` are normal, not an error state. `finalized()` at `put.rs:91–93` correctly treats `None` as done. A 3a driver must not assume `state: Some(_)` for every PUT in flight.
- Speculative spawning happens at `put.rs:51–54`, `:251–264`, not from the originator driver. 3a must not replicate speculative spawning in the task-per-tx driver.
- `fetch_contract` is always `true` post-fix ("node should always cache contract WASM on GET, regardless of return_contract_code", #3757). The client's `return_contract_code` flag only controls the client-facing payload, not the wire request. 3b must preserve that distinction.
- `stream_handle.assemble().await` at `get.rs:2892` is the boundary between `AwaitingResponse` and `Finished`. 3b's `send_and_await` should block on assembly internally for the `ResponseStreaming` variant.
- The interest check (`get.rs:1406–1431`): relays with only a stale LRU entry defer to the network and keep the stale copy as fallback. 3b must preserve this — relays stay on the legacy path, but any helper shared between originator and relay must not break the interest check.
- `complete_local_subscription`'s `NodeEvent::LocalSubscribeComplete` publish stays. It is still needed for the renewal and intermediate-peer paths. 3a/3b's inline-await pattern bypasses the event in the blocking-subscribe case (the parent task's own completion is the notification), but the helper itself is not deleted.

Design written for #1454. Line numbers verified against `main` at the time of writing; re-verify before implementing each phase.