P-Diff Chunking Fixes + Link Deserialization Fix#674
Conversation
📝 WalkthroughWalkthroughAdds end-to-end chunked-diff handling: chunk verification at commit time, load-before-update behavior on pull, eager chunk resolution in workspace, aggregated integrity checks for chunk dependencies, a new extern to fetch diff references, expanded sweettests for large (600-link) commits, and minor config/deserialization tweaks. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Commit as Commit Module
participant ChunkStore as Chunk Storage
participant Integrity as Integrity Validator
participant Pull as Pull Module
participant Workspace as Workspace
Client->>Commit: Submit large diff (>CHUNK_THRESHOLD)
Commit->>ChunkStore: Split & store chunks
Commit->>ChunkStore: Verify each chunk retrievable (retry loop)
Commit->>Integrity: Validate parents + chunk dependencies
Integrity-->>Commit: Validation result
Commit->>ChunkStore: Create parent PerspectiveDiffEntryReference
Commit-->>Client: Commit complete
Client->>Pull: Broadcast / Pull diff
Pull->>ChunkStore: Load diff entry
alt Diff is chunked
Pull->>ChunkStore: Load & aggregate chunks
end
Pull->>Pull: Update current_revision (only if load succeeded)
Pull-->>Client: Broadcast handled
Client->>Workspace: Render workspace
Workspace->>ChunkStore: Detect chunked entry
alt Entry is chunked
Workspace->>ChunkStore: Eagerly resolve full diff
Workspace->>Workspace: Insert resolved non-chunked reference
else
Workspace->>Workspace: Insert original reference
end
Workspace-->>Client: Rendered additions/removals
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (2)
550-562: Always-true predicate inretry_until_success— intentional but worth a brief comment.The predicate
|_result| true(line 557-561) meansretry_until_successwill accept any response, including one with 0 additions. The inline comment explains this, but the explanation is easy to miss. A shorter inline note directly at the closure would help:|_result: &perspective_diff_sync_integrity::PullResult| { - // For initial pull, the function returns early with empty diff - // but updates current_revision, so just check it doesn't error - true + true // Initial pull returns empty diff; real assertion is on render() below },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs` around lines 550 - 562, The predicate passed to retry_until_success for Bob's pull currently returns true unconditionally (closure |_: &perspective_diff_sync_integrity::PullResult| true), which is intentional but easy to miss; update the closure to include a concise inline comment like "// accept any non-error result since initial pull only updates current_revision" or expand the existing comment directly above the closure to explicitly state this rationale, referencing retry_until_success, the closure, and PullResult so future readers understand why we accept an empty diff.
175-181: Duplicate retry pattern — consider extracting a helper.This 5-iteration
exchange_peer_info+ sleep loop is identical to lines 72-75. A small helper likeensure_peer_connectivity(&conductors).awaitwould reduce duplication and make intent clearer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs` around lines 175 - 181, The retry loop calling conductors.exchange_peer_info().await with a sleep is duplicated; extract it into a small async helper like ensure_peer_connectivity(conductors: &Conductors).await (or ensure_peer_connectivity(&conductors).await) that runs the 5-iteration loop and sleeps 2s between attempts, place the helper in the test module (or a shared test utils area) and replace both occurrences of the explicit for-loop with a single call to ensure_peer_connectivity(&conductors).await to remove duplication and clarify intent.bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs (1)
210-216: Test-only helper is exposed in production builds.
update_current_revision(line 91) is gated behind#[cfg(feature = "test")], but this newget_diff_entry_referenceis unconditionally compiled as an#[hdk_extern]. If it's truly only needed for testing, consider gating it similarly to avoid expanding the production zome API surface.Proposed fix
/// Helper function for testing - get a PerspectiveDiffEntryReference by hash +#[cfg(feature = "test")] #[hdk_extern] pub fn get_diff_entry_reference(hash: Hash) -> ExternResult<PerspectiveDiffEntryReference> {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs` around lines 210 - 216, The helper function get_diff_entry_reference is exposed unconditionally as an #[hdk_extern] but is only needed for tests; gate it like update_current_revision by adding #[cfg(feature = "test")] above the function (and/or use #[cfg_attr(feature = "test", hdk_extern)] if you need the attribute only when the feature is enabled) so PerspectiveDiffEntryReference and retriever::HolochainRetreiver::get remain test-only and the production zome API surface isn't expanded.bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs (1)
783-801:squashed_diffstill used bybuild_diffspath — verify chunks are loadable.In the
build_diffs→collect_until_common_ancestor→sort_graphflow,entry_mapgets populated fromself.diffs(line 294) which contains raw/unresolved entries.squashed_diffthen callsload_diff_from_entry::<Retriever>on each, so chunks are resolved at read time here. This is consistent but different from the eager resolution inhandle_parents.Consider whether the
build_diffspath should also eagerly resolve (for consistency), or whether you intentionally keep the two paths different. Currently it works — just worth documenting the asymmetry.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs` around lines 783 - 801, squashed_diff iterates entry_map and resolves chunks lazily via load_diff_from_entry, which is asymmetric with handle_parents' eager resolution; update the build_diffs → collect_until_common_ancestor → sort_graph flow so that when populating self.entry_map from self.diffs it eagerly resolves entries (call load_diff_from_entry::<Retriever> while building entry_map) to match handle_parents, or alternatively add a clear comment in the build_diffs path and a unit/integration test to assert load_diff_from_entry can load chunked entries at read time; reference functions/fields: squashed_diff, build_diffs, collect_until_common_ancestor, sort_graph, entry_map, self.diffs, load_diff_from_entry, handle_parents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 286-441: The test doesn't reliably exercise the chunk-loading
failure path because it starts connected (setup_conductors), uses pull() which
often succeeds, and only conditionally asserts; change it to deterministically
trigger the failure by either starting conductors disconnected (use
setup_conductors(2, false)) and invoking handle_broadcast() directly with a
crafted PerspectiveDiffEntryReference from create_commit_input_multi, or inject
a test-only failure hook into load_diff_from_entry() (via #[cfg(test)] or a test
feature) and keep calling pull() with call_zome_fallible to force an error; also
remove or convert the long println! block explaining the fix into a short doc
comment or test comment to keep the test focused; references: setup_conductors,
create_commit_input_multi, call_zome_fallible, pull, handle_broadcast,
load_diff_from_entry, current_revision.
In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/commit.rs`:
- Around line 90-94: The retry check in commit.rs uses `if retry_count >
MAX_RETRIES` but since `retry_count` is incremented before this check, that
allows one extra attempt; change the condition to `if retry_count >=
MAX_RETRIES` (in the block guarding chunk verification that returns
`SocialContextError::InternalError("Failed to verify chunk availability after
creation")`) so the loop enforces the intended max retries for `retry_count` vs
`MAX_RETRIES`.
---
Nitpick comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 550-562: The predicate passed to retry_until_success for Bob's
pull currently returns true unconditionally (closure |_:
&perspective_diff_sync_integrity::PullResult| true), which is intentional but
easy to miss; update the closure to include a concise inline comment like "//
accept any non-error result since initial pull only updates current_revision" or
expand the existing comment directly above the closure to explicitly state this
rationale, referencing retry_until_success, the closure, and PullResult so
future readers understand why we accept an empty diff.
- Around line 175-181: The retry loop calling
conductors.exchange_peer_info().await with a sleep is duplicated; extract it
into a small async helper like ensure_peer_connectivity(conductors:
&Conductors).await (or ensure_peer_connectivity(&conductors).await) that runs
the 5-iteration loop and sleeps 2s between attempts, place the helper in the
test module (or a shared test utils area) and replace both occurrences of the
explicit for-loop with a single call to
ensure_peer_connectivity(&conductors).await to remove duplication and clarify
intent.
In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs`:
- Around line 210-216: The helper function get_diff_entry_reference is exposed
unconditionally as an #[hdk_extern] but is only needed for tests; gate it like
update_current_revision by adding #[cfg(feature = "test")] above the function
(and/or use #[cfg_attr(feature = "test", hdk_extern)] if you need the attribute
only when the feature is enabled) so PerspectiveDiffEntryReference and
retriever::HolochainRetreiver::get remain test-only and the production zome API
surface isn't expanded.
In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs`:
- Around line 783-801: squashed_diff iterates entry_map and resolves chunks
lazily via load_diff_from_entry, which is asymmetric with handle_parents' eager
resolution; update the build_diffs → collect_until_common_ancestor → sort_graph
flow so that when populating self.entry_map from self.diffs it eagerly resolves
entries (call load_diff_from_entry::<Retriever> while building entry_map) to
match handle_parents, or alternatively add a clear comment in the build_diffs
path and a unit/integration test to assert load_diff_from_entry can load chunked
entries at read time; reference functions/fields: squashed_diff, build_diffs,
collect_until_common_ancestor, sort_graph, entry_map, self.diffs,
load_diff_from_entry, handle_parents.
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs
Show resolved
Hide resolved
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/commit.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
🧹 Nitpick comments (2)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (2)
499-511: Always-true predicate makesretry_until_successa no-op filter.The predicate
|_result| truemeans the retry only guards against call errors, never against an unsatisfactory result. This works for the stated purpose but is somewhat misleading — a reader might expect the predicate to validate something meaningful.Consider using
call_zome_falliblewith a manual retry loop instead, or adding a brief comment next to the predicate explaining the intent more explicitly (e.g.,// We only care that the call succeeds; diff content is verified via render() below).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs` around lines 499 - 511, The predicate passed to retry_until_success (used when obtaining bob_pull) is always true, making the retry not validate the call result; either replace retry_until_success with a manual retry using call_zome_fallible and validate the PullResult, or keep retry_until_success but replace the closure with an explanatory comment next to the predicate (e.g., near bob_pull/retry_until_success) that states we only care the call succeeds and that the actual diff content is asserted later via render(); reference retry_until_success, bob_pull, call_zome_fallible, and render() so the change is easy to locate.
68-75: Redundantexchange_peer_infocall on line 68.
exchange_peer_info()is called once on line 68, then immediately again 5 more times in the retry loop starting at line 72. The first standalone call is unnecessary since the loop covers it.The same pattern appears in
test_merge_fetch_deepat lines 174–181.Suggested fix
- conductors.exchange_peer_info().await; - // Retry connection until agents can see each other // This is needed for the development version of Holochain for _attempt in 0..5 { conductors.exchange_peer_info().await; tokio::time::sleep(tokio::time::Duration::from_millis(2000)).await; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs` around lines 68 - 75, The standalone call to conductors.exchange_peer_info().await is redundant because the subsequent for-loop already performs the same call multiple times; remove the extra call (the one before the retry loop) in both occurrences (the current test and the test_merge_fetch_deep function) so only the retry loop containing conductors.exchange_peer_info().await runs. Ensure you update both locations to avoid duplicate peer-exchange calls and keep the retry loop intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 499-511: The predicate passed to retry_until_success (used when
obtaining bob_pull) is always true, making the retry not validate the call
result; either replace retry_until_success with a manual retry using
call_zome_fallible and validate the PullResult, or keep retry_until_success but
replace the closure with an explanatory comment next to the predicate (e.g.,
near bob_pull/retry_until_success) that states we only care the call succeeds
and that the actual diff content is asserted later via render(); reference
retry_until_success, bob_pull, call_zome_fallible, and render() so the change is
easy to locate.
- Around line 68-75: The standalone call to
conductors.exchange_peer_info().await is redundant because the subsequent
for-loop already performs the same call multiple times; remove the extra call
(the one before the retry loop) in both occurrences (the current test and the
test_merge_fetch_deep function) so only the retry loop containing
conductors.exchange_peer_info().await runs. Ensure you update both locations to
avoid duplicate peer-exchange calls and keep the retry loop intact.
Summary
P-Diff chunking fixes and link deserialization fix. No debug log commits included.
Commits
P-Diff Fixes
Tests
Critical Fix
Excluded
Created by Data at Nico's request.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores