Skip to content

P-Diff Chunking Fixes + Link Deserialization Fix#674

Merged
lucksus merged 12 commits intodevfrom
pdiff-link-fixes
Feb 17, 2026
Merged

P-Diff Chunking Fixes + Link Deserialization Fix#674
lucksus merged 12 commits intodevfrom
pdiff-link-fixes

Conversation

@data-coasys
Copy link
Contributor

@data-coasys data-coasys commented Feb 17, 2026

Summary

P-Diff chunking fixes and link deserialization fix. No debug log commits included.

Commits

P-Diff Fixes

  • p-diff chunks as dependencies
  • Fix fast-forward-through-signal case with chunked diffs
  • Retry loop in ChunkedDiffs::from_entries()
  • another p-diff fix for chunked diffs
  • Remove loop from ChunkedDiff::from_entries()

Tests

  • Test for chunked broadcast
  • Another integration test for chunked diffs

Critical Fix

  • fix: link deserialisation - null as empty string

Excluded

  • All debug logging commits
  • Bootstrap seed commits

Created by Data at Nico's request.

Summary by CodeRabbit

  • New Features

    • Added a public API to retrieve diff entry references.
  • Bug Fixes

    • Only update current revision after a diff successfully loads.
    • Improved deserialization to treat null/missing link fields as empty strings.
    • Eagerly validate and resolve chunked diffs; fail-fast on chunk load failures.
  • Tests

    • Expanded coverage for large/chunked diffs, retries, connectivity, and detailed debug tracing.
  • Chores

    • Updated known language seed configuration.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

Adds end-to-end chunked-diff handling: chunk verification at commit time, load-before-update behavior on pull, eager chunk resolution in workspace, aggregated integrity checks for chunk dependencies, a new extern to fetch diff references, expanded sweettests for large (600-link) commits, and minor config/deserialization tweaks.

Changes

Cohort / File(s) Summary
Test Suite Expansion
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs
Adds multiple chunked-diff tests (broadcast failure, render returns chunked diffs, initial pull with chunked diffs, large-diff commit), adds retry loops for peer visibility, verbose logging, and removes an unreliable disconnect assertion.
Public API Addition
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs
New #[hdk_extern] get_diff_entry_reference(Hash) -> ExternResult<PerspectiveDiffEntryReference> and re-exports PerspectiveDiffEntryReference.
Chunk Loading Observability
.../link_adapter/chunked_diffs.rs
Adds detailed per-chunk and aggregate logging; fail-fast on chunk retrieval errors; logs CHUNKED vs INLINE handling and aggregation results.
Commit Chunk Verification
.../link_adapter/commit.rs
After creating chunk entries, verifies each chunk is retrievable with a retry loop (up to 10 retries, 100ms) before creating parent reference; stores cloned chunk-hash vector and logs progress.
Pull Operation Reordering
.../link_adapter/pull.rs
In handle_broadcast, loads/aggregates diff first and only updates current_revision on successful load (prevents revision update if diff load fails).
Workspace Chunked Diff Resolution
.../link_adapter/workspace.rs
Introduces generic Retriever usage and ? propagation; eagerly resolves chunked diffs via load_diff_from_entry and inserts resolved non-chunked references into entry_map before graph construction.
Integrity Validation Extension
..._integrity/src/lib.rs
Aggregates missing dependencies from both parents and diff_chunks into a shared collector and returns UnresolvedDependencies only after checking both parents and chunk deps.
Config Updates
cli/mainnet_seed.json, rust-executor/src/mainnet_seed.json
Replaced a knownLinkLanguages hash value in both seed files.
Deserialization Enhancement
rust-executor/src/types.rs
Adds null_as_empty_string deserializer and applies it to Link.source and Link.target to treat null/missing as empty string.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Commit as Commit Module
    participant ChunkStore as Chunk Storage
    participant Integrity as Integrity Validator
    participant Pull as Pull Module
    participant Workspace as Workspace

    Client->>Commit: Submit large diff (>CHUNK_THRESHOLD)
    Commit->>ChunkStore: Split & store chunks
    Commit->>ChunkStore: Verify each chunk retrievable (retry loop)
    Commit->>Integrity: Validate parents + chunk dependencies
    Integrity-->>Commit: Validation result
    Commit->>ChunkStore: Create parent PerspectiveDiffEntryReference
    Commit-->>Client: Commit complete

    Client->>Pull: Broadcast / Pull diff
    Pull->>ChunkStore: Load diff entry
    alt Diff is chunked
        Pull->>ChunkStore: Load & aggregate chunks
    end
    Pull->>Pull: Update current_revision (only if load succeeded)
    Pull-->>Client: Broadcast handled

    Client->>Workspace: Render workspace
    Workspace->>ChunkStore: Detect chunked entry
    alt Entry is chunked
        Workspace->>ChunkStore: Eagerly resolve full diff
        Workspace->>Workspace: Insert resolved non-chunked reference
    else
        Workspace->>Workspace: Insert original reference
    end
    Workspace-->>Client: Rendered additions/removals
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • jhweir

Poem

🐰 Hopping through chunks with a keen little grin,

I stitch every piece so the whole can jump in.
I check each one twice, then I bind them as one,
Load first, then update — the careful rabbit's run.
600 links sorted, that's a carrot well-won!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: P-Diff chunking improvements and a link deserialization fix, which aligns with the substantial modifications across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pdiff-link-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (2)

550-562: Always-true predicate in retry_until_success — intentional but worth a brief comment.

The predicate |_result| true (line 557-561) means retry_until_success will accept any response, including one with 0 additions. The inline comment explains this, but the explanation is easy to miss. A shorter inline note directly at the closure would help:

         |_result: &perspective_diff_sync_integrity::PullResult| {
-            // For initial pull, the function returns early with empty diff
-            // but updates current_revision, so just check it doesn't error
-            true
+            true // Initial pull returns empty diff; real assertion is on render() below
         },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 550 - 562, The predicate passed to retry_until_success for Bob's
pull currently returns true unconditionally (closure |_:
&perspective_diff_sync_integrity::PullResult| true), which is intentional but
easy to miss; update the closure to include a concise inline comment like "//
accept any non-error result since initial pull only updates current_revision" or
expand the existing comment directly above the closure to explicitly state this
rationale, referencing retry_until_success, the closure, and PullResult so
future readers understand why we accept an empty diff.

175-181: Duplicate retry pattern — consider extracting a helper.

This 5-iteration exchange_peer_info + sleep loop is identical to lines 72-75. A small helper like ensure_peer_connectivity(&conductors).await would reduce duplication and make intent clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 175 - 181, The retry loop calling
conductors.exchange_peer_info().await with a sleep is duplicated; extract it
into a small async helper like ensure_peer_connectivity(conductors:
&Conductors).await (or ensure_peer_connectivity(&conductors).await) that runs
the 5-iteration loop and sleeps 2s between attempts, place the helper in the
test module (or a shared test utils area) and replace both occurrences of the
explicit for-loop with a single call to
ensure_peer_connectivity(&conductors).await to remove duplication and clarify
intent.
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs (1)

210-216: Test-only helper is exposed in production builds.

update_current_revision (line 91) is gated behind #[cfg(feature = "test")], but this new get_diff_entry_reference is unconditionally compiled as an #[hdk_extern]. If it's truly only needed for testing, consider gating it similarly to avoid expanding the production zome API surface.

Proposed fix
 /// Helper function for testing - get a PerspectiveDiffEntryReference by hash
+#[cfg(feature = "test")]
 #[hdk_extern]
 pub fn get_diff_entry_reference(hash: Hash) -> ExternResult<PerspectiveDiffEntryReference> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs`
around lines 210 - 216, The helper function get_diff_entry_reference is exposed
unconditionally as an #[hdk_extern] but is only needed for tests; gate it like
update_current_revision by adding #[cfg(feature = "test")] above the function
(and/or use #[cfg_attr(feature = "test", hdk_extern)] if you need the attribute
only when the feature is enabled) so PerspectiveDiffEntryReference and
retriever::HolochainRetreiver::get remain test-only and the production zome API
surface isn't expanded.
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs (1)

783-801: squashed_diff still used by build_diffs path — verify chunks are loadable.

In the build_diffscollect_until_common_ancestorsort_graph flow, entry_map gets populated from self.diffs (line 294) which contains raw/unresolved entries. squashed_diff then calls load_diff_from_entry::<Retriever> on each, so chunks are resolved at read time here. This is consistent but different from the eager resolution in handle_parents.

Consider whether the build_diffs path should also eagerly resolve (for consistency), or whether you intentionally keep the two paths different. Currently it works — just worth documenting the asymmetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs`
around lines 783 - 801, squashed_diff iterates entry_map and resolves chunks
lazily via load_diff_from_entry, which is asymmetric with handle_parents' eager
resolution; update the build_diffs → collect_until_common_ancestor → sort_graph
flow so that when populating self.entry_map from self.diffs it eagerly resolves
entries (call load_diff_from_entry::<Retriever> while building entry_map) to
match handle_parents, or alternatively add a clear comment in the build_diffs
path and a unit/integration test to assert load_diff_from_entry can load chunked
entries at read time; reference functions/fields: squashed_diff, build_diffs,
collect_until_common_ancestor, sort_graph, entry_map, self.diffs,
load_diff_from_entry, handle_parents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 286-441: The test doesn't reliably exercise the chunk-loading
failure path because it starts connected (setup_conductors), uses pull() which
often succeeds, and only conditionally asserts; change it to deterministically
trigger the failure by either starting conductors disconnected (use
setup_conductors(2, false)) and invoking handle_broadcast() directly with a
crafted PerspectiveDiffEntryReference from create_commit_input_multi, or inject
a test-only failure hook into load_diff_from_entry() (via #[cfg(test)] or a test
feature) and keep calling pull() with call_zome_fallible to force an error; also
remove or convert the long println! block explaining the fix into a short doc
comment or test comment to keep the test focused; references: setup_conductors,
create_commit_input_multi, call_zome_fallible, pull, handle_broadcast,
load_diff_from_entry, current_revision.

In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/commit.rs`:
- Around line 90-94: The retry check in commit.rs uses `if retry_count >
MAX_RETRIES` but since `retry_count` is incremented before this check, that
allows one extra attempt; change the condition to `if retry_count >=
MAX_RETRIES` (in the block guarding chunk verification that returns
`SocialContextError::InternalError("Failed to verify chunk availability after
creation")`) so the loop enforces the intended max retries for `retry_count` vs
`MAX_RETRIES`.

---

Nitpick comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 550-562: The predicate passed to retry_until_success for Bob's
pull currently returns true unconditionally (closure |_:
&perspective_diff_sync_integrity::PullResult| true), which is intentional but
easy to miss; update the closure to include a concise inline comment like "//
accept any non-error result since initial pull only updates current_revision" or
expand the existing comment directly above the closure to explicitly state this
rationale, referencing retry_until_success, the closure, and PullResult so
future readers understand why we accept an empty diff.
- Around line 175-181: The retry loop calling
conductors.exchange_peer_info().await with a sleep is duplicated; extract it
into a small async helper like ensure_peer_connectivity(conductors:
&Conductors).await (or ensure_peer_connectivity(&conductors).await) that runs
the 5-iteration loop and sleeps 2s between attempts, place the helper in the
test module (or a shared test utils area) and replace both occurrences of the
explicit for-loop with a single call to
ensure_peer_connectivity(&conductors).await to remove duplication and clarify
intent.

In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/lib.rs`:
- Around line 210-216: The helper function get_diff_entry_reference is exposed
unconditionally as an #[hdk_extern] but is only needed for tests; gate it like
update_current_revision by adding #[cfg(feature = "test")] above the function
(and/or use #[cfg_attr(feature = "test", hdk_extern)] if you need the attribute
only when the feature is enabled) so PerspectiveDiffEntryReference and
retriever::HolochainRetreiver::get remain test-only and the production zome API
surface isn't expanded.

In
`@bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/workspace.rs`:
- Around line 783-801: squashed_diff iterates entry_map and resolves chunks
lazily via load_diff_from_entry, which is asymmetric with handle_parents' eager
resolution; update the build_diffs → collect_until_common_ancestor → sort_graph
flow so that when populating self.entry_map from self.diffs it eagerly resolves
entries (call load_diff_from_entry::<Retriever> while building entry_map) to
match handle_parents, or alternatively add a clear comment in the build_diffs
path and a unit/integration test to assert load_diff_from_entry can load chunked
entries at read time; reference functions/fields: squashed_diff, build_diffs,
collect_until_common_ancestor, sort_graph, entry_map, self.diffs,
load_diff_from_entry, handle_parents.

lucksus
lucksus previously approved these changes Feb 17, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (2)

499-511: Always-true predicate makes retry_until_success a no-op filter.

The predicate |_result| true means the retry only guards against call errors, never against an unsatisfactory result. This works for the stated purpose but is somewhat misleading — a reader might expect the predicate to validate something meaningful.

Consider using call_zome_fallible with a manual retry loop instead, or adding a brief comment next to the predicate explaining the intent more explicitly (e.g., // We only care that the call succeeds; diff content is verified via render() below).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 499 - 511, The predicate passed to retry_until_success (used when
obtaining bob_pull) is always true, making the retry not validate the call
result; either replace retry_until_success with a manual retry using
call_zome_fallible and validate the PullResult, or keep retry_until_success but
replace the closure with an explanatory comment next to the predicate (e.g.,
near bob_pull/retry_until_success) that states we only care the call succeeds
and that the actual diff content is asserted later via render(); reference
retry_until_success, bob_pull, call_zome_fallible, and render() so the change is
easy to locate.

68-75: Redundant exchange_peer_info call on line 68.

exchange_peer_info() is called once on line 68, then immediately again 5 more times in the retry loop starting at line 72. The first standalone call is unnecessary since the loop covers it.

The same pattern appears in test_merge_fetch_deep at lines 174–181.

Suggested fix
-    conductors.exchange_peer_info().await;
-
     // Retry connection until agents can see each other
     // This is needed for the development version of Holochain
     for _attempt in 0..5 {
         conductors.exchange_peer_info().await;
         tokio::time::sleep(tokio::time::Duration::from_millis(2000)).await;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 68 - 75, The standalone call to
conductors.exchange_peer_info().await is redundant because the subsequent
for-loop already performs the same call multiple times; remove the extra call
(the one before the retry loop) in both occurrences (the current test and the
test_merge_fetch_deep function) so only the retry loop containing
conductors.exchange_peer_info().await runs. Ensure you update both locations to
avoid duplicate peer-exchange calls and keep the retry loop intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 499-511: The predicate passed to retry_until_success (used when
obtaining bob_pull) is always true, making the retry not validate the call
result; either replace retry_until_success with a manual retry using
call_zome_fallible and validate the PullResult, or keep retry_until_success but
replace the closure with an explanatory comment next to the predicate (e.g.,
near bob_pull/retry_until_success) that states we only care the call succeeds
and that the actual diff content is asserted later via render(); reference
retry_until_success, bob_pull, call_zome_fallible, and render() so the change is
easy to locate.
- Around line 68-75: The standalone call to
conductors.exchange_peer_info().await is redundant because the subsequent
for-loop already performs the same call multiple times; remove the extra call
(the one before the retry loop) in both occurrences (the current test and the
test_merge_fetch_deep function) so only the retry loop containing
conductors.exchange_peer_info().await runs. Ensure you update both locations to
avoid duplicate peer-exchange calls and keep the retry loop intact.

@lucksus lucksus merged commit f1fba13 into dev Feb 17, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants