Skip to content

Chunk regular diffs to avoid HC entry size limit#655

Merged
lucksus merged 4 commits intodevfrom
fix/hc-entry-size-limit
Feb 10, 2026
Merged

Chunk regular diffs to avoid HC entry size limit#655
lucksus merged 4 commits intodevfrom
fix/hc-entry-size-limit

Conversation

@lucksus
Copy link
Member

@lucksus lucksus commented Feb 6, 2026

And migrate old tryorama tests to sweetest and add test to diff chunking.

We already had chunking logic but only used it for snapshot entries. This applies chunking to regular diffs if they get too big (>500 links)


Note

Medium Risk
Changes core diff persistence/rehydration paths (commit/pull/snapshot traversal) to store large diffs across multiple entries, which could impact sync correctness and backward compatibility if edge cases aren’t covered.

Overview
Adds chunked storage for large non-snapshot diffs to avoid Holochain entry size limits: commit now splits diffs over a threshold into multiple PerspectiveDiffEntryReference chunk entries and stores only chunk hashes in the main revision entry, while readers use a new load_diff_from_entry helper to transparently rehydrate inline vs chunked diffs during pull, broadcast handling, workspace squashing, and snapshot generation.

Introduces/extends integration coverage by migrating Tryorama tests to Rust sweettest and adding a large-diff commit test, plus updates the DNA workspace/build packaging to include the new sweettest crate while building/packing only the WASM zomes.

Written by Cursor Bugbot for commit 6dce6a9. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • Tests

    • Added a comprehensive Rust integration test suite covering commit/pull synchronization, rendering (including snapshot and removal cases), revision tracking, telepresence/online-status discovery, and large-diff scenarios.
  • Chores

    • Added chunked storage handling for large diffs to improve sync reliability and storage scalability.
    • Adjusted build packaging to target the core WASM zomes.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Walkthrough

Adds a new Sweettest Rust integration test suite and test utilities, registers it in the workspace; implements chunked-diff storage for large PerspectiveDiffs with loader support and updates commit/pull/snapshot/workspace code paths; narrows build script WASM targets.

Changes

Cohort / File(s) Summary
Workspace config
bootstrap-languages/p-diff-sync/hc-dna/Cargo.toml
Adds tests/sweettest to workspace members.
Test crate & entrypoints
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/Cargo.toml, .../tests/sweettest/src/lib.rs, .../tests/sweettest/src/integration_tests.rs
New Sweettest crate manifest and test harness exposing test modules.
Test modules
.../tests/sweettest/src/test_commit_pull.rs, .../test_render.rs, .../test_revisions.rs, .../test_telepresence.rs
Adds comprehensive async integration tests covering commit/pull merges, rendering, revisions, telepresence, large-diff chunking and edge cases.
Test utilities
.../tests/sweettest/src/utils.rs
New public helpers for DNA loading, conductor setup, commit/pull helpers, zome call wrappers, retry/consistency utilities, data builders used across tests.
Chunked diff loader
.../zomes/.../src/link_adapter/chunked_diffs.rs
Adds load_diff_from_entry() to load diffs either inline or by aggregating chunk entries.
Commit chunking
.../zomes/.../src/link_adapter/commit.rs
Implements CHUNKING_THRESHOLD (500) and chunked-storage pathway: split, persist chunk entries, store references in diff_chunks; retains inline path for small diffs.
Pull / merge / broadcast updates
.../zomes/.../src/link_adapter/pull.rs
Replaces direct diff access with load_diff_from_entry() in pull/merge/broadcast code paths; propagates errors from loader.
Snapshots & workspace
.../zomes/.../src/link_adapter/snapshots.rs, .../workspace.rs
Uses loader to access additions/removals so snapshots and workspace aggregation handle both inline and chunked diffs.
Integrity type changes
.../zomes/perspective_diff_sync_integrity/src/lib.rs
Adds public diff_chunks: Option<Vec<HoloHash<...>>>, new_chunked() constructor and is_chunked() helper on PerspectiveDiffEntryReference.
Build script
bootstrap-languages/p-diff-sync/hc-dna/build.sh
Restricts WASM build to specific zome crates (perspective_diff_sync, perspective_diff_sync_integrity); notes exclusion of sweettest.

Sequence Diagram

sequenceDiagram
    participant Tester as Test/Caller
    participant Commit as commit()
    participant Store as EntryStore
    participant ChunkMgr as ChunkedDiffs
    participant Retriever as load_diff_from_entry

    Tester->>Commit: Submit PerspectiveDiff (>500)
    Commit->>Commit: Check CHUNKING_THRESHOLD
    alt Large diff ( > 500 )
        Commit->>ChunkMgr: Split into chunks
        ChunkMgr->>Store: Persist chunk entries
        Store-->>ChunkMgr: Return chunk hashes
        Commit->>Store: Persist main entry (diff empty, diff_chunks=[hashes])
        Store-->>Commit: Return entry reference
    else Small diff (<=500)
        Commit->>Store: Persist main entry (inline diff)
        Store-->>Commit: Return entry reference
    end
    Commit-->>Tester: Return entry reference

    Note over Tester,Retriever: Later on pull/merge/render
    Tester->>Retriever: load_diff_from_entry(entry_ref)
    Retriever->>Retriever: Check is_chunked()
    alt entry_ref has diff_chunks
        Retriever->>Store: Load chunk entries by hashes
        Store-->>Retriever: Return chunk data
        Retriever->>ChunkMgr: Aggregate chunks into PerspectiveDiff
        ChunkMgr-->>Retriever: Return aggregated diff
    else inline diff
        Retriever-->>Retriever: Use inline diff from entry_ref
    end
    Retriever-->>Tester: Return unified PerspectiveDiff
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • jhweir

Poem

🐰✨ I hop and split each giant diff,
into neat chunks, a careful gift.
Sweettests hum and conductors play,
merges meet and find their way.
Tiny hops, big syncs — hip hooray! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Chunk regular diffs to avoid HC entry size limit' clearly and concisely describes the main technical change: implementing chunking for regular diffs to handle Holochain entry size limitations. It accurately reflects the primary objective and is specific enough for scanning code history.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/hc-entry-size-limit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync_integrity/src/lib.rs (1)

256-263: ⚠️ Potential issue | 🟡 Minor

Ord / PartialEq inconsistency: comparison_key excludes diff_chunks.

PartialEq and Hash are derived (use all fields), but the manual Ord impl via comparison_key doesn't include diff_chunks. This violates the contract that a == b ⟹ a.cmp(&b) == Equal, which can cause subtle bugs in sorted collections like BTreeSet.

In practice the risk is low (entries are typically keyed by action hash), but worth being aware of if PerspectiveDiffEntryReference values are ever inserted directly into a BTreeSet/BTreeMap.

🔧 Include diff_chunks in comparison_key
-    fn comparison_key(&self) -> (bool, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>, usize, usize, &PerspectiveDiff) {
-        let has_parents = self.parents.is_some();
-        (!has_parents, &self.parents, self.diffs_since_snapshot, self.diff.total_diff_number(), &self.diff)
+    fn comparison_key(&self) -> (bool, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>, usize, usize, &PerspectiveDiff, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>) {
+        let has_parents = self.parents.is_some();
+        (!has_parents, &self.parents, self.diffs_since_snapshot, self.diff.total_diff_number(), &self.diff, &self.diff_chunks)
     }
🤖 Fix all issues with AI agents
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 244-258: The test named test_empty_commit is misleading because it
uses create_did_link and commit_link and thus makes a real commit; either rename
the test to reflect its behavior (e.g., test_single_commit_has_revision) or
modify the test to actually perform an empty-diff commit: update the test
function test_empty_commit to call the code path that creates an empty commit
instead of commit_link (or add a new helper like commit_empty_diff) and then
assert the revision existence; locate references to test_empty_commit,
commit_link, and create_did_link in the test to implement the rename or replace
the commit step accordingly.
🧹 Nitpick comments (8)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_telepresence.rs (1)

54-58: Most telepresence tests lack assertions — they only println! results.

test_online_status, test_get_online_agents_multi, test_get_active_agents, and test_get_others all retrieve data but never assert on it. These are smoke tests that only prove "no panic," not correctness. Regressions in telepresence behavior will go undetected.

Consider using retry_until_success (already available in utils) to poll until the expected state is reached, then assert. For example, test_online_status could retry get_online_agents until the list is non-empty, then assert the agent's DID matches.

bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_render.rs (1)

96-114: Consider extracting the manual retry loop or adding a brief comment.

This hand-rolled retry loop duplicates retry logic from retry_until_success in utils.rs. The nuance here is that you're checking a side-effect (current_revision) rather than the pull result itself, which justifies deviating from the utility. A short inline comment explaining that would help future readers understand why retry_until_success isn't used.

bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (1)

218-242: The chunking test only verifies the commit hash, not data round-trip integrity.

This test is the key validation for the PR's chunking feature. It proves the 600-link commit doesn't blow the entry size limit, which is good. However, it doesn't verify that the 600 links survive the chunk/reassemble round-trip — a render call asserting 600 links would close that gap and catch potential data loss in chunk boundaries.

Suggested addition before the shutdown
     assert_eq!(current.unwrap(), commit_result, "Current revision should match commit");
 
+    // Verify all links survive the chunk/reassemble round-trip
+    let rendered: perspective_diff_sync_integrity::Perspective =
+        call_zome(&conductor, &cell, "render", ()).await;
+    assert_eq!(rendered.links.len(), 600, "All 600 links should be retrievable after chunked commit");
+
     conductor.shutdown().await;
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/Cargo.toml (1)

35-37: Remove the unused pretty_assertions dev-dependency.

pretty_assertions is not imported or used anywhere in this test crate. Removing it from Cargo.toml will reduce unnecessary compile time.

bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/utils.rs (1)

147-176: Unnecessary sleep after the final failed attempt.

The tokio::time::sleep on line 173 runs even on the last iteration (when attempt == max_retries) right before returning Err. Not a real problem for tests, but you could short-circuit:

♻️ Skip sleep on the last attempt
             Err(e) => {
                 println!("Attempt {}/{}: call failed with {:?}, retrying...", attempt, max_retries, e);
             }
         }
-        tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await;
+        if attempt < max_retries {
+            tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await;
+        }
     }
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/chunked_diffs.rs (1)

84-92: into_aggregated_diff unnecessarily clones on every iteration.

Now that load_diff_from_entry calls this on potentially large diffs (the primary use case of this PR), the repeated accum.clone() in the reduce closure is O(n²) in the number of chunks. Since into_iter() yields owned values, you can mutate accum in-place:

♻️ Avoid cloning in reduce
     pub fn into_aggregated_diff(self) -> PerspectiveDiff {
-        self.chunks.into_iter().reduce(|accum, item| {
-            let mut temp = accum.clone();
-            temp.additions.append(&mut item.additions.clone());
-            temp.removals.append(&mut item.removals.clone());
-            temp
+        self.chunks.into_iter().reduce(|mut accum, mut item| {
+            accum.additions.append(&mut item.additions);
+            accum.removals.append(&mut item.removals);
+            accum
         })
         .unwrap_or(PerspectiveDiff::new())
     }
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/commit.rs (1)

53-91: Chunking logic is correct. One minor note: new_chunked() was added to PerspectiveDiffEntryReference in lib.rs but isn't used here — direct struct initialization is used instead. Consider using the constructor for consistency.

Also, in the chunked branch (lines 62-63), diff.additions.clone() and diff.removals.clone() are unnecessary since diff is not used again in that arm. You can avoid cloning the (large) vectors:

♻️ Avoid cloning large vectors in chunked path
         // Create chunked diff entries
         let mut chunked_diffs = ChunkedDiffs::new(*CHUNK_SIZE);
-        chunked_diffs.add_additions(diff.additions.clone());
-        chunked_diffs.add_removals(diff.removals.clone());
+        chunked_diffs.add_additions(diff.additions);
+        chunked_diffs.add_removals(diff.removals);
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync_integrity/src/lib.rs (1)

232-249: new_chunked and is_chunked look correct.

Note that new_chunked is currently unused — commit.rs builds the struct directly. Consider using this constructor there for consistency (see comment on commit.rs).

Comment on lines +244 to +258
/// Test commit with empty diff
#[tokio::test(flavor = "multi_thread")]
async fn test_empty_commit() {
let (mut conductor, cell) = setup_1_conductor().await;

create_did_link(&conductor, &cell, "did:test:alice").await;

// First make a real commit
commit_link(&conductor, &cell, "alice").await;

// Get current revision
let rev1: Option<ActionHash> = call_zome(&conductor, &cell, "current_revision", ()).await;
assert!(rev1.is_some());

conductor.shutdown().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test name test_empty_commit is misleading — it never commits an empty diff.

The function creates a regular link via commit_link, then asserts a revision exists. It doesn't test empty-diff behavior. Either rename to reflect actual intent (e.g., test_single_commit_has_revision) or commit with an empty diff to match the name.

🤖 Prompt for AI Agents
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 244 - 258, The test named test_empty_commit is misleading because
it uses create_did_link and commit_link and thus makes a real commit; either
rename the test to reflect its behavior (e.g., test_single_commit_has_revision)
or modify the test to actually perform an empty-diff commit: update the test
function test_empty_commit to call the code path that creates an empty commit
instead of commit_link (or add a new helper like commit_empty_diff) and then
assert the revision existence; locate references to test_empty_commit,
commit_link, and create_did_link in the test to implement the rename or replace
the commit step accordingly.

data-coasys added a commit that referenced this pull request Feb 10, 2026
Replace Tryorama TypeScript tests with Rust-based sweettest for stability.
The old tests failed due to ESM/CommonJS compatibility issues with Node 18.

Sweettest tests cover:
- Status get/set functionality
- Inbox message sending and receiving
- Access control (only recipient can fetch inbox)
- Inbox filtering by DID

This follows the same pattern as PR #655 (chunk-diffs) which migrates
p-diff-sync tests to sweettest.
@lucksus lucksus merged commit fdba6c0 into dev Feb 10, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant