Chunk regular diffs to avoid HC entry size limit#655
Conversation
📝 WalkthroughWalkthroughAdds a new Sweettest Rust integration test suite and test utilities, registers it in the workspace; implements chunked-diff storage for large PerspectiveDiffs with loader support and updates commit/pull/snapshot/workspace code paths; narrows build script WASM targets. Changes
Sequence DiagramsequenceDiagram
participant Tester as Test/Caller
participant Commit as commit()
participant Store as EntryStore
participant ChunkMgr as ChunkedDiffs
participant Retriever as load_diff_from_entry
Tester->>Commit: Submit PerspectiveDiff (>500)
Commit->>Commit: Check CHUNKING_THRESHOLD
alt Large diff ( > 500 )
Commit->>ChunkMgr: Split into chunks
ChunkMgr->>Store: Persist chunk entries
Store-->>ChunkMgr: Return chunk hashes
Commit->>Store: Persist main entry (diff empty, diff_chunks=[hashes])
Store-->>Commit: Return entry reference
else Small diff (<=500)
Commit->>Store: Persist main entry (inline diff)
Store-->>Commit: Return entry reference
end
Commit-->>Tester: Return entry reference
Note over Tester,Retriever: Later on pull/merge/render
Tester->>Retriever: load_diff_from_entry(entry_ref)
Retriever->>Retriever: Check is_chunked()
alt entry_ref has diff_chunks
Retriever->>Store: Load chunk entries by hashes
Store-->>Retriever: Return chunk data
Retriever->>ChunkMgr: Aggregate chunks into PerspectiveDiff
ChunkMgr-->>Retriever: Return aggregated diff
else inline diff
Retriever-->>Retriever: Use inline diff from entry_ref
end
Retriever-->>Tester: Return unified PerspectiveDiff
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync_integrity/src/lib.rs (1)
256-263:⚠️ Potential issue | 🟡 Minor
Ord/PartialEqinconsistency:comparison_keyexcludesdiff_chunks.
PartialEqandHashare derived (use all fields), but the manualOrdimpl viacomparison_keydoesn't includediff_chunks. This violates the contract thata == b ⟹ a.cmp(&b) == Equal, which can cause subtle bugs in sorted collections likeBTreeSet.In practice the risk is low (entries are typically keyed by action hash), but worth being aware of if
PerspectiveDiffEntryReferencevalues are ever inserted directly into aBTreeSet/BTreeMap.🔧 Include diff_chunks in comparison_key
- fn comparison_key(&self) -> (bool, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>, usize, usize, &PerspectiveDiff) { - let has_parents = self.parents.is_some(); - (!has_parents, &self.parents, self.diffs_since_snapshot, self.diff.total_diff_number(), &self.diff) + fn comparison_key(&self) -> (bool, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>, usize, usize, &PerspectiveDiff, &Option<Vec<HoloHash<holo_hash::hash_type::Action>>>) { + let has_parents = self.parents.is_some(); + (!has_parents, &self.parents, self.diffs_since_snapshot, self.diff.total_diff_number(), &self.diff, &self.diff_chunks) }
🤖 Fix all issues with AI agents
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`:
- Around line 244-258: The test named test_empty_commit is misleading because it
uses create_did_link and commit_link and thus makes a real commit; either rename
the test to reflect its behavior (e.g., test_single_commit_has_revision) or
modify the test to actually perform an empty-diff commit: update the test
function test_empty_commit to call the code path that creates an empty commit
instead of commit_link (or add a new helper like commit_empty_diff) and then
assert the revision existence; locate references to test_empty_commit,
commit_link, and create_did_link in the test to implement the rename or replace
the commit step accordingly.
🧹 Nitpick comments (8)
bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_telepresence.rs (1)
54-58: Most telepresence tests lack assertions — they onlyprintln!results.
test_online_status,test_get_online_agents_multi,test_get_active_agents, andtest_get_othersall retrieve data but never assert on it. These are smoke tests that only prove "no panic," not correctness. Regressions in telepresence behavior will go undetected.Consider using
retry_until_success(already available inutils) to poll until the expected state is reached, then assert. For example,test_online_statuscould retryget_online_agentsuntil the list is non-empty, then assert the agent's DID matches.bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_render.rs (1)
96-114: Consider extracting the manual retry loop or adding a brief comment.This hand-rolled retry loop duplicates retry logic from
retry_until_successinutils.rs. The nuance here is that you're checking a side-effect (current_revision) rather than the pull result itself, which justifies deviating from the utility. A short inline comment explaining that would help future readers understand whyretry_until_successisn't used.bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs (1)
218-242: The chunking test only verifies the commit hash, not data round-trip integrity.This test is the key validation for the PR's chunking feature. It proves the 600-link commit doesn't blow the entry size limit, which is good. However, it doesn't verify that the 600 links survive the chunk/reassemble round-trip — a
rendercall asserting 600 links would close that gap and catch potential data loss in chunk boundaries.Suggested addition before the shutdown
assert_eq!(current.unwrap(), commit_result, "Current revision should match commit"); + // Verify all links survive the chunk/reassemble round-trip + let rendered: perspective_diff_sync_integrity::Perspective = + call_zome(&conductor, &cell, "render", ()).await; + assert_eq!(rendered.links.len(), 600, "All 600 links should be retrievable after chunked commit"); + conductor.shutdown().await;bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/Cargo.toml (1)
35-37: Remove the unusedpretty_assertionsdev-dependency.
pretty_assertionsis not imported or used anywhere in this test crate. Removing it from Cargo.toml will reduce unnecessary compile time.bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/utils.rs (1)
147-176: Unnecessary sleep after the final failed attempt.The
tokio::time::sleepon line 173 runs even on the last iteration (whenattempt == max_retries) right before returningErr. Not a real problem for tests, but you could short-circuit:♻️ Skip sleep on the last attempt
Err(e) => { println!("Attempt {}/{}: call failed with {:?}, retrying...", attempt, max_retries, e); } } - tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await; + if attempt < max_retries { + tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await; + } }bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/chunked_diffs.rs (1)
84-92:into_aggregated_diffunnecessarily clones on every iteration.Now that
load_diff_from_entrycalls this on potentially large diffs (the primary use case of this PR), the repeatedaccum.clone()in thereduceclosure is O(n²) in the number of chunks. Sinceinto_iter()yields owned values, you can mutateaccumin-place:♻️ Avoid cloning in reduce
pub fn into_aggregated_diff(self) -> PerspectiveDiff { - self.chunks.into_iter().reduce(|accum, item| { - let mut temp = accum.clone(); - temp.additions.append(&mut item.additions.clone()); - temp.removals.append(&mut item.removals.clone()); - temp + self.chunks.into_iter().reduce(|mut accum, mut item| { + accum.additions.append(&mut item.additions); + accum.removals.append(&mut item.removals); + accum }) .unwrap_or(PerspectiveDiff::new()) }bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync/src/link_adapter/commit.rs (1)
53-91: Chunking logic is correct. One minor note:new_chunked()was added toPerspectiveDiffEntryReferenceinlib.rsbut isn't used here — direct struct initialization is used instead. Consider using the constructor for consistency.Also, in the chunked branch (lines 62-63),
diff.additions.clone()anddiff.removals.clone()are unnecessary sincediffis not used again in that arm. You can avoid cloning the (large) vectors:♻️ Avoid cloning large vectors in chunked path
// Create chunked diff entries let mut chunked_diffs = ChunkedDiffs::new(*CHUNK_SIZE); - chunked_diffs.add_additions(diff.additions.clone()); - chunked_diffs.add_removals(diff.removals.clone()); + chunked_diffs.add_additions(diff.additions); + chunked_diffs.add_removals(diff.removals);bootstrap-languages/p-diff-sync/hc-dna/zomes/perspective_diff_sync_integrity/src/lib.rs (1)
232-249:new_chunkedandis_chunkedlook correct.Note that
new_chunkedis currently unused —commit.rsbuilds the struct directly. Consider using this constructor there for consistency (see comment oncommit.rs).
| /// Test commit with empty diff | ||
| #[tokio::test(flavor = "multi_thread")] | ||
| async fn test_empty_commit() { | ||
| let (mut conductor, cell) = setup_1_conductor().await; | ||
|
|
||
| create_did_link(&conductor, &cell, "did:test:alice").await; | ||
|
|
||
| // First make a real commit | ||
| commit_link(&conductor, &cell, "alice").await; | ||
|
|
||
| // Get current revision | ||
| let rev1: Option<ActionHash> = call_zome(&conductor, &cell, "current_revision", ()).await; | ||
| assert!(rev1.is_some()); | ||
|
|
||
| conductor.shutdown().await; |
There was a problem hiding this comment.
Test name test_empty_commit is misleading — it never commits an empty diff.
The function creates a regular link via commit_link, then asserts a revision exists. It doesn't test empty-diff behavior. Either rename to reflect actual intent (e.g., test_single_commit_has_revision) or commit with an empty diff to match the name.
🤖 Prompt for AI Agents
In
`@bootstrap-languages/p-diff-sync/hc-dna/tests/sweettest/src/test_commit_pull.rs`
around lines 244 - 258, The test named test_empty_commit is misleading because
it uses create_did_link and commit_link and thus makes a real commit; either
rename the test to reflect its behavior (e.g., test_single_commit_has_revision)
or modify the test to actually perform an empty-diff commit: update the test
function test_empty_commit to call the code path that creates an empty commit
instead of commit_link (or add a new helper like commit_empty_diff) and then
assert the revision existence; locate references to test_empty_commit,
commit_link, and create_did_link in the test to implement the rename or replace
the commit step accordingly.
Replace Tryorama TypeScript tests with Rust-based sweettest for stability. The old tests failed due to ESM/CommonJS compatibility issues with Node 18. Sweettest tests cover: - Status get/set functionality - Inbox message sending and receiving - Access control (only recipient can fetch inbox) - Inbox filtering by DID This follows the same pattern as PR #655 (chunk-diffs) which migrates p-diff-sync tests to sweettest.
And migrate old tryorama tests to sweetest and add test to diff chunking.
We already had chunking logic but only used it for snapshot entries. This applies chunking to regular diffs if they get too big (>500 links)
Note
Medium Risk
Changes core diff persistence/rehydration paths (
commit/pull/snapshot traversal) to store large diffs across multiple entries, which could impact sync correctness and backward compatibility if edge cases aren’t covered.Overview
Adds chunked storage for large non-snapshot diffs to avoid Holochain entry size limits:
commitnow splits diffs over a threshold into multiplePerspectiveDiffEntryReferencechunk entries and stores only chunk hashes in the main revision entry, while readers use a newload_diff_from_entryhelper to transparently rehydrate inline vs chunked diffs duringpull, broadcast handling, workspace squashing, and snapshot generation.Introduces/extends integration coverage by migrating Tryorama tests to Rust sweettest and adding a large-diff commit test, plus updates the DNA workspace/build packaging to include the new
sweettestcrate while building/packing only the WASM zomes.Written by Cursor Bugbot for commit 6dce6a9. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
Tests
Chores