Fix Missing Links in New Neighbourhoods + Add Fallback Sync System#625
Fix Missing Links in New Neighbourhoods + Add Fallback Sync System#625
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Pre-merge checks (1 passed, 2 warnings)❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
rust-executor/src/perspectives/perspective_instance.rs (1)
461-461: Consider more specific log levels.The logging in
ensure_public_links_are_shared()useslog::debugfor success andlog::errorfor failures. Consider usinglog::infofor successful commits with non-zero links, as this is an important operational event.- log::debug!( + log::info!( "Successfully committed {} links to link language in fallback sync", links_count );Also applies to: 475-475, 480-480
rust-executor/src/neighbourhoods.rs (1)
46-46: Consider handling the boolean return value.The
ensure_public_links_are_shared()method now returns a boolean indicating success/failure, but the return value is ignored here. Consider logging a warning if the operation fails.- perspective.ensure_public_links_are_shared().await; + if !perspective.ensure_public_links_are_shared().await { + log::warn!("Failed to ensure public links are shared during neighbourhood creation for perspective {}", uuid); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
rust-executor/src/neighbourhoods.rs(1 hunks)rust-executor/src/perspectives/perspective_instance.rs(8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
rust-executor/src/neighbourhoods.rs (2)
rust-executor/src/db.rs (1)
update_perspective(711-723)rust-executor/src/perspectives/mod.rs (1)
update_perspective(110-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (8)
rust-executor/src/perspectives/perspective_instance.rs (7)
179-181: LGTM! Well-structured fallback sync tracking fields.The addition of
last_successful_fallback_syncandfallback_sync_intervalfields with proper Arc wrapping ensures thread-safe access for the fallback synchronization mechanism.Also applies to: 203-204
215-215: Good integration of the fallback sync loop.The
fallback_sync_loop()is appropriately added to the background tasks, ensuring it runs concurrently with other maintenance tasks.
426-483: Well-implemented public method for link synchronization.The refactored
ensure_public_links_are_shared()method properly:
- Returns a boolean to indicate success/failure
- Handles the case when no link language is available
- Logs the number of successfully committed links
- Maintains backward compatibility while adding error reporting
626-675: Robust fallback sync loop implementation with adaptive timing.The fallback sync loop is well-designed with:
- Proper state checks (only runs for Synced perspectives with neighbourhoods)
- Adaptive interval adjustment (30s on failure, 5 minutes on success)
- Clean async/await pattern
- Comprehensive debug logging
The adaptive timing strategy effectively balances between quick recovery from failures and reduced overhead during stable operation.
677-686: Good helper method for interval management.The
reset_fallback_sync_interval()method provides a clean way to reset the sync interval when new shared links are added, ensuring timely synchronization of new content.
927-928: Correct integration with link mutation flow.Properly resets the fallback sync interval when new shared links are added through
link_mutations(), ensuring the fallback sync runs sooner to publish these links.
247-247: Confirm redundancy of ensure_public_links_are_shared() invocation
At rust-executor/src/perspectives/perspective_instance.rs:247, the call already includes.await. If the link-sharing logic has been moved into the neighbourhood-creation flow in neighbourhoods.rs, remove this now-redundant invocation; otherwise retain it as is.rust-executor/src/neighbourhoods.rs (1)
38-46: Critical fix: Ensures early links are shared before other agents join.This is the primary fix for the issue. By:
- Setting state to
NeighbourhoodCreationInitiatedinstead ofSynced- Calling
ensure_public_links_are_shared()after neighbourhood creationThe implementation correctly ensures that any links created immediately after neighbourhood creation are committed to the link language before other agents can join, preventing the "missing links" issue.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
426-483: ensure_public_links_are_shared: shorten link_language lock and avoid O(n*m) scan
- Hold
link_languagelock only for render/commit; do DB + diff computation outside.- Build a HashSet of remote keys to reduce from O(n*m) to O(n).
- Make the log message generic (method may be called outside “fallback”).
Apply:
- pub async fn ensure_public_links_are_shared(&self) -> bool { - let uuid = self.persisted.lock().await.uuid.clone(); - let mut link_language_guard = self.link_language.lock().await; - if let Some(link_language) = link_language_guard.as_mut() { - let mut local_links = - Ad4mDb::with_global_instance(|db| db.get_all_links(&uuid)).unwrap(); + pub async fn ensure_public_links_are_shared(&self) -> bool { + let uuid = { self.persisted.lock().await.uuid.clone() }; + + // Gather local Shared links without holding link_language lock + let mut local_links = match Ad4mDb::with_global_instance(|db| db.get_all_links(&uuid)) { + Ok(v) => v, + Err(e) => { + log::error!("ensure_public_links_are_shared: failed to load local links: {:?}", e); + return false; + } + }; local_links.retain(|(_, status)| status == &LinkStatus::Shared); - let remote_links = match link_language.current_revision().await { - Ok(Some(_)) => { - link_language - .render() - .await - .unwrap_or(None) - .unwrap_or_default() - .links - } - _ => vec![], - }; + if local_links.is_empty() { + return true; + } + + // Snapshot remote links with a short lock + let remote_links = { + let mut guard = self.link_language.lock().await; + let Some(link_language) = guard.as_mut() else { return false }; + match link_language.current_revision().await { + Ok(Some(_)) => link_language + .render() + .await + .unwrap_or(None) + .unwrap_or_default() + .links, + _ => Vec::new(), + } + }; - let mut links_to_commit = Vec::new(); - for (local_link, _) in &local_links { - if !remote_links.iter().any(|e| { - e.author == local_link.author - && e.timestamp == local_link.timestamp - && e.data.source == local_link.data.source - && e.data.target == local_link.data.target - && e.data.predicate == local_link.data.predicate - }) { - links_to_commit.push(local_link.clone()); - } - } + // Fast set-diff + let remote_keys: std::collections::HashSet<(String, String, String, Option<String>, String)> = + remote_links + .iter() + .map(|e| { + ( + e.author.clone(), + e.timestamp.clone(), + e.data.source.clone(), + e.data.predicate.clone(), + e.data.target.clone(), + ) + }) + .collect(); + let mut links_to_commit: Vec<LinkExpression> = Vec::new(); + for (local_link, _) in local_links.into_iter() { + let key = ( + local_link.author.clone(), + local_link.timestamp.clone(), + local_link.data.source.clone(), + local_link.data.predicate.clone(), + local_link.data.target.clone(), + ); + if !remote_keys.contains(&key) { + links_to_commit.push(local_link); + } + } if !links_to_commit.is_empty() { let links_count = links_to_commit.len(); - let result = link_language - .commit(PerspectiveDiff { - additions: links_to_commit, - removals: vec![], - }) - .await; + // Re-acquire lock only for commit + let result = { + let mut guard = self.link_language.lock().await; + let Some(link_language) = guard.as_mut() else { return false }; + link_language + .commit(PerspectiveDiff { + additions: links_to_commit, + removals: vec![], + }) + .await + }; if let Err(e) = result { log::error!("Error calling link language's commit in ensure_public_links_are_shared: {:?}", e); return false; } - log::debug!( - "Successfully committed {} links to link language in fallback sync", - links_count - ); + log::debug!("Successfully committed {} link(s) via ensure_public_links_are_shared", links_count); } return true; - } - false }
♻️ Duplicate comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
2626-2694: Bug resolved: fresh sleep interval + reduced lock contention
- Freshly reads
fallback_sync_intervalafter updates before sleeping.- Drops locks promptly while evaluating conditions.
This addresses the earlier “stale sleep interval and lock contention” feedback.
🧹 Nitpick comments (7)
rust-executor/src/perspectives/perspective_instance.rs (7)
179-182: Fallback sync tracking fields: OK; consider read-optimized locksThese are read-heavy; consider RwLock for
fallback_sync_intervalandlast_successful_fallback_syncto reduce contention. Also consider storingInstantasOption<Instant>behind RwLock for cheaper reads.
426-483: Avoid duplicate commits from concurrent callersIf
ensure_public_links_are_sharedcan be invoked concurrently (e.g., during install and fallback), add a lightweight mutex/flag to serialize this method per perspective to prevent double commits/races.
820-823: Reset fallback interval when Shared links are added via add_link_expressionTo fast-track recovery if the immediate commit fails, also reset here (mirrors
link_mutations).if status == LinkStatus::Shared { - self.spawn_commit_and_handle_error(&diff); + self.spawn_commit_and_handle_error(&diff); + self.reset_fallback_sync_interval().await; }
874-876: Reset fallback interval on add_links Shared pathSame rationale as above.
if status == LinkStatus::Shared { - self.spawn_commit_and_handle_error(&perspective_diff); + self.spawn_commit_and_handle_error(&perspective_diff); + self.reset_fallback_sync_interval().await; }
1003-1005: Reset fallback interval on Shared update_linkIf the updated link remains Shared, reset to accelerate fallback in case of transient commit failure.
if link_status == LinkStatus::Shared { - self.spawn_commit_and_handle_error(&diff); + self.spawn_commit_and_handle_error(&diff); + self.reset_fallback_sync_interval().await; }
244-248: Use the boolean from ensure_public_links_are_shared()You ignore the success flag here; log or act on failure to aid diagnostics.
- if self.persisted.lock().await.state == PerspectiveState::NeighbourhoodCreationInitiated { - self.ensure_public_links_are_shared().await; - } + if self.persisted.lock().await.state == PerspectiveState::NeighbourhoodCreationInitiated { + if !self.ensure_public_links_are_shared().await { + log::warn!("ensure_public_links_are_shared() returned false during link language install"); + } + }
324-341: Nit: log messages don’t match constants
- MAX_PENDING_SECONDS is 3, but log says “10s”.
- MAX_PENDING_DIFFS_COUNT is 150, log says “100”.
- log::info!("Committed diffs after reaching 10s maximum wait time"); + log::info!("Committed diffs after reaching {}s maximum wait time", MAX_PENDING_SECONDS); ... - log::info!("Committed diffs after collecting 100"); + log::info!("Committed diffs after collecting {}", MAX_PENDING_DIFFS_COUNT);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
rust-executor/src/perspectives/perspective_instance.rs(8 hunks)
🔇 Additional comments (3)
rust-executor/src/perspectives/perspective_instance.rs (3)
202-205: Constructor initializes fallback timers correctlyDefaults (None/30s) match the described behavior.
214-216: Background tasks wiring looks goodNew loops are spawned; ordering is fine.
2696-2706: Reset helper looks goodSimple, side-effect-only, with helpful debug log.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/js/tests/neighbourhood.ts (1)
170-175: Consider extracting timing constants.The increased timing values (15s initial sync, 180 max tries) are hardcoded throughout the test. Consider extracting these as constants at the top of the file for easier maintenance and consistency.
+ const INITIAL_SYNC_WAIT_MS = 15000; + const MAX_SYNC_RETRIES = 180; + const RETRY_INTERVAL_MS = 1000; // ... in the test - console.log("wait 15s for initial sync") - await sleep(15000) + console.log(`wait ${INITIAL_SYNC_WAIT_MS/1000}s for initial sync`) + await sleep(INITIAL_SYNC_WAIT_MS) let bobLinks = await bob.perspective.queryLinks(bobP1!.uuid, new LinkQuery({source: 'root'})) let tries = 1 - const maxTries = 180 // 3 minutes with 1 second sleep (increased for fallback sync) + const maxTries = MAX_SYNC_RETRIES // 3 minutes with 1 second sleep (increased for fallback sync)rust-executor/src/perspectives/perspective_instance.rs (1)
2631-2699: Consider optimizing lock acquisition patterns.The fallback sync loop has multiple sequential lock acquisitions that could be optimized. Consider combining some of the checks to reduce lock contention.
async fn fallback_sync_loop(&self) { let uuid = self.persisted.lock().await.uuid.clone(); log::debug!("Starting fallback sync loop for perspective {}", uuid); while !*self.is_teardown.lock().await { // Check if we should run the fallback sync (avoid holding multiple locks) let should_run = { - // Check perspective state first - let is_synced_neighbourhood = { - let handle = self.persisted.lock().await; - let result = - handle.state == PerspectiveState::Synced && handle.neighbourhood.is_some(); - drop(handle); // Release lock immediately - result - }; - - if !is_synced_neighbourhood { - false - } else { - // Check link language availability - let link_lang_available = { - let link_lang = self.link_language.lock().await; - let result = link_lang.is_some(); - drop(link_lang); // Release lock immediately - result - }; - - if !link_lang_available { - false - } else { - // Check timing conditions - let last_success = *self.last_successful_fallback_sync.lock().await; - let current_interval = *self.fallback_sync_interval.lock().await; - - // Only run if we haven't had a successful sync recently or it's been a while - last_success.is_none() || last_success.unwrap().elapsed() > current_interval - } - } + // Combine state and link language checks + let (is_synced_neighbourhood, has_link_language) = { + let handle = self.persisted.lock().await; + let link_lang = self.link_language.lock().await; + ( + handle.state == PerspectiveState::Synced && handle.neighbourhood.is_some(), + link_lang.is_some() + ) + }; + + if !is_synced_neighbourhood || !has_link_language { + false + } else { + // Check timing conditions + let last_success = *self.last_successful_fallback_sync.lock().await; + let current_interval = *self.fallback_sync_interval.lock().await; + + // Only run if we haven't had a successful sync recently or it's been a while + last_success.is_none() || last_success.unwrap().elapsed() > current_interval + } };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
rust-executor/src/perspectives/perspective_instance.rs(9 hunks)tests/js/tests/neighbourhood.ts(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/js/tests/neighbourhood.ts (1)
test-runner/example/languages/social-context.js (1)
LinkQuery(4016-4054)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (11)
tests/js/tests/neighbourhood.ts (3)
45-58: Improved test reliability with state polling.Good change from expecting immediate
Syncedstate to polling for the state transition. This aligns with the Rust changes where the initial state is nowNeighbourhoodCreationInitiated.
202-212: Good replacement of fixed delays with retry loops.The change from a fixed 5-second sleep to a retry loop with a maximum timeout is a significant improvement for test reliability.
345-357: Well-implemented agent discovery with polling.The retry-based agent discovery mechanism is more robust than the previous fixed wait approach, ensuring agents properly discover each other before proceeding.
rust-executor/src/perspectives/perspective_instance.rs (8)
179-181: LGTM! Fallback sync fields properly initialized.The new fields for tracking fallback sync state are well-structured with appropriate initial values (30-second interval).
Also applies to: 203-204
431-488: Well-implemented public method with proper error handling.The refactored
ensure_public_links_are_sharedmethod correctly returns a boolean to indicate success/failure and includes helpful debug logging for the number of committed links.
245-248: Good integration with neighbourhood creation flow.Calling
ensure_public_links_are_shared()when the state isNeighbourhoodCreationInitiatedensures early links are committed before other agents join.
284-289: Correct state transition on successful sync.The transition to
Syncedstate after successful link language sync properly indicates the neighbourhood is ready.
932-934: Good trigger for resetting fallback sync interval.Resetting the fallback sync interval when new shared links are added ensures they get synced quickly.
2675-2691: Adaptive timing strategy is well-designed.The adaptive timing approach (30s → 5min on success, reset on failure) provides a good balance between responsiveness and resource efficiency.
2701-2710: Clear helper method for interval reset.The
reset_fallback_sync_intervalmethod provides a clean interface for resetting the sync interval with appropriate logging.
209-216: All background tasks properly integrated.The
fallback_sync_loopis correctly added to the background tasks spawned bystart_background_tasks().
Problem
Users reported missing links in new Neighbourhoods where:
Root Cause Analysis
The issue was in the neighbourhood creation process in
rust-executor/src/neighbourhoods.rs:Syncedwithout going through the proper state transitionensure_public_links_are_shared()function was never called because it only runs when the state isNeighbourhoodCreationInitiatedSolution
Primary Fix: Ensure Links Are Shared During Creation
File:
rust-executor/src/neighbourhoods.rsensure_public_links_are_shared()immediately after neighbourhood creationFile:
rust-executor/src/perspectives/perspective_instance.rsensure_public_links_are_shared()public so it can be called from neighbourhood creationSecondary Fix: Robust Fallback System
Implemented a smart background job that provides a safety net for link synchronization:
Adaptive Timing Strategy
Smart Execution Logic
SyncedstatePerformance Optimized
Technical Implementation
New Fields Added to PerspectiveInstance
New Background Task
fallback_sync_loop()- Runs alongside existing background tasksstart_background_tasks()functionreset_fallback_sync_interval()- Resets interval when new links addedEnhanced Functions
ensure_public_links_are_shared()now returnsboolfor success/failure trackingBenefits
Reliability
Performance
Monitoring
Testing
The fix addresses the specific issue where:
Backward Compatibility
Files Modified
rust-executor/src/neighbourhoods.rs- Primary fix for neighbourhood creationrust-executor/src/perspectives/perspective_instance.rs- Fallback system implementationCode Quality
This fix ensures that all links created before neighbourhood sharing are properly synchronized and visible to all agents, resolving the missing links issue while providing a robust fallback system for ongoing reliability.
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests