Conversation
WalkthroughAdds DB deduplication and a unique index for links, makes link inserts idempotent with INSERT OR IGNORE, enforces deterministic ORDER BY on link queries, deduplicates perspective diff additions/removals before DB mutations, and updates tests to be order-insensitive/deterministic. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Perspective as PerspectiveInstance
participant DB
rect rgba(220,240,255,0.5)
note over Perspective: Diff processing (dedup)
Caller->>Perspective: diff_from_link_language(diff)
Perspective->>Perspective: Build unique_additions & unique_removals by composite key
end
alt unique_additions not empty
loop each unique addition
Perspective->>DB: INSERT OR IGNORE INTO link(...)
DB-->>Perspective: OK (inserted or ignored)
end
else no additions
note over Perspective: Skip additions
end
alt unique_removals not empty
loop each unique removal
Perspective->>DB: DELETE FROM link WHERE ...
DB-->>Perspective: OK
end
else no removals
note over Perspective: Skip removals
end
Perspective-->>Caller: DecoratedPerspectiveDiff (unique sets)
sequenceDiagram
autonumber
participant Init as Startup/Init
participant DB
rect rgba(240,255,240,0.5)
note over Init,DB: One-time migration / initialization
Init->>DB: DELETE duplicates (keep MIN(id) per composite key)
Init->>DB: CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx(...)
DB-->>Init: OK
end
note over Init,DB: Subsequent deterministic reads
Init->>DB: SELECT ... FROM link ORDER BY timestamp, source, predicate, target, author
DB-->>Init: Deterministically ordered rows
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Bugbot found 1 bugTo see it, activate your membership in the Cursor dashboard. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
rust-executor/src/db.rs (2)
799-817: Update may now fail due to new UNIQUE constraint; implement merge semantics.With the new unique index on (perspective, source, predicate, target, author, timestamp), UPDATE can fail if the new values already exist as another row. Implement a safe “merge”:
- INSERT OR IGNORE the new row,
- DELETE the old row,
- If no insert happened (because it already existed), just delete the old row.
- self.conn.execute( - "UPDATE link SET source = ?1, predicate = ?2, target = ?3, author = ?4, timestamp = ?5, signature = ?6, key = ?7 - WHERE perspective = ?8 AND source = ?9 AND predicate = ?10 AND target = ?11 AND author = ?12 AND timestamp = ?13", - params![/* new */, /* old match */], - )?; + let tx = self.conn.unchecked_transaction()?; + // 1) Try to insert the new row (idempotent) + let inserted = tx.execute( + "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) + VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, + (SELECT status FROM link WHERE perspective = ?1 AND source = ?9 AND predicate = ?10 AND target = ?11 AND author = ?12 AND timestamp = ?13 LIMIT 1) + )", + params![perspective_uuid, new_link.data.source, new_link.data.predicate.as_ref().unwrap_or(&"".to_string()), new_link.data.target, new_link.author, new_link.timestamp, new_link.proof.signature, new_link.proof.key, + old_link.data.source, old_link.data.predicate.as_ref().unwrap_or(&"".to_string()), old_link.data.target, old_link.author, old_link.timestamp], + )?; + // 2) Delete the old row + tx.execute( + "DELETE FROM link WHERE perspective = ?1 AND source = ?2 AND predicate = ?3 AND target = ?4 AND author = ?5 AND timestamp = ?6", + params![perspective_uuid, old_link.data.source, old_link.data.predicate.as_ref().unwrap_or(&"".to_string()), old_link.data.target, old_link.author, old_link.timestamp], + )?; + tx.commit()?; Ok(())This preserves idempotency and respects the uniqueness constraint.
1656-1669: Import counter inflates on ignored duplicates; use changed-row count.INSERT OR IGNORE returns Ok even when it ignores a duplicate. Right now, imported is incremented unconditionally. Check the returned usize; only increment when > 0. Optionally track duplicates as omitted.
- match self.conn.execute( + match self.conn.execute( "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)", params![/* ... */], ) { - Ok(_) => result.links.imported += 1, + Ok(changed) => { + if changed > 0 { + result.links.imported += 1 + } else { + result.links.omitted += 1 + } + }
🧹 Nitpick comments (7)
rust-executor/src/db.rs (4)
90-103: Dedup + unique index: wrap in a transaction and run once.Current code runs a full-table DELETE + index creation at every startup, which can be expensive on large datasets and risks long-held write locks. Wrap these in a transaction and guard with a one-time migration flag (e.g., PRAGMA user_version) so it only runs when needed.
Example:
- // Ensure we don't have duplicate link expressions and enforce uniqueness going forward - conn.execute( + // Ensure we don't have duplicate link expressions and enforce uniqueness going forward + let tx = conn.unchecked_transaction()?; + tx.execute( "DELETE FROM link WHERE id NOT IN ( SELECT MIN(id) FROM link GROUP BY perspective, source, predicate, target, author, timestamp )", [], )?; - conn.execute( + tx.execute( "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx ON link (perspective, source, predicate, target, author, timestamp)", [], )?; + tx.commit()?;Optionally persist a migration version to skip the DELETE after the index exists.
750-763: Idempotent insert is correct; consider exposing insert outcome.Using INSERT OR IGNORE prevents uniqueness violations—good. If the caller needs to know whether a row was actually inserted vs ignored, capture the affected row count from execute() and return it or log it.
- self.conn.execute( + let changed = self.conn.execute( "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)", params![/* ... */], )?; - Ok(()) + // Optionally: Ok(bool::from(changed > 0)) or log::debug!("link insert changed={}", changed); + Ok(())
775-789: Batch inserts: wrap in a transaction + prepared statement for throughput.Looping execute() per row incurs extra parse/commit overhead. Use a transaction and a prepared statement to significantly speed up bulk inserts.
- for link in links.iter() { - self.conn.execute( + let tx = self.conn.unchecked_transaction()?; + let mut stmt = tx.prepare( "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) - VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)", - params![/* ... */], - )?; - } + VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)" + )?; + for link in links.iter() { + stmt.execute(params![/* ... */])?; + } + tx.commit()?; Ok(())
889-923: Normalize predicate consistently on reads ("" -> None).get_link maps empty-string predicates back to None, but get_all_links/get_links_by_source/get_links_by_target do not, causing inconsistent API behavior and equality surprises. Apply the same normalization.
- predicate: row.get(2)?, + predicate: row.get(2).map(|p: Option<String>| match p.as_deref() { + Some("") => None, + _ => p, + })?,Apply in all three query mappers.
Also applies to: 925-961, 963-997
rust-executor/src/perspectives/perspective_instance.rs (3)
630-644: Batch removals: avoid per-row closures; add/remove in a transaction.Looping remove_link inside with_global_instance is fine but will be slow for large diffs. Consider a DB helper that removes a slice in one transaction.
Happy to sketch an Ad4mDb::remove_many_links(&self, perspective_uuid: &str, links: &[LinkExpression]) helper if you want it in this PR.
318-334: Log messages drift from constants; format with MAX_ values.*Messages say “10s” and “100” but the constants are 3s and 150. Use the constants in the log text for accuracy.
- log::info!("Committed diffs after reaching 10s maximum wait time"); + log::info!("Committed diffs after reaching {}s maximum wait time", MAX_PENDING_SECONDS); ... - log::info!("Committed diffs after collecting 100"); + log::info!("Committed diffs after collecting {}", MAX_PENDING_DIFFS_COUNT);
423-468: Nit: variable naming is misleading.local_links currently holds Shared links after retain(). Consider renaming to shared_links for readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
rust-executor/src/db.rs(4 hunks)rust-executor/src/perspectives/perspective_instance.rs(3 hunks)turbo.json(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T11:39:48.641Z
Learnt from: lucksus
PR: coasys/ad4m#605
File: rust-executor/src/js_core/options.rs:59-68
Timestamp: 2025-06-05T11:39:48.641Z
Learning: In the rust-executor project, CUSTOM_DENO_SNAPSHOT.bin is a build-time generated artifact created by the generate_snapshot binary (with --features generate_snapshot). The file is not checked into source control but is created during the build process via pnpm build-deno-snapshot script and then included in subsequent builds via include_bytes! macro.
Applied to files:
turbo.json
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (3)
rust-executor/src/db.rs (2)
new(59-236)with_global_instance(1357-1366)rust-executor/src/graphql/graphql_types.rs (11)
new(481-495)new(1036-1044)new(1054-1068)from(88-90)from(94-96)from(330-338)from(379-387)from(423-435)from(613-618)from(622-627)from(668-686)rust-executor/src/types.rs (8)
from(49-63)from(75-81)from(149-156)from(160-168)from(199-210)from(214-225)from(234-242)from(360-365)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (5)
turbo.json (2)
53-57: Rename looks correct; include snapshot artifact and disable cache (good).The target key rename to "@coasys/rust-ad4m-executor#build" matches how other pipeline deps already reference it, and including CUSTOM_DENO_SNAPSHOT.bin in outputs with cache disabled aligns with the known build-time snapshot generation flow. No issues spotted.
59-62: No lingering unscoped pipeline keys foundAll instances of the renamed build tasks have been confirmed as scoped under
@coasys/and there are no leftover references to the old keys.
- turbo.json only contains the scoped targets (
@coasys/ad4m-cli#buildand@coasys/rust-ad4m-executor#build) at lines 20, 21, 53, and 59–62.- Repository-wide search for unscoped usages of
ad4m-cli#buildandrust-ad4m-executor#build(excluding node_modules and dist) returned no hits.No further changes required here.
rust-executor/src/perspectives/perspective_instance.rs (3)
646-655: Decorated diff uses Shared for removals — confirm assumption.You set LinkStatus::Shared for all removals coming from the link language. This is consistent if the language only emits shared changes. If local removals could appear here, this will mislabel them. Please confirm.
2838-2853: Tests: deterministic sorting comparator is a solid improvement.Sorting by timestamp and stable tie-breakers removes flakiness from non-ordered DB reads. Looks good.
2878-2895: Tests: source-filtered deterministic comparisons LGTM.The explicit comparator for filtered queries keeps assertions stable.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
597-616: Delimiter-safe dedup key: great fix, aligns with prior feedbackSwitching to a structured key via serde_json eliminates delimiter-collision bugs from "|" joins. Good call.
🧹 Nitpick comments (5)
rust-executor/src/perspectives/perspective_instance.rs (4)
599-613: Minor perf nit: pre-size the HashSet and simplify error pathAvoid reallocations and an unreachable fallback:
Apply this diff:
- let mut seen_add: std::collections::HashSet<String> = std::collections::HashSet::new(); + let mut seen_add: std::collections::HashSet<String> = + std::collections::HashSet::with_capacity(diff.additions.len()); ... - let key = serde_json::to_string(&key_tuple).unwrap_or_else(|_| { - // Fallback to a simple hash if serialization fails - format!("{:?}", key_tuple) - }); + // Serialization cannot fail for these primitives; if it does, surface loudly. + let key = serde_json::to_string(&key_tuple) + .expect("dedup key serialization");
618-632: Minor perf nit: pre-size the removals HashSetMirror the additions optimization.
Apply this diff:
- let mut seen_rem: std::collections::HashSet<String> = std::collections::HashSet::new(); + let mut seen_rem: std::collections::HashSet<String> = + std::collections::HashSet::with_capacity(diff.removals.len());
2845-2860: Test ordering comparator duplicated; factor to helper for reuseComparator logic repeats and parses timestamps each compare. Consider a small helper to DRY and keep tests readable.
2885-2901: Same: reuse comparator helper to reduce duplicationExtract shared cmp or pre-parse timestamps to avoid repeated parsing in sort comparator.
tests/js/tests/prolog-and-literals.test.ts (1)
2184-2184: Confirm intended deterministic ordering for collection valuesExpectation now assumes a sorted ingredient order. If backend guarantees deterministic sorting, great—otherwise consider sorting in the assertion to avoid coupling to internal order.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
rust-executor/src/perspectives/perspective_instance.rs(3 hunks)tests/js/tests/prolog-and-literals.test.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
rust-executor/src/db.rs (2)
new(59-236)with_global_instance(1357-1366)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (3)
rust-executor/src/perspectives/perspective_instance.rs (3)
618-635: Same dedup improvement for removals: looks goodMirrors additions path and fixes the same collision risk.
637-642: DB write path uses unique additions: OKWriting only unique additions with Shared status fits the link-language-originated diff semantics.
654-662: Decorated diff reflects deduped sets: goodUsing the unique sets for pubsub/prolog updates prevents duplicate events.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
rust-executor/src/db.rs (3)
894-923: Normalize empty-string predicate to None for consistency with get_link().get_link() maps "" -> None, but get_all_links() currently returns Some(""), causing inconsistent API behavior. Align the mapping here.
let link_expression = LinkExpression { data: Link { source: row.get(1)?, - predicate: row.get(2)?, + predicate: row + .get(2) + .map(|p: Option<String>| match p.as_deref() { + Some("") => None, + _ => p, + })?, target: row.get(3)?, },
931-960: Apply the same predicate normalization in get_links_by_source().let link_expression = LinkExpression { data: Link { source: row.get(1)?, - predicate: row.get(2)?, + predicate: row + .get(2) + .map(|p: Option<String>| match p.as_deref() { + Some("") => None, + _ => p, + })?, target: row.get(3)?, },
968-997: Apply the same predicate normalization in get_links_by_target().let link_expression = LinkExpression { data: Link { source: row.get(1)?, - predicate: row.get(2)?, + predicate: row + .get(2) + .map(|p: Option<String>| match p.as_deref() { + Some("") => None, + _ => p, + })?, target: row.get(3)?, },
♻️ Duplicate comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
644-651: Don’t panic on remove_link failures in runtime path (repeat).Use log-and-continue; panics here can crash the executor on benign races (already removed).
- Ad4mDb::with_global_instance(|db| { - for link in &unique_removals { - db.remove_link(&handle.uuid, link) - .expect("Failed to remove link"); - } - }); + Ad4mDb::with_global_instance(|db| { + for link in &unique_removals { + if let Err(e) = db.remove_link(&handle.uuid, link) { + log::warn!("diff_from_link_language: failed to remove link {:?}: {}", link, e); + } + } + });
🧹 Nitpick comments (4)
rust-executor/src/db.rs (3)
90-103: Run dedup + index creation inside a single transaction; current approach can be slow and non-atomic on large tables.Wrapping both statements in one transaction reduces I/O and ensures atomicity of the cleanup + constraint install.
- // Ensure we don't have duplicate link expressions and enforce uniqueness going forward - conn.execute( - "DELETE FROM link - WHERE id NOT IN ( - SELECT MIN(id) FROM link - GROUP BY perspective, source, predicate, target, author, timestamp - )", - [], - )?; - conn.execute( - "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx - ON link (perspective, source, predicate, target, author, timestamp)", - [], - )?; + // Ensure we don't have duplicate link expressions and enforce uniqueness going forward + { + let tx = conn.transaction()?; + tx.execute( + "DELETE FROM link + WHERE id NOT IN ( + SELECT MIN(id) FROM link + GROUP BY perspective, source, predicate, target, author, timestamp + )", + [], + )?; + tx.execute( + "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx + ON link (perspective, source, predicate, target, author, timestamp)", + [], + )?; + tx.commit()?; + }
750-763: Prefer targeted UPSERT over INSERT OR IGNORE to avoid masking other constraint errors.Use SQLite UPSERT to ignore only uniqueness collisions on the intended constraint.
- "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) - VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)", + "INSERT INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) + VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9) + ON CONFLICT(perspective, source, predicate, target, author, timestamp) DO NOTHING",Apply the same change to add_many_links() and import_from_json() link inserts.
Also applies to: 775-788, 1656-1669
894-895: Add supporting indexes to satisfy ORDER BY without extra sorting.Given WHERE perspective = ? (and possibly source/target), SQLite can fully use these composite indexes to return rows in order:
- Index for get_all_links: (perspective, timestamp, source, predicate, target, author)
- Index for get_links_by_source: (perspective, source, timestamp, predicate, target, author)
- Index for get_links_by_target: (perspective, target, timestamp, source, predicate, author)
Example (add during init, after unique index):
conn.execute("CREATE INDEX IF NOT EXISTS link_order_all_idx ON link (perspective, timestamp, source, predicate, target, author)", [])?; conn.execute("CREATE INDEX IF NOT EXISTS link_by_source_idx ON link (perspective, source, timestamp, predicate, target, author)", [])?; conn.execute("CREATE INDEX IF NOT EXISTS link_by_target_idx ON link (perspective, target, timestamp, source, predicate, author)", [])?;Also applies to: 931-932, 968-969
rust-executor/src/perspectives/perspective_instance.rs (1)
599-613: Minor: pre-size HashSets to reduce rehashing on large diffs.- let mut seen_add: std::collections::HashSet<String> = std::collections::HashSet::new(); + let mut seen_add: std::collections::HashSet<String> = + std::collections::HashSet::with_capacity(diff.additions.len()); ... - let mut seen_rem: std::collections::HashSet<String> = std::collections::HashSet::new(); + let mut seen_rem: std::collections::HashSet<String> = + std::collections::HashSet::with_capacity(diff.removals.len());Also applies to: 621-635
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
rust-executor/src/db.rs(7 hunks)rust-executor/src/perspectives/perspective_instance.rs(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (2)
rust-executor/src/db.rs (2)
new(59-236)with_global_instance(1357-1366)rust-executor/src/types.rs (8)
from(49-63)from(75-81)from(149-156)from(160-168)from(199-210)from(214-225)from(234-242)from(360-365)
🔇 Additional comments (4)
rust-executor/src/perspectives/perspective_instance.rs (4)
597-616: Solid dedup: structured keys fix delimiter-collision risk.The serde_json-based tuple key addresses the fragility noted earlier.
654-662: Confirm status semantics for removals.All removals are marked Shared. If link language can emit Local-only removals, this will mislabel events.
Would you like me to scan the codebase for sources of Local removals and adjust accordingly?
2842-2857: Tests: deterministic ordering verification looks good.
2882-2899: Tests: source-filtered sort comparator matches DB ORDER BY.
Summary by CodeRabbit
Bug Fixes
Refactor
Tests