Skip to content

Deduplicate links#619

Merged
lucksus merged 11 commits intodevfrom
fix/dedup-links
Sep 3, 2025
Merged

Deduplicate links#619
lucksus merged 11 commits intodevfrom
fix/dedup-links

Conversation

@lucksus
Copy link
Member

@lucksus lucksus commented Aug 22, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Deduplicates existing links and enforces uniqueness to prevent future duplicates.
    • Makes link insertions and imports idempotent to avoid duplicate errors.
    • Ensures deterministic ordering when listing or fetching links for consistent results.
  • Refactor

    • Deduplicates additions and removals in perspective diffs to avoid redundant updates.
  • Tests

    • Updated expectations and sorting to reflect deterministic link ordering.
    • Made a recipe test order-insensitive by checking contents rather than exact array order.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Walkthrough

Adds DB deduplication and a unique index for links, makes link inserts idempotent with INSERT OR IGNORE, enforces deterministic ORDER BY on link queries, deduplicates perspective diff additions/removals before DB mutations, and updates tests to be order-insensitive/deterministic.

Changes

Cohort / File(s) Summary of Changes
Link DB deduplication and idempotent operations
rust-executor/src/db.rs
- Delete duplicate link rows keeping MIN(id) per (perspective, source, predicate, target, author, timestamp).
- Create unique index link_unique_idx IF NOT EXISTS on those columns.
- Replace INSERT with INSERT OR IGNORE for link insertions (affects add_link, add_many_links, import_from_json).
- Add ORDER BY clauses to link retrieval queries (get_link, get_all_links, get_links_by_source, get_links_by_target) for deterministic ordering.
Perspective diff de-duplication and deterministic tests
rust-executor/src/perspectives/perspective_instance.rs
- Deduplicate diff.additions and diff.removals by composite key (author, timestamp, source, predicate, target) into unique_additions and unique_removals (uses structured keys / serde_json fallback).
- Apply DB mutations only for unique sets; write additions with LinkStatus::Shared and perform removals from unique_removals.
- Rebuild DecoratedPerspectiveDiff using the unique collections.
- Update tests to sort link results deterministically before assertions (sort by timestamp, then source, predicate, target, author).
Test expectation update (order-insensitivity)
tests/js/tests/prolog-and-literals.test.ts
- Replace exact array equality assertion for BatchRecipe.ingredients with an order-insensitive check: assert length is 4 and that it contains "pasta", "sauce", "cheese", and "garlic" (no longer relies on a specific order).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Perspective as PerspectiveInstance
  participant DB

  rect rgba(220,240,255,0.5)
    note over Perspective: Diff processing (dedup)
    Caller->>Perspective: diff_from_link_language(diff)
    Perspective->>Perspective: Build unique_additions & unique_removals by composite key
  end

  alt unique_additions not empty
    loop each unique addition
      Perspective->>DB: INSERT OR IGNORE INTO link(...)
      DB-->>Perspective: OK (inserted or ignored)
    end
  else no additions
    note over Perspective: Skip additions
  end

  alt unique_removals not empty
    loop each unique removal
      Perspective->>DB: DELETE FROM link WHERE ...
      DB-->>Perspective: OK
    end
  else no removals
    note over Perspective: Skip removals
  end

  Perspective-->>Caller: DecoratedPerspectiveDiff (unique sets)
Loading
sequenceDiagram
  autonumber
  participant Init as Startup/Init
  participant DB

  rect rgba(240,255,240,0.5)
    note over Init,DB: One-time migration / initialization
    Init->>DB: DELETE duplicates (keep MIN(id) per composite key)
    Init->>DB: CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx(...)
    DB-->>Init: OK
  end

  note over Init,DB: Subsequent deterministic reads
  Init->>DB: SELECT ... FROM link ORDER BY timestamp, source, predicate, target, author
  DB-->>Init: Deterministically ordered rows
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I tidied the links in a burrow of bytes,
Nibbled duplicates down to single delights.
With OR IGNORE whiskers twitch—no fuss!
Queries now hop in deterministic truss.
I prune my diff, then bound—no fear—
A cleaner warren of data is here. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b818f67 and 8336adf.

📒 Files selected for processing (1)
  • tests/js/tests/prolog-and-literals.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/js/tests/prolog-and-literals.test.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/dedup-links

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@cursor
Copy link

cursor bot commented Aug 22, 2025

Bugbot found 1 bug

To see it, activate your membership in the Cursor dashboard.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
rust-executor/src/db.rs (2)

799-817: Update may now fail due to new UNIQUE constraint; implement merge semantics.

With the new unique index on (perspective, source, predicate, target, author, timestamp), UPDATE can fail if the new values already exist as another row. Implement a safe “merge”:

  1. INSERT OR IGNORE the new row,
  2. DELETE the old row,
  3. If no insert happened (because it already existed), just delete the old row.
-        self.conn.execute(
-            "UPDATE link SET source = ?1, predicate = ?2, target = ?3, author = ?4, timestamp = ?5, signature = ?6, key = ?7
-             WHERE perspective = ?8 AND source = ?9 AND predicate = ?10 AND target = ?11 AND author = ?12 AND timestamp = ?13",
-            params![/* new */, /* old match */],
-        )?;
+        let tx = self.conn.unchecked_transaction()?;
+        // 1) Try to insert the new row (idempotent)
+        let inserted = tx.execute(
+            "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status)
+             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8,
+                     (SELECT status FROM link WHERE perspective = ?1 AND source = ?9 AND predicate = ?10 AND target = ?11 AND author = ?12 AND timestamp = ?13 LIMIT 1)
+             )",
+            params![perspective_uuid, new_link.data.source, new_link.data.predicate.as_ref().unwrap_or(&"".to_string()), new_link.data.target, new_link.author, new_link.timestamp, new_link.proof.signature, new_link.proof.key,
+                    old_link.data.source, old_link.data.predicate.as_ref().unwrap_or(&"".to_string()), old_link.data.target, old_link.author, old_link.timestamp],
+        )?;
+        // 2) Delete the old row
+        tx.execute(
+            "DELETE FROM link WHERE perspective = ?1 AND source = ?2 AND predicate = ?3 AND target = ?4 AND author = ?5 AND timestamp = ?6",
+            params![perspective_uuid, old_link.data.source, old_link.data.predicate.as_ref().unwrap_or(&"".to_string()), old_link.data.target, old_link.author, old_link.timestamp],
+        )?;
+        tx.commit()?;
         Ok(())

This preserves idempotency and respects the uniqueness constraint.


1656-1669: Import counter inflates on ignored duplicates; use changed-row count.

INSERT OR IGNORE returns Ok even when it ignores a duplicate. Right now, imported is incremented unconditionally. Check the returned usize; only increment when > 0. Optionally track duplicates as omitted.

-                        match self.conn.execute(
+                        match self.conn.execute(
                             "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
                             params![/* ... */],
                         ) {
-                            Ok(_) => result.links.imported += 1,
+                            Ok(changed) => {
+                                if changed > 0 {
+                                    result.links.imported += 1
+                                } else {
+                                    result.links.omitted += 1
+                                }
+                            }
🧹 Nitpick comments (7)
rust-executor/src/db.rs (4)

90-103: Dedup + unique index: wrap in a transaction and run once.

Current code runs a full-table DELETE + index creation at every startup, which can be expensive on large datasets and risks long-held write locks. Wrap these in a transaction and guard with a one-time migration flag (e.g., PRAGMA user_version) so it only runs when needed.

Example:

-        // Ensure we don't have duplicate link expressions and enforce uniqueness going forward
-        conn.execute(
+        // Ensure we don't have duplicate link expressions and enforce uniqueness going forward
+        let tx = conn.unchecked_transaction()?;
+        tx.execute(
             "DELETE FROM link
              WHERE id NOT IN (
                SELECT MIN(id) FROM link
                GROUP BY perspective, source, predicate, target, author, timestamp
              )",
             [],
         )?;
-        conn.execute(
+        tx.execute(
             "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx
              ON link (perspective, source, predicate, target, author, timestamp)",
             [],
         )?;
+        tx.commit()?;

Optionally persist a migration version to skip the DELETE after the index exists.


750-763: Idempotent insert is correct; consider exposing insert outcome.

Using INSERT OR IGNORE prevents uniqueness violations—good. If the caller needs to know whether a row was actually inserted vs ignored, capture the affected row count from execute() and return it or log it.

-        self.conn.execute(
+        let changed = self.conn.execute(
             "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status)
              VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
             params![/* ... */],
         )?;
-        Ok(())
+        // Optionally: Ok(bool::from(changed > 0)) or log::debug!("link insert changed={}", changed);
+        Ok(())

775-789: Batch inserts: wrap in a transaction + prepared statement for throughput.

Looping execute() per row incurs extra parse/commit overhead. Use a transaction and a prepared statement to significantly speed up bulk inserts.

-        for link in links.iter() {
-            self.conn.execute(
+        let tx = self.conn.unchecked_transaction()?;
+        let mut stmt = tx.prepare(
                 "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status)
-                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
-                params![/* ... */],
-            )?;
-        }
+                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)"
+        )?;
+        for link in links.iter() {
+            stmt.execute(params![/* ... */])?;
+        }
+        tx.commit()?;
         Ok(())

889-923: Normalize predicate consistently on reads ("" -> None).

get_link maps empty-string predicates back to None, but get_all_links/get_links_by_source/get_links_by_target do not, causing inconsistent API behavior and equality surprises. Apply the same normalization.

-                    predicate: row.get(2)?,
+                    predicate: row.get(2).map(|p: Option<String>| match p.as_deref() {
+                        Some("") => None,
+                        _ => p,
+                    })?,

Apply in all three query mappers.

Also applies to: 925-961, 963-997

rust-executor/src/perspectives/perspective_instance.rs (3)

630-644: Batch removals: avoid per-row closures; add/remove in a transaction.

Looping remove_link inside with_global_instance is fine but will be slow for large diffs. Consider a DB helper that removes a slice in one transaction.

Happy to sketch an Ad4mDb::remove_many_links(&self, perspective_uuid: &str, links: &[LinkExpression]) helper if you want it in this PR.


318-334: Log messages drift from constants; format with MAX_ values.*

Messages say “10s” and “100” but the constants are 3s and 150. Use the constants in the log text for accuracy.

-                        log::info!("Committed diffs after reaching 10s maximum wait time");
+                        log::info!("Committed diffs after reaching {}s maximum wait time", MAX_PENDING_SECONDS);
...
-                    log::info!("Committed diffs after collecting 100");
+                    log::info!("Committed diffs after collecting {}", MAX_PENDING_DIFFS_COUNT);

423-468: Nit: variable naming is misleading.

local_links currently holds Shared links after retain(). Consider renaming to shared_links for readability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1b3e287 and cffa65b.

📒 Files selected for processing (3)
  • rust-executor/src/db.rs (4 hunks)
  • rust-executor/src/perspectives/perspective_instance.rs (3 hunks)
  • turbo.json (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T11:39:48.641Z
Learnt from: lucksus
PR: coasys/ad4m#605
File: rust-executor/src/js_core/options.rs:59-68
Timestamp: 2025-06-05T11:39:48.641Z
Learning: In the rust-executor project, CUSTOM_DENO_SNAPSHOT.bin is a build-time generated artifact created by the generate_snapshot binary (with --features generate_snapshot). The file is not checked into source control but is created during the build process via pnpm build-deno-snapshot script and then included in subsequent builds via include_bytes! macro.

Applied to files:

  • turbo.json
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (3)
rust-executor/src/db.rs (2)
  • new (59-236)
  • with_global_instance (1357-1366)
rust-executor/src/graphql/graphql_types.rs (11)
  • new (481-495)
  • new (1036-1044)
  • new (1054-1068)
  • from (88-90)
  • from (94-96)
  • from (330-338)
  • from (379-387)
  • from (423-435)
  • from (613-618)
  • from (622-627)
  • from (668-686)
rust-executor/src/types.rs (8)
  • from (49-63)
  • from (75-81)
  • from (149-156)
  • from (160-168)
  • from (199-210)
  • from (214-225)
  • from (234-242)
  • from (360-365)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (5)
turbo.json (2)

53-57: Rename looks correct; include snapshot artifact and disable cache (good).

The target key rename to "@coasys/rust-ad4m-executor#build" matches how other pipeline deps already reference it, and including CUSTOM_DENO_SNAPSHOT.bin in outputs with cache disabled aligns with the known build-time snapshot generation flow. No issues spotted.


59-62: No lingering unscoped pipeline keys found

All instances of the renamed build tasks have been confirmed as scoped under @coasys/ and there are no leftover references to the old keys.

  • turbo.json only contains the scoped targets (@coasys/ad4m-cli#build and @coasys/rust-ad4m-executor#build) at lines 20, 21, 53, and 59–62.
  • Repository-wide search for unscoped usages of ad4m-cli#build and rust-ad4m-executor#build (excluding node_modules and dist) returned no hits.

No further changes required here.

rust-executor/src/perspectives/perspective_instance.rs (3)

646-655: Decorated diff uses Shared for removals — confirm assumption.

You set LinkStatus::Shared for all removals coming from the link language. This is consistent if the language only emits shared changes. If local removals could appear here, this will mislabel them. Please confirm.


2838-2853: Tests: deterministic sorting comparator is a solid improvement.

Sorting by timestamp and stable tie-breakers removes flakiness from non-ordered DB reads. Looks good.


2878-2895: Tests: source-filtered deterministic comparisons LGTM.

The explicit comparator for filtered queries keeps assertions stable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)

597-616: Delimiter-safe dedup key: great fix, aligns with prior feedback

Switching to a structured key via serde_json eliminates delimiter-collision bugs from "|" joins. Good call.

🧹 Nitpick comments (5)
rust-executor/src/perspectives/perspective_instance.rs (4)

599-613: Minor perf nit: pre-size the HashSet and simplify error path

Avoid reallocations and an unreachable fallback:

Apply this diff:

-        let mut seen_add: std::collections::HashSet<String> = std::collections::HashSet::new();
+        let mut seen_add: std::collections::HashSet<String> =
+            std::collections::HashSet::with_capacity(diff.additions.len());
...
-            let key = serde_json::to_string(&key_tuple).unwrap_or_else(|_| {
-                // Fallback to a simple hash if serialization fails
-                format!("{:?}", key_tuple)
-            });
+            // Serialization cannot fail for these primitives; if it does, surface loudly.
+            let key = serde_json::to_string(&key_tuple)
+                .expect("dedup key serialization");

618-632: Minor perf nit: pre-size the removals HashSet

Mirror the additions optimization.

Apply this diff:

-        let mut seen_rem: std::collections::HashSet<String> = std::collections::HashSet::new();
+        let mut seen_rem: std::collections::HashSet<String> =
+            std::collections::HashSet::with_capacity(diff.removals.len());

2845-2860: Test ordering comparator duplicated; factor to helper for reuse

Comparator logic repeats and parses timestamps each compare. Consider a small helper to DRY and keep tests readable.


2885-2901: Same: reuse comparator helper to reduce duplication

Extract shared cmp or pre-parse timestamps to avoid repeated parsing in sort comparator.

tests/js/tests/prolog-and-literals.test.ts (1)

2184-2184: Confirm intended deterministic ordering for collection values

Expectation now assumes a sorted ingredient order. If backend guarantees deterministic sorting, great—otherwise consider sorting in the assertion to avoid coupling to internal order.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cffa65b and c8597a9.

📒 Files selected for processing (2)
  • rust-executor/src/perspectives/perspective_instance.rs (3 hunks)
  • tests/js/tests/prolog-and-literals.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (1)
rust-executor/src/db.rs (2)
  • new (59-236)
  • with_global_instance (1357-1366)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (3)
rust-executor/src/perspectives/perspective_instance.rs (3)

618-635: Same dedup improvement for removals: looks good

Mirrors additions path and fixes the same collision risk.


637-642: DB write path uses unique additions: OK

Writing only unique additions with Shared status fits the link-language-originated diff semantics.


654-662: Decorated diff reflects deduped sets: good

Using the unique sets for pubsub/prolog updates prevents duplicate events.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
rust-executor/src/db.rs (3)

894-923: Normalize empty-string predicate to None for consistency with get_link().

get_link() maps "" -> None, but get_all_links() currently returns Some(""), causing inconsistent API behavior. Align the mapping here.

             let link_expression = LinkExpression {
                 data: Link {
                     source: row.get(1)?,
-                    predicate: row.get(2)?,
+                    predicate: row
+                        .get(2)
+                        .map(|p: Option<String>| match p.as_deref() {
+                            Some("") => None,
+                            _ => p,
+                        })?,
                     target: row.get(3)?,
                 },

931-960: Apply the same predicate normalization in get_links_by_source().

             let link_expression = LinkExpression {
                 data: Link {
                     source: row.get(1)?,
-                    predicate: row.get(2)?,
+                    predicate: row
+                        .get(2)
+                        .map(|p: Option<String>| match p.as_deref() {
+                            Some("") => None,
+                            _ => p,
+                        })?,
                     target: row.get(3)?,
                 },

968-997: Apply the same predicate normalization in get_links_by_target().

             let link_expression = LinkExpression {
                 data: Link {
                     source: row.get(1)?,
-                    predicate: row.get(2)?,
+                    predicate: row
+                        .get(2)
+                        .map(|p: Option<String>| match p.as_deref() {
+                            Some("") => None,
+                            _ => p,
+                        })?,
                     target: row.get(3)?,
                 },
♻️ Duplicate comments (1)
rust-executor/src/perspectives/perspective_instance.rs (1)

644-651: Don’t panic on remove_link failures in runtime path (repeat).

Use log-and-continue; panics here can crash the executor on benign races (already removed).

-            Ad4mDb::with_global_instance(|db| {
-                for link in &unique_removals {
-                    db.remove_link(&handle.uuid, link)
-                        .expect("Failed to remove link");
-                }
-            });
+            Ad4mDb::with_global_instance(|db| {
+                for link in &unique_removals {
+                    if let Err(e) = db.remove_link(&handle.uuid, link) {
+                        log::warn!("diff_from_link_language: failed to remove link {:?}: {}", link, e);
+                    }
+                }
+            });
🧹 Nitpick comments (4)
rust-executor/src/db.rs (3)

90-103: Run dedup + index creation inside a single transaction; current approach can be slow and non-atomic on large tables.

Wrapping both statements in one transaction reduces I/O and ensures atomicity of the cleanup + constraint install.

-        // Ensure we don't have duplicate link expressions and enforce uniqueness going forward
-        conn.execute(
-            "DELETE FROM link
-             WHERE id NOT IN (
-               SELECT MIN(id) FROM link
-               GROUP BY perspective, source, predicate, target, author, timestamp
-             )",
-            [],
-        )?;
-        conn.execute(
-            "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx
-             ON link (perspective, source, predicate, target, author, timestamp)",
-            [],
-        )?;
+        // Ensure we don't have duplicate link expressions and enforce uniqueness going forward
+        {
+            let tx = conn.transaction()?;
+            tx.execute(
+                "DELETE FROM link
+                 WHERE id NOT IN (
+                   SELECT MIN(id) FROM link
+                   GROUP BY perspective, source, predicate, target, author, timestamp
+                 )",
+                [],
+            )?;
+            tx.execute(
+                "CREATE UNIQUE INDEX IF NOT EXISTS link_unique_idx
+                 ON link (perspective, source, predicate, target, author, timestamp)",
+                [],
+            )?;
+            tx.commit()?;
+        }

750-763: Prefer targeted UPSERT over INSERT OR IGNORE to avoid masking other constraint errors.

Use SQLite UPSERT to ignore only uniqueness collisions on the intended constraint.

-            "INSERT OR IGNORE INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status)
-             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
+            "INSERT INTO link (perspective, source, predicate, target, author, timestamp, signature, key, status)
+             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)
+             ON CONFLICT(perspective, source, predicate, target, author, timestamp) DO NOTHING",

Apply the same change to add_many_links() and import_from_json() link inserts.

Also applies to: 775-788, 1656-1669


894-895: Add supporting indexes to satisfy ORDER BY without extra sorting.

Given WHERE perspective = ? (and possibly source/target), SQLite can fully use these composite indexes to return rows in order:

  • Index for get_all_links: (perspective, timestamp, source, predicate, target, author)
  • Index for get_links_by_source: (perspective, source, timestamp, predicate, target, author)
  • Index for get_links_by_target: (perspective, target, timestamp, source, predicate, author)

Example (add during init, after unique index):

conn.execute("CREATE INDEX IF NOT EXISTS link_order_all_idx ON link (perspective, timestamp, source, predicate, target, author)", [])?;
conn.execute("CREATE INDEX IF NOT EXISTS link_by_source_idx ON link (perspective, source, timestamp, predicate, target, author)", [])?;
conn.execute("CREATE INDEX IF NOT EXISTS link_by_target_idx ON link (perspective, target, timestamp, source, predicate, author)", [])?;

Also applies to: 931-932, 968-969

rust-executor/src/perspectives/perspective_instance.rs (1)

599-613: Minor: pre-size HashSets to reduce rehashing on large diffs.

-        let mut seen_add: std::collections::HashSet<String> = std::collections::HashSet::new();
+        let mut seen_add: std::collections::HashSet<String> =
+            std::collections::HashSet::with_capacity(diff.additions.len());
...
-        let mut seen_rem: std::collections::HashSet<String> = std::collections::HashSet::new();
+        let mut seen_rem: std::collections::HashSet<String> =
+            std::collections::HashSet::with_capacity(diff.removals.len());

Also applies to: 621-635

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c8597a9 and df37a1d.

📒 Files selected for processing (2)
  • rust-executor/src/db.rs (7 hunks)
  • rust-executor/src/perspectives/perspective_instance.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
rust-executor/src/perspectives/perspective_instance.rs (2)
rust-executor/src/db.rs (2)
  • new (59-236)
  • with_global_instance (1357-1366)
rust-executor/src/types.rs (8)
  • from (49-63)
  • from (75-81)
  • from (149-156)
  • from (160-168)
  • from (199-210)
  • from (214-225)
  • from (234-242)
  • from (360-365)
🔇 Additional comments (4)
rust-executor/src/perspectives/perspective_instance.rs (4)

597-616: Solid dedup: structured keys fix delimiter-collision risk.

The serde_json-based tuple key addresses the fragility noted earlier.


654-662: Confirm status semantics for removals.

All removals are marked Shared. If link language can emit Local-only removals, this will mislabel events.

Would you like me to scan the codebase for sources of Local removals and adjust accordingly?


2842-2857: Tests: deterministic ordering verification looks good.


2882-2899: Tests: source-filtered sort comparator matches DB ORDER BY.

@lucksus lucksus merged commit d05eeed into dev Sep 3, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants