SurrealDB Query Engine Integration for blazing fast Ad4mModel operations#632
SurrealDB Query Engine Integration for blazing fast Ad4mModel operations#632
Conversation
…it as if it were public.
…checks author signature)
WalkthroughAdds SurrealDB as a parallel query backend across TS and Rust: SurrealQL generation and mapping in Ad4mModel/ModelQueryBuilder, Perspective Surreal APIs and subscriptions, an in-process SurrealDB service with per‑perspective sync/subscribe/keepalive/dispose, GraphQL resolvers, and extensive tests and docs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ModelQueryBuilder
participant PerspectiveProxy
participant PerspectiveInstance
participant SurrealDBService
participant PrologBackend
Client->>ModelQueryBuilder: findAll / subscribe (useSurrealDB=true)
ModelQueryBuilder->>PerspectiveProxy: querySurrealDB / subscribeSurrealDB(query)
PerspectiveProxy->>PerspectiveInstance: surreal_query / subscribe_and_query_surreal
PerspectiveInstance->>SurrealDBService: query_links(perspective_uuid, query)
SurrealDBService->>SurrealDBService: execute SurrealQL
SurrealDBService-->>PerspectiveInstance: JSON results
PerspectiveInstance-->>PerspectiveProxy: parsed results / subscription id + init
PerspectiveProxy-->>Client: results / subscription proxy (isSurrealDB=true)
alt useSurrealDB=false
Client->>ModelQueryBuilder: findAll / subscribe (useSurrealDB=false)
ModelQueryBuilder->>PerspectiveProxy: query / subscribeQuery(query)
PerspectiveProxy->>PerspectiveInstance: query (Prolog)
PerspectiveInstance->>PrologBackend: execute prolog query
PrologBackend-->>PerspectiveInstance: results
PerspectiveInstance-->>PerspectiveProxy: results
PerspectiveProxy-->>Client: results
end
sequenceDiagram
participant Client
participant PerspectiveProxy
participant PerspectiveInstance
participant SurrealDBService
Client->>PerspectiveProxy: subscribeSurrealDB(query)
PerspectiveProxy->>PerspectiveInstance: subscribe_and_query_surreal(query)
PerspectiveInstance->>SurrealDBService: start subscription (poll/keepalive)
SurrealDBService-->>PerspectiveInstance: initial result
PerspectiveInstance-->>PerspectiveProxy: subscription ready (init)
PerspectiveProxy-->>Client: deliver initial result
rect rgb(230,245,255)
Client->>PerspectiveProxy: keepalive()
PerspectiveProxy->>PerspectiveInstance: keepalive_surreal_query(subscriptionId)
PerspectiveInstance->>SurrealDBService: fetch latest
SurrealDBService-->>PerspectiveInstance: updated results
PerspectiveInstance-->>PerspectiveProxy: emit update
PerspectiveProxy-->>Client: emit update
end
Client->>PerspectiveProxy: dispose()
PerspectiveProxy->>PerspectiveInstance: dispose_surreal_query_subscription(subscriptionId)
PerspectiveInstance->>SurrealDBService: cleanup subscription
SurrealDBService-->>PerspectiveInstance: done
PerspectiveInstance-->>PerspectiveProxy: disposed
PerspectiveProxy-->>Client: subscription closed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/js/tests/prolog-and-literals.test.ts (1)
1171-1179: AvoidparseInton timestamp strings; useDatefor orderingUsing
parseInton a non-numeric timestamp (e.g. ISO string) will produce garbage values ("2025-11-25T..."→2025), breaking both sorting and the boundary you derive for subsequent</<=/>tests. You already handle string timestamps withnew Date(...).getTime()in the later count test; this block should match that pattern.- // Sort recipes by timestamp to ensure consistent ordering - const sortedRecipes = [...allRecipes].sort((a, b) => { - const aTime = typeof a.timestamp === 'number' ? a.timestamp : parseInt(a.timestamp); - const bTime = typeof b.timestamp === 'number' ? b.timestamp : parseInt(b.timestamp); - return aTime - bTime; - }); - const recipe2timestamp = typeof sortedRecipes[1].timestamp === 'number' - ? sortedRecipes[1].timestamp - : parseInt(sortedRecipes[1].timestamp); // Second recipe by timestamp + // Sort recipes by timestamp to ensure consistent ordering + const sortedRecipes = [...allRecipes].sort((a, b) => { + const aTime = typeof a.timestamp === 'number' + ? a.timestamp + : new Date(a.timestamp as any).getTime(); + const bTime = typeof b.timestamp === 'number' + ? b.timestamp + : new Date(b.timestamp as any).getTime(); + return aTime - bTime; + }); + const recipe2timestamp = typeof sortedRecipes[1].timestamp === 'number' + ? sortedRecipes[1].timestamp + : new Date(sortedRecipes[1].timestamp as any).getTime(); // Second recipe by timestampOptionally, you could also assert
Number.isFinite(recipe2timestamp)to fail fast if a non-date string sneaks in.
🧹 Nitpick comments (5)
rust-executor/src/surreal_service/mod.rs (2)
50-63: Consider defensive handling for Thing ID formatting to prevent invalid record ID strings.If the Thing structure from SurrealDB is malformed or missing expected fields, lines 54–61 could produce invalid record ID strings like
":"or"table:". While this is unlikely with well-formed SurrealDB responses, adding a validation check would prevent potential downstream errors:"Thing" => { if let Some(Value::Object(thing_obj)) = obj.remove(&key) { // Format as "table:id" let tb = thing_obj.get("tb").and_then(|v| v.as_str()).unwrap_or(""); let id = thing_obj .get("id") .and_then(|v| { unwrap_surreal_json(v.clone()).as_str().map(String::from) }) .unwrap_or_default(); + // Validate we have both parts before formatting + if !tb.is_empty() && !id.is_empty() { return Value::String(format!("{}:{}", tb, id)); + } + // Fallback to the original Thing object if malformed + return Value::Object([(key, Value::Object(thing_obj))].into_iter().collect()); } }
669-672: Consider bulk operations for better performance at scale.The current implementation calls
add_linksequentially for each link, which meansO(n * 3)individual operations (2×ensure_node+ 1×RELATEper link). While tests verify this works with 1,000 links, very large perspectives (10k+ links) may experience slow reload times.For better performance, consider batching node creation and RELATE operations:
// Pseudocode approach: // 1. Collect unique URIs from all links // 2. Bulk ensure all nodes in one transaction // 3. Batch RELATE operations (SurrealDB supports this)This optimization can be deferred since the current implementation is functional and reload/sync operations are not on the hot path.
tests/js/tests/prolog-and-literals.test.ts (3)
1615-1709: Strong operator and timestamp coverage forcount/ query builderThis new test exercises
countvsfindAllparity forgt,lt,lte,gte, andbetween, both via the static API and the query builder, and then extends that to timestamp-based filtering. The pattern of checkingcount(...) === findAll(...).lengthis an excellent guardrail for backend consistency.Minor optional robustness: after computing
recipe3Timestampvianew Date(...).getTime()for string timestamps, you could assert it’s finite to catch unexpected formats early:expect(Number.isFinite(recipe3Timestamp)).to.be.true;But the current code is reasonable if you control timestamp formats.
2205-2315: SurrealDB vs Prolog subscription parity & perf tests look solidThe new
SubscriptionTestModelblock:
- Uses its own per-test perspective lifecycle and SDNA registration, avoiding interference with the rest of the suite.
- Compares SurrealDB (default) vs explicit Prolog subscriptions off the same
where({ status: "active" })builder and checks final result parity by sortedname/status, which is exactly what you want here.- Adds a perf-oriented test with a 5s timeout loop, ensuring at least one subscription update arrives and logging latencies without baking in brittle perf thresholds.
If you run into CI flakiness later, you might consider replacing the fixed
sleep(2000)in the parity test with a loop similar to the perf test (waiting until both callbacks havecountitems or timing out), but it’s fine as-is for now.
2521-2544: Emoji tests: per-test SDNA registration and cleanup, legacybeforecommented outSwitching from the old (now commented-out)
beforeto abeforeEachthat:
- calls
ensureSDNASubjectClass(EmojiMessage), and- clears any existing
EmojiMessageinstancesgives these emoji/special-character tests strong isolation and keeps them compatible with changing backends. Based on learnings about per-call perspective isolation, reusing the same perspective name here is safe.
You could optionally delete the commented-out
beforeblock entirely to reduce noise, but that’s purely cosmetic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
rust-executor/src/surreal_service/mod.rs(1 hunks)tests/js/tests/prolog-and-literals.test.ts(13 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-03T15:16:29.623Z
Learnt from: lucksus
Repo: coasys/ad4m PR: 619
File: rust-executor/src/perspectives/perspective_instance.rs:639-641
Timestamp: 2025-09-03T15:16:29.623Z
Learning: In the AD4M codebase, database operations like `add_many_links` during critical synchronization processes (such as `diff_from_link_language`) should use `.expect()` rather than error logging, as failures indicate severe system issues where fail-fast behavior is preferred over continuing with potentially corrupted state.
Applied to files:
rust-executor/src/surreal_service/mod.rs
📚 Learning: 2025-09-24T14:26:44.307Z
Learnt from: lucksus
Repo: coasys/ad4m PR: 628
File: tests/js/tests/prolog-and-literals.test.ts:2726-2729
Timestamp: 2025-09-24T14:26:44.307Z
Learning: In AD4M, perspectives have names but are uniquely identified by UUIDs. Creating a new perspective with the same name in consecutive calls will create a fresh perspective each time, so there's no risk of state leakage between tests based on perspective names.
Applied to files:
tests/js/tests/prolog-and-literals.test.ts
🔇 Additional comments (15)
rust-executor/src/surreal_service/mod.rs (7)
122-193: LGTM!The read-only query validation is comprehensive and correctly enforces word boundaries to prevent false positives. The case-insensitive matching and warning for unexpected query patterns provide good defense-in-depth for the query interface.
195-244: LGTM!The error handling in
ensure_nodecorrectly distinguishes between expected duplicate-record errors (which are safely ignored) and unexpected database failures (which are logged and propagated). This addresses the integrity concerns from previous reviews.
246-333: LGTM!The initialization logic properly sets up the graph schema with appropriate indexes for query optimization. The custom
fn::parse_literal()function provides domain-specific literal URI parsing, and the use ofIF NOT EXISTSensures idempotent schema definitions.
335-396: LGTM!Both
add_linkandremove_linkcorrectly use graph operations (RELATEandDELETEwithtype::thing()matching) and properly propagate errors fromensure_node. Parameter binding is used throughout to prevent injection attacks.
398-637: LGTM!The
query_linksimplementation correctly addresses previous review concerns about perspective isolation:
- Original WHERE conditions are now wrapped in parentheses (line 492) to preserve operator precedence with OR clauses.
- Aliased queries now receive perspective filtering (lines 502-510) instead of bypassing it.
The 60-second timeout with periodic logging provides good observability for long-running queries without blocking indefinitely.
678-693: LGTM!The global service lifecycle uses a standard lazy_static singleton pattern with
RwLockfor thread-safe access. The.expect()inget_surreal_serviceprovides fail-fast behavior if called before initialization, which aligns with the codebase's approach to critical system components (as noted in learnings).
695-2073: LGTM!The test suite is exceptionally comprehensive, covering:
- All CRUD operations and perspective isolation
- Graph traversal queries after reload (critical for Ad4mModel integration)
- Performance at scale (1,000 links)
- Concurrent operations and thread safety
- Security validation (read-only enforcement)
This level of test coverage provides strong confidence in the implementation's correctness and robustness.
tests/js/tests/prolog-and-literals.test.ts (8)
617-625: Per-test perspective reset for Active Record tests improves isolationRecreating/removing the
perspectiveinbeforeEachmakes the Active Record tests self-contained and prevents cross-test leakage, while still using the same name safely thanks to UUID-based identity. Based on learnings, this pattern is correct and desirable.
664-671:find()test now seeds its own fixtureCreating and saving
recipe1inside this test before callingfindAll()removes any reliance on prior tests and aligns well with the new per-test perspective setup.
717-740:delete()test correctly constructs its own datasetExplicitly creating three recipes and asserting the length transitions 3 → 2 after
delete()makes this test robust under the new isolation model and independent of earlier Active Record tests.
899-904: Order-agnostic checks forcommentscollectionsSwitching to
.include(...)instead of relying on array ordering makes thefindAll()collection assertions tolerant to backend-specific ordering differences (Prolog vs SurrealDB) while still verifying the right elements are present.
1330-1363: Limit/offset pagination test is clear and sufficientThe test now cleanly exercises
limit,offset, and their combination against a simple, ordered dataset of"Recipe 1" ... "Recipe 6". Assertions line up with expected paging semantics; nothing to change here.
1414-1466: Good coverage for multi-propertywhereAND semanticsThis test nicely validates that:
- multiple scalar constraints in a single
whereare AND-ed,- mixed operator/equality constraints (
number: { gt: 5 }+booleanTest: true) behave as expected,- and impossible combinations return zero results.
This is valuable coverage for the query builder’s composed predicates.
1855-1858: Adjusted expectations for SurrealDB subscription update countDocumenting and commenting out the strict
updateCountassertion acknowledges the extra raw-result updates you get when filtering happens client-side under SurrealDB, while still asserting on the finalnotifications.length. This keeps the test meaningful without making it backend-fragile.
2143-2175: Using.have.membersfor batch ingredients avoids order dependenceFor both the post-commit and pre-update checks you now assert ingredients via
.have.members([...]), which is the right choice for collections whose ordering isn’t semantically important and may differ between Prolog/SurrealDB or across implementations.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/pages/index.mdx (1)
199-228: Clarify performance claims and add proper caveats for SurrealDB recommendations.Lines 205 and 223–227 claim "10-100x faster performance" for SurrealDB without context or benchmarking data. This is a strong claim that should either cite performance benchmarks or be qualified (e.g., "in typical scenarios" or "compared to unindexed Prolog queries"). Additionally, line 223's performance tip warns against subqueries but doesn't explain why or suggest alternatives.
Consider adding:
- A note referencing performance benchmarks or test results (if available)
- Guidance on when Prolog might still be preferable (complex logic, smaller datasets)
- Alternative patterns instead of subqueries, or links to the SurrealDB queries guide for more details
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/pages/index.mdx(6 hunks)
🔇 Additional comments (4)
docs/pages/index.mdx (4)
250-254: Confirm past review fix:infer()method is now correctly documented (resolves earlierqueryProlog()issue).The Prolog example now correctly uses
perspective.infer()instead of the privateperspective.queryProlog()method, aligning with the canonical API. This resolves the issue flagged in the previous review.
262-307: The documented Ad4mModel decorator examples are correct and match the actual implementation.Verification confirms both syntax patterns are valid:
- @readonly getter (line 283): The
getterparameter is supported byPropertyOptionsinterface and correctly documented as a custom Prolog expression for retrieving property values.- @Collection where clause (line 292): The
whereparameter withconditionproperty is supported byWhereOptionsinterface and correctly documented for filtering collection values.No changes to the documentation are needed.
213-221: No issues found -querySurrealDB()API method exists and query syntax is accurate.Verification confirms:
- The
querySurrealDB()method is properly implemented inPerspectiveProxy.ts(line 554) andPerspectiveClient.ts(line 167)- Query syntax examples in docs/pages/index.mdx match the SurrealQL patterns documented in the source code
- Field names (
in.uri,out.uri,target,predicate) are consistently referenced across both source documentation and the mdx file- Examples extend the source code patterns but remain consistent (docs show indexed graph queries while source examples show basic filtering)
The code snippet at lines 213-221 in docs/pages/index.mdx is accurate and reflects the actual API surface.
258-318: Perfect! I have verified all the claims in the review comment against the actual Ad4mModel implementation. Here are my findings:Verification Results:
- SurrealDB is the default ✓ - Confirmed at line 1630:
useSurrealDB: boolean = trueuseSurrealDB: falsetoggle exists ✓ - Confirmed; when false, routes to Prolog viaqueryToProlog()(line 1638)- Query method signatures match documentation ✓ - All query methods (findAll, findAllAndCount, paginate, count) follow the documented pattern with the useSurrealDB parameter defaulting to true
The implementation confirms:
- Line 1632-1639: Conditional logic correctly routes to SurrealDB or Prolog based on the flag
- All documented examples with
Recipe.findAll(perspective, {}, false)for backward compatibility are supported- Performance claims and backward compatibility documentation are accurate
All claims in the review comment are accurate and verified against the codebase. No inaccuracies, discrepancies, or implementation issues were found.
|
Could the Prolog engine be improved to handle the required cases more efficiently? If possible, would you please consider posting a benchmark of cases where you would benefit from faster performance of Scryer? For example, @Yordando recently did this, leading to a 10-fold speed improvement in the same month: |
🚧 SurrealDB Query Engine Integration
This PR introduces a foundational SurrealDB-based query engine (in memory cache) for AD4M's data layer - as a faster alternative to Prolog for some simple but often used queries. This work enables efficient, structured queries over perspective links and lays the groundwork for improved data retrieval performance.
🎯 Key Changes
SurrealDB Service Integration (rust-executor/src/surreal_service/)
• Added new SurrealDBService with in-memory database for link storage
• Implemented schema-full link table with indexed fields:
◦ perspective, source, predicate, target, author, timestamp
• Created custom SurrealQL function fn::parse_literal() for handling AD4M literal URIs
• Integrated service into PerspectiveInstance for real-time link synchronization
Ad4mModel Query Builder (core/src/model/)
• Implemented SurrealQL query generation from model metadata
• Added ModelMetadata extraction system with support for:
◦ Property metadata (predicates, required fields, getters/setters)
◦ Collection metadata with WHERE clause support
◦ JSON Schema fallback for decorator-less models
• Enhanced findAll() to utilize SurrealDB queries (WIP)
• Added contains operator to WhereOps for flexible filtering
• Implemented sophisticated WHERE clause injection with perspective context
Testing & Infrastructure
• Added extensive test coverage for SurrealDB query generation
• Created Ad4mModel.test.ts with 1,163 lines of tests
• Updated agent service tests to use real keys for SDNA validation
• Improved test reliability by removing unnecessary sleeps
Summary by CodeRabbit
New Features
Tests
Docs
Chores
✏️ Tip: You can customize this high-level summary in your review settings.