Skip to content

fix: reject links with invalid or empty source/target/predicate URIs#676

Merged
lucksus merged 10 commits intodevfrom
fix/reject-invalid-links
Feb 18, 2026
Merged

fix: reject links with invalid or empty source/target/predicate URIs#676
lucksus merged 10 commits intodevfrom
fix/reject-invalid-links

Conversation

@lucksus
Copy link
Copy Markdown
Member

@lucksus lucksus commented Feb 17, 2026

Add Link::validate() that enforces RFC 3986 scheme presence
([a-zA-Z][a-zA-Z0-9+-.]*:) on source, target, and predicate before
any link is stored. Called at all four entry points: add_link,
add_links, add_link_expression, and link_mutations. Any invalid link
now returns an error immediately instead of silently corrupting the
perspective graph.

Summary by CodeRabbit

  • Bug Fixes

    • Added earlier, stricter validation of link data so invalid links are rejected before persistence.
    • Normalized string/number literals into valid URIs with scheme detection to prevent malformed links.
    • Standardized predicate namespaces to ad4m:// across link handling and queries.
  • Tests

    • Updated test data and expectations to reflect URI normalization and ad4m:// predicate namespace.

  Add Link::validate() that enforces RFC 3986 scheme presence
  ([a-zA-Z][a-zA-Z0-9+\-.]*:) on source, target, and predicate before
  any link is stored. Called at all four entry points: add_link,
  add_links, add_link_expression, and link_mutations. Any invalid link
  now returns an error immediately instead of silently corrupting the
  perspective graph.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

Adds pre-validation and URI normalization for Link data in perspective link creation and mutation flows; validation runs before signing or persistence and tests updated to use ad4m://-prefixed predicates and normalized URIs.

Changes

Cohort / File(s) Summary
Link Type & Validation
rust-executor/src/types.rs
Added Link::validate() to verify source/target (and optional predicate) are non-empty URIs with a valid scheme; returns descriptive errors.
Validation & Normalization Integration
rust-executor/src/perspectives/perspective_instance.rs
Inserted pre-validation calls in add_link, add_link_expression, add_links, and link_mutations; added URI normalization for string/number literals and scheme detection via regex before signing or persistence.
Tests & Test Data
tests/js/tests/*.ts, tests/js/tests/neighbourhood.ts, tests/js/tests/prolog-and-literals.test.ts
Updated many tests to use ad4m://-prefixed predicates and normalized URIs (e.g., ad4m://root, test://target); adjusted assertions and SurrealQL expectations accordingly.
Manifest
Cargo.toml
Minor manifest edits; no public API signature changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PerspectiveInstance
    participant Validator
    participant DB

    Client->>PerspectiveInstance: submit add_link / add_links / mutations (Link / LinkExpression)
    PerspectiveInstance->>Validator: normalize(s) → validate(s)
    alt validation fails
        Validator-->>PerspectiveInstance: error
        PerspectiveInstance-->>Client: return validation error (abort)
    else validation succeeds
        Validator-->>PerspectiveInstance: validated Link(s)
        PerspectiveInstance->>DB: persist links / execute SurrealDB queries
        DB-->>PerspectiveInstance: persistence confirmation
        PerspectiveInstance-->>Client: success (signed expressions / result)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • jhweir

Poem

🐇 I check each link before it hops,
Schemes in place so nothing flops,
ad4m:// roots snug and neat,
Signed and stored — the graph complete,
A rabbit nods: the data hops to beats.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding validation to reject links with invalid or empty URIs, which is the core purpose reflected throughout the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 82.76% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/reject-invalid-links

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust-executor/src/types.rs`:
- Around line 111-140: In validate(), the current predicate branch allows an
empty string to bypass URI validation; update the predicate handling in the
validate() method so that when self.predicate is Some(predicate) you first
return an error if predicate.is_empty() (matching the behavior used for
source/target) and otherwise call the existing check closure; reference the
validate function, the local check closure, and the predicate variable to locate
where to add the explicit empty-string rejection.

Comment on lines +111 to +140
/// Validates that source and target are non-empty URIs (RFC 3986 scheme present),
/// and that predicate, if present, is also a valid URI.
///
/// A valid URI must begin with a scheme: `[a-zA-Z][a-zA-Z0-9+\-.]*:`
/// Examples: `did:key:alice`, `expression://Qm...`, `neighbourhood://Qm...`
pub fn validate(&self) -> Result<(), AnyError> {
use std::sync::OnceLock;
static URI_SCHEME_RE: OnceLock<Regex> = OnceLock::new();
let re = URI_SCHEME_RE.get_or_init(|| Regex::new(r"^[a-zA-Z][a-zA-Z0-9+\-.]*:").unwrap());

let check = |field: &str, value: &str| -> Result<(), AnyError> {
if value.is_empty() {
return Err(anyhow!("Link {} must not be empty", field));
}
if !re.is_match(value) {
return Err(anyhow!(
"Link {} is not a valid URI (must start with a scheme like 'did:', 'expression://', etc.): '{}'",
field, value
));
}
Ok(())
};

check("source", &self.source)?;
check("target", &self.target)?;

if let Some(predicate) = &self.predicate {
if !predicate.is_empty() {
check("predicate", predicate)?;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject empty predicate explicitly if it should be invalid.

predicate == "" currently bypasses validation, so empty predicates slip through despite the “reject empty URIs” objective. If empty predicates should be rejected, add an explicit error before calling check().

🛠️ Suggested fix
-        if let Some(predicate) = &self.predicate {
-            if !predicate.is_empty() {
-                check("predicate", predicate)?;
-            }
-        }
+        if let Some(predicate) = &self.predicate {
+            if predicate.is_empty() {
+                return Err(anyhow!("Link predicate must not be empty"));
+            }
+            check("predicate", predicate)?;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust-executor/src/types.rs` around lines 111 - 140, In validate(), the
current predicate branch allows an empty string to bypass URI validation; update
the predicate handling in the validate() method so that when self.predicate is
Some(predicate) you first return an error if predicate.is_empty() (matching the
behavior used for source/target) and otherwise call the existing check closure;
reference the validate function, the local check closure, and the predicate
variable to locate where to add the explicit empty-string rejection.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust-executor/src/perspectives/perspective_instance.rs`:
- Around line 3374-3399: The code currently calls
Literal::from_string(...).to_url() and Literal::from_number(...).to_url() with
.expect(...) which can panic; replace those .expect calls with proper error
propagation so the function returns an Err instead of panicking. In the match
arm where serde_json::Value::String(s) and Number(n) are handled, change the
logic to call to_url() and handle the Result via either the ? operator (after
converting the to_url error into the function's error type with map_err/From) or
explicitly match/map_err to return Err with a clear context; keep the existing
fallback for numbers when n.as_f64() is None. Update use sites of
Literal::from_string, Literal::from_number, and to_url to propagate errors
instead of unwrapping.

Copy link
Copy Markdown
Contributor

@data-bot-coasys data-bot-coasys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch from CodeRabbit on the empty predicate case. Looking at the current code:

if let Some(predicate) = &self.predicate {
    if !predicate.is_empty() {
        check("predicate", predicate)?;
    }
}

When predicate = Some(""), the outer guard !predicate.is_empty() skips validation entirely — but the check closure already handles empty strings with its own guard. The fix is to remove the redundant outer check so check always runs when predicate is Some:

if let Some(predicate) = &self.predicate {
    check("predicate", predicate)?;
}

This makes Some("") properly rejected (empty predicate URI is invalid), while None still means "no predicate" which is fine. Simple one-line fix — happy to push it to the branch if you want. 🤖

@data-bot-coasys
Copy link
Copy Markdown
Contributor

Good catch from CodeRabbit on the empty predicate case. Looking at the current code:

if let Some(predicate) = &self.predicate {
    if !predicate.is_empty() {
        check("predicate", predicate)?;
    }
}

When predicate = Some(""), the outer guard !predicate.is_empty() skips validation entirely — but the check closure already handles empty strings. The fix is to remove the redundant outer guard so check always runs when predicate is Some:

if let Some(predicate) = &self.predicate {
    check("predicate", predicate)?;
}

This makes Some("") properly rejected, while None still means "no predicate" which is fine. Happy to push this one-line fix to the branch if useful. — Data 🤖

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/js/tests/neighbourhood.ts (1)

161-180: ⚠️ Potential issue | 🟠 Major

Stress test needs scheme-qualified sources, targets, and predicates throughout. The early updates are correct (using ad4m://root), but the test later adds links with scheme-less values like 'alice', 'bob', 'broadcast', 'broadcast-loopback' as sources/targets/predicates (lines 198–200, 222–224, 431, 448, 481, 499, 514). With the new validation, those will be rejected. Update all scheme-less values to scheme URIs (e.g., 'test://alice', 'test://bob', 'test://broadcast', 'test://signal') or confirm they bypass validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/js/tests/neighbourhood.ts` around lines 161 - 180, Test uses
scheme-less link components which will be rejected by the new validation; update
all addLink calls and LinkQuery instances to use scheme-qualified URIs (e.g.,
replace 'alice'/'bob'/'broadcast'/'broadcast-loopback'/'signal' with
'test://alice','test://bob','test://broadcast','test://broadcast-loopback','test://signal')
wherever alice.perspective.addLink, bob.perspective.addLink, and new
LinkQuery({source/target/predicate: ...}) are used (references:
alice.perspective.addLink, bob.perspective.queryLinks, LinkQuery, aliceP1.uuid,
bobP1!.uuid), and run the test to confirm they now pass validation or explicitly
skip validation if a bypass is intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust-executor/src/perspectives/perspective_instance.rs`:
- Around line 3374-3401: The number branch can return a bare numeric string when
n.as_f64() is None, producing an invalid URI; update the fallback in the
serde_json::Value::Number arm to wrap the fallback string with
Literal::from_string(...).to_url() (mirroring the String branch) so it yields a
valid URI with a scheme (handle/map_err the same way as the other conversions).
Locate the block around the match on value in perspective_instance.rs (the
serde_json::Value::Number case using n.as_f64(), Literal::from_number, and
value.to_string()) and replace the bare value.to_string() fallback with using
Literal::from_string(value.to_string()).to_url() and propagate errors as done
elsewhere.

---

Outside diff comments:
In `@tests/js/tests/neighbourhood.ts`:
- Around line 161-180: Test uses scheme-less link components which will be
rejected by the new validation; update all addLink calls and LinkQuery instances
to use scheme-qualified URIs (e.g., replace
'alice'/'bob'/'broadcast'/'broadcast-loopback'/'signal' with
'test://alice','test://bob','test://broadcast','test://broadcast-loopback','test://signal')
wherever alice.perspective.addLink, bob.perspective.addLink, and new
LinkQuery({source/target/predicate: ...}) are used (references:
alice.perspective.addLink, bob.perspective.queryLinks, LinkQuery, aliceP1.uuid,
bobP1!.uuid), and run the test to confirm they now pass validation or explicitly
skip validation if a bypass is intended.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/js/tests/neighbourhood.ts (1)

431-432: Signal / broadcast link data still uses non-URI strings — consider updating for consistency.

These five Link objects feed sendSignal and sendBroadcastU, which are ephemeral telepresence operations that bypass the four validated storage entry points (add_link, add_links, add_link_expression, link_mutations). They are therefore currently safe. However, they are the only link data in the file that haven't been migrated to URI-scheme values, and they would fail immediately if URI validation is ever extended to telepresence paths.

♻️ Suggested alignment (five call sites)
-    link.data = new Link({source: "alice", target: "bob", predicate: "signal"});
+    link.data = new Link({source: "ad4m://alice", target: "ad4m://bob", predicate: "ad4m://signal"});
-    let link2 = new Link({source: "bob", target: "alice", predicate: "signal"});
+    let link2 = new Link({source: "ad4m://bob", target: "ad4m://alice", predicate: "ad4m://signal"});
-    let link = new Link({source: "alice", target: "broadcast", predicate: "test"});
+    let link = new Link({source: "ad4m://alice", target: "ad4m://broadcast", predicate: "ad4m://test"});
-    let link2 = new Link({source: "alice", target: "broadcast-loopback", predicate: "test"});
+    let link2 = new Link({source: "ad4m://alice", target: "ad4m://broadcast-loopback", predicate: "ad4m://test"});
-    let link3 = new Link({source: "bob", target: "broadcast-loopback", predicate: "test"});
+    let link3 = new Link({source: "ad4m://bob", target: "ad4m://broadcast-loopback", predicate: "ad4m://test"});

Also applies to: 448-449, 481-481, 499-499, 514-514

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/js/tests/neighbourhood.ts` around lines 431 - 432, Update the five
ephemeral Link constructions used with sendSignal/sendBroadcastU to use
URI-scheme values instead of plain strings: replace Link({source: "alice",
target: "bob", predicate: "signal"}) and similar occurrences with canonical URIs
(e.g., "did:example:alice", "did:example:bob", "urn:predicate:signal" or your
project's canonical scheme), and update the ExpressionProof constructor
arguments ("sig", "key") to corresponding URI-formatted identifiers (e.g.,
"urn:proof:sig", "urn:key:..."). Locate the Link instantiations and
ExpressionProof usages around the sendSignal and sendBroadcastU call sites and
apply the same URI migration at all five places noted so they conform to your
URI validation rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust-executor/src/perspectives/perspective_instance.rs`:
- Around line 3374-3403: Update the URI-scheme regex to be RFC 3986 compliant by
removing the underscore from the character class (change
r"^[a-zA-Z][a-zA-Z0-9+\-._]*:" to r"^[a-zA-Z][a-zA-Z0-9+\-\.]*:") wherever the
static URI_SCHEME_RE is defined (the one used in the match for value in
perspective_instance.rs and the one in Link::validate in types.rs), and replace
the default non-string/non-number branch in the match that returns
value.to_string() with calling Literal::from_json(value).to_url() (propagating
the existing map_err pattern) so non-string JSON values are encoded safely; also
apply the same regex fix inside Link::validate to keep validation consistent
with RFC 3986.

---

Nitpick comments:
In `@tests/js/tests/neighbourhood.ts`:
- Around line 431-432: Update the five ephemeral Link constructions used with
sendSignal/sendBroadcastU to use URI-scheme values instead of plain strings:
replace Link({source: "alice", target: "bob", predicate: "signal"}) and similar
occurrences with canonical URIs (e.g., "did:example:alice", "did:example:bob",
"urn:predicate:signal" or your project's canonical scheme), and update the
ExpressionProof constructor arguments ("sig", "key") to corresponding
URI-formatted identifiers (e.g., "urn:proof:sig", "urn:key:..."). Locate the
Link instantiations and ExpressionProof usages around the sendSignal and
sendBroadcastU call sites and apply the same URI migration at all five places
noted so they conform to your URI validation rules.

@lucksus lucksus merged commit 4dc511e into dev Feb 18, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants