Vaultex is a GraphRAG-powered MCP server that transforms an Obsidian vault into a semantically queryable knowledge graph. It indexes your notes at the atomic proposition level, weaves propositions into a typed knowledge graph, and exposes the result over the Model Context Protocol so any MCP-compatible AI client can search and reason over your vault.
Unlike file-access bridges that let a model read raw markdown, Vaultex breaks every note into self-contained facts, embeds them, and connects them through three complementary edge types — making retrieval both semantically precise and structurally aware of your wikilink graph.
When you run `vaultex ingest`, Vaultex walks your vault and processes each `.md` file:
- Split the note into sections by Markdown header hierarchy.
- Extract atomic propositions from each section using Claude Haiku. Every proposition is self-contained — pronouns are resolved, context is included, one fact per entry.
- Embed all propositions with `text-embedding-3-small` in one batched API call.
- Store the vectors in LanceDB and record metadata in SQLite.
- Build three types of graph edges (see below).
Content hashes are stored per-note, so incremental re-ingestion only processes changed files.
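To make the incremental behavior concrete, here is a minimal sketch of hash-based change detection. It assumes the registry is a simple path → hash mapping loaded from SQLite; the function names are illustrative, not Vaultex's actual internals.

```python
import hashlib
from pathlib import Path

def content_hash(note: Path) -> str:
    # Hash the raw bytes so any edit, however small, changes the digest.
    return hashlib.sha256(note.read_bytes()).hexdigest()

def needs_reingest(note: Path, registry: dict[str, str]) -> bool:
    # New notes have no registry entry; changed notes have a stale hash.
    return registry.get(str(note)) != content_hash(note)
```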
Every proposition is a node. Edges connect them in three ways:
| Edge Type | Weight | When Created |
|---|---|---|
| SAME_NOTE | 2.0 | Between every pair of propositions extracted from the same note |
| HARD_LINK | 1.0 | When note A has a [[wikilink]] to note B — connects A's propositions to B's |
| SOFT_LINK | cosine similarity | Cross-note pairs whose embedding similarity exceeds an adaptive percentile threshold |
Graph traversal scores paths by the product of edge weights along the route: a SAME_NOTE → SAME_NOTE path scores 2.0 × 2.0 = 4.0 (certain), while a SOFT_LINK → SOFT_LINK path with 0.6-similarity edges scores 0.6 × 0.6 = 0.36 (speculative).
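The scoring rule itself is just a running product, as this sketch shows (the 0.6 soft-link weights stand in for cosine similarities):

```python
from math import prod

def path_score(edge_weights: list[float]) -> float:
    # Multiply weights along the path: scores decay as paths get more speculative.
    return prod(edge_weights)

path_score([2.0, 2.0])  # SAME_NOTE -> SAME_NOTE: 4.0 (certain)
path_score([0.6, 0.6])  # SOFT_LINK -> SOFT_LINK: 0.36 (speculative)
```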
`vaultex serve` starts an MCP server. Any MCP-compatible client (Claude Desktop, Claude Code, Cursor, etc.) can then call six tools, from raw semantic search up to a fully automated multi-step research pipeline.
With watching enabled (the default), Vaultex monitors your vault with a debounced file watcher. Saving a note re-ingests it within a few seconds. Deleting a note removes all its propositions from the index immediately.
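Per-file debouncing can be sketched with the watchdog library and a restartable timer. This illustrates the idea only; it is not Vaultex's actual watcher code:

```python
import threading
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, on_change, delay: float = 5.0):
        self.on_change, self.delay = on_change, delay
        self._timers: dict[str, threading.Timer] = {}

    def on_modified(self, event):
        if event.is_directory or not event.src_path.endswith(".md"):
            return
        # Restart the timer on every save; fire only after `delay` quiet seconds.
        if (timer := self._timers.get(event.src_path)) is not None:
            timer.cancel()
        timer = threading.Timer(self.delay, self.on_change, args=[event.src_path])
        self._timers[event.src_path] = timer
        timer.start()

observer = Observer()
observer.schedule(DebouncedHandler(on_change=print), "/path/to/vault", recursive=True)
observer.start()
```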
Vaultex looks for configuration in four places, with later sources overriding earlier ones (see the sketch after this list):

1. A `.env` file in the working directory
2. System environment variables
3. `.vaultex/config.yaml` inside the vault
4. `VAULTEX_*`-prefixed environment variables (highest priority)
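A minimal sketch of that precedence order, assuming python-dotenv and PyYAML are available; the `load_config` helper is illustrative, not part of Vaultex's API:

```python
import os
import yaml
from dotenv import dotenv_values

def load_config(vault: str) -> dict:
    cfg: dict = {}
    cfg.update(dotenv_values(".env"))                    # 1. .env in the working directory
    cfg.update(os.environ)                               # 2. system environment variables
    yaml_path = os.path.join(vault, ".vaultex", "config.yaml")
    if os.path.exists(yaml_path):
        with open(yaml_path) as f:
            cfg.update(yaml.safe_load(f) or {})          # 3. per-vault config.yaml
    cfg.update({k.removeprefix("VAULTEX_").lower(): v    # 4. VAULTEX_* (highest priority)
                for k, v in os.environ.items() if k.startswith("VAULTEX_")})
    return cfg
```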
| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Routes calls to both the Anthropic and OpenAI models via openrouter.ai |
These can be set as environment variables or in `.vaultex/config.yaml`:

```yaml
# Core
exclude_patterns: # Glob patterns to skip during ingestion
- "templates/**" # default
- "_archive/**" # default
- ".obsidian/**" # default
debounce_seconds: 5 # File watcher debounce interval (seconds)
# API concurrency
max_concurrent_api_calls: 5 # Parallel LLM extraction threads per note
# Models (all routed via OpenRouter)
openai_model: text-embedding-3-small
embedding_dimensions: 1536
haiku_model: anthropic/claude-haiku-4-5
deep_research_sonnet_model: anthropic/claude-sonnet-4.6
# Soft-link thresholding
soft_link_percentile: 95 # Top N% most similar cross-note pairs get an edge
soft_link_recalc_growth_trigger: 0.20 # Skip recalc if proposition count grew < 20%
# Storage
data_dir: .vaultex                 # Relative to vault root, or absolute path
```

Deep research defaults:

```yaml
deep_research_default_depth: standard
deep_research_standard_top_k: 10
deep_research_thorough_top_k: 20
deep_research_standard_walks: 3
deep_research_thorough_walks: 5
deep_research_standard_note_budget: 1500 # chars of note context per note
deep_research_thorough_note_budget: 3000
```

Run this once before starting the server. On a vault of a few hundred notes it takes a couple of minutes.
```bash
vaultex ingest --vault /path/to/your/vault
```

Force a full re-ingest (ignores content hashes):
```bash
vaultex ingest --vault /path/to/your/vault --force
```

Ingest a single note (useful after editing one file while the server isn't running):
```bash
vaultex ingest-note "projects/api-rewrite.md" --vault /path/to/your/vault
```

**WSL / Windows vaults:** If your vault is on a Windows filesystem (`/mnt/c/...`), LanceDB cannot write there. Use `--data-dir` to store the index on the Linux filesystem:

```bash
vaultex ingest --vault /mnt/c/Users/you/vault --data-dir ~/.vaultex/my-vault
```
For Claude Desktop (client spawns this process over stdio):
```bash
vaultex serve --vault /path/to/your/vault
```

Add to `claude_desktop_config.json`:

```json
{
"mcpServers": {
"vaultex": {
"command": "vaultex",
"args": ["serve", "--vault", "/path/to/your/vault"],
"env": {
"OPENROUTER_API_KEY": "sk-or-..."
}
}
}
}
```

For Claude Code or any HTTP-based client (server runs as a standalone process):
```bash
vaultex serve --vault /path/to/your/vault --transport sse --port 8765
```

Then add `http://127.0.0.1:8765/sse` as a custom MCP connector.
Serve without live watching (read-only, no file monitoring):
```bash
vaultex serve --vault /path/to/your/vault --no-watch
```

Check ingestion status (note count, propositions, errors):
```bash
vaultex status --vault /path/to/your/vault
```

Recompute the soft-link similarity threshold after a large ingest:
```bash
vaultex recalculate-threshold --vault /path/to/your/vault --percentile 95
```

Run an end-to-end smoke test against a random sample of notes (non-destructive, uses a temp directory):
```bash
vaultex smoke-test --vault /path/to/your/vault --sample 20
```

Launch the graph explorer web UI:
```bash
vaultex explore --vault /path/to/your/vault --port 7333
```

Once the server is running, these six tools are available to any connected client.
Embed a query and return the most similar propositions from the vault.
```python
semantic_search(query: str, top_k: int = 10) → list[dict]
```
| Parameter | Default | Description |
|---|---|---|
| `query` | — | Natural language query |
| `top_k` | 10 | Number of results (max 50) |
Returns a list of:

```json
{
"id": "sha256-hex",
"text": "The proposition text.",
"source_note": "projects/api-rewrite.md",
"source_section": "## Timeline",
"similarity_score": 0.51
}
```

Scores from `text-embedding-3-small` cluster in the 0.3–0.6 range. A score of 0.45 or above is a strong match. Focus on relative ranking, not absolute values.
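One way to act on "relative ranking, not absolute values" is to keep only hits close to the best one. This is an illustrative client-side heuristic, not something Vaultex does for you:

```python
def strong_hits(results: list[dict], rel: float = 0.9) -> list[dict]:
    # Keep results within 10% of the top score rather than using a fixed cutoff.
    if not results:
        return []
    top = results[0]["similarity_score"]
    return [r for r in results if r["similarity_score"] >= rel * top]
```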
Traverse the knowledge graph outward from a proposition, following typed edges.
```python
get_graph_neighborhood(
    proposition_id: str,
    max_hops: int = 2,
    max_results: int = 20,
    min_path_score: float = 0.3,
    edge_types: list[str] | None = None
) → dict
```
| Parameter | Default | Description |
|---|---|---|
| `proposition_id` | — | ID from a `semantic_search` result |
| `max_hops` | 2 | Traversal depth (1–3) |
| `max_results` | 20 | Max neighbors to return |
| `min_path_score` | 0.3 | Minimum cumulative path score (product of edge weights) |
| `edge_types` | all | Restrict to `["SAME_NOTE"]`, `["HARD_LINK"]`, `["SOFT_LINK"]`, or any combination |
Returns:

```json
{
"origin": {"id": "...", "text": "...", "source_note": "..."},
"neighbors": [
{
"id": "...",
"text": "...",
"source_note": "...",
"path_score": 2.0,
"path": [{"edge_type": "SAME_NOTE", "weight": 2.0, "via_node_id": "..."}]
}
]
}
```

For focused lookups, pass `edge_types=["SAME_NOTE", "HARD_LINK"]` to follow structural connections only. Include `"SOFT_LINK"` for exploratory queries to surface latent connections.
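For example, using the signature above (`pid` stands for an ID returned by `semantic_search`; the raised `min_path_score` in the second call is an illustrative noise filter, not a documented default):

```python
# Structural lookup: stay within the note and its explicit wikilinks.
get_graph_neighborhood(proposition_id=pid,
                       edge_types=["SAME_NOTE", "HARD_LINK"])

# Exploratory lookup: allow similarity edges, but demand stronger paths.
get_graph_neighborhood(proposition_id=pid,
                       edge_types=["SAME_NOTE", "HARD_LINK", "SOFT_LINK"],
                       min_path_score=0.5)
```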
Read the full markdown content of a note, including parsed frontmatter and outgoing links.
```python
read_full_note(note_path: str) → dict
```
Returns:

```json
{
"path": "projects/api-rewrite.md",
"content": "# API Rewrite\n...",
"frontmatter": {"status": "in-progress", "tags": ["engineering"]},
"outgoing_links": ["people/marc.md", "projects/graphql.md"],
"proposition_count": 14
}
```

Retrieve every proposition extracted from a specific note.
```python
get_note_propositions(note_path: str) → list[dict]
```
Returns:

```json
[
{"id": "...", "text": "Marc is the lead engineer on the API rewrite.", "source_section": "## Team"},
...
]
```

Useful as a coverage check: if a central note appears in search results but you want all its indexed facts, use this.
Query the health and coverage of the current index.
```python
get_ingestion_status() → dict
```
Returns:

```json
{
"status": "idle",
"total_notes": 342,
"total_propositions": 4821,
"total_graph_edges": 18234,
"last_ingestion": "2026-03-24T18:45:12+00:00",
"soft_link_threshold": 0.6143,
"errors_last_24h": 0
}
```

Automated multi-step retrieval pipeline: query expansion, semantic search, graph traversal, note reading, and LLM synthesis in one call.
```python
deep_research(query: str, depth: str = "standard") → dict
```
| Parameter | Default | Description |
|---|---|---|
| `query` | — | Natural language question about the vault |
| `depth` | `"standard"` | `"standard"` or `"thorough"` |
standard — Haiku synthesis, 10 search hits, 3 graph walks, up to 3 notes read. Fast and cheap. Best for factual lookups, status checks, and routine questions.
thorough — Sonnet synthesis, 20 search hits, 5 graph walks, up to 5 notes read. Slower and more expensive. Best for nuanced analysis, cross-cutting themes, or questions that require judgment.
Returns:

```json
{
"answer": "Marc is the lead engineer on the API rewrite project...",
"confidence": "high",
"factlets": [
{
"text": "Marc joined the API rewrite team in January 2026.",
"source_note": "people/marc.md",
"source_section": "## Background",
"discovery_method": "graph_walk"
}
],
"notes_consulted": ["people/marc.md", "projects/api-rewrite.md"],
"retrieval_stats": {
"search_hits": 10,
"graph_nodes_visited": 43,
"notes_read": 2,
"synthesis_model": "anthropic/claude-haiku-4-5",
"depth": "standard"
}
}
```

`discovery_method` is one of `"search"`, `"graph_walk"`, or `"note_read"`. Facts corroborated by multiple discovery methods carry higher confidence in the synthesized answer.
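You can check corroboration yourself by grouping factlets by their discovery methods. Grouping on exact factlet text is a simplification for illustration:

```python
from collections import defaultdict

def corroborated(factlets: list[dict], min_methods: int = 2) -> list[str]:
    methods: dict[str, set[str]] = defaultdict(set)
    for f in factlets:
        methods[f["text"]].add(f["discovery_method"])
    # Facts surfaced by two or more independent methods deserve extra weight.
    return [text for text, m in methods.items() if len(m) >= min_methods]
```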
When you want direct control over retrieval, use the individual tools in sequence. The server also exposes this procedure as a built-in MCP prompt (`deep_search_guide`).
Step 0 — Expand the query. Short keywords produce weak embeddings.
| User query | Search with |
|---|---|
| "Marc" | "Marc, the engineer who works on GraphQL and the API rewrite" |
| "Q3 timeline" | "Q3 deadline and project timeline for the current quarter" |
| "what happened today?" | "daily note, tasks completed, meetings, and updates from today" |
Step 1 — Semantic search (wide net).
```python
semantic_search(query=<expanded_query>, top_k=10)
```

Note the `source_note` values of the top results.
Step 2 — Graph walk (top 3–5 hits).
```python
get_graph_neighborhood(proposition_id=<id>, max_hops=2, min_path_score=0.3)
```
This is where the real value is. HARD_LINK edges surface connections that embeddings alone miss — the link from "project note" to "person note" can only be found by following [[wikilinks]] in the graph. Deduplicate results across walks.
Step 3 — Read source notes.
```python
read_full_note(note_path=<source_note>)
```
Gets original prose context, frontmatter tags and fields, and outgoing wikilinks.
Step 4 — Coverage check (optional).
```python
get_note_propositions(note_path=<source_note>)
```
Pull all propositions from a central note if the first steps didn't surface many from it.
Step 5 — Synthesize. Assemble the answer, citing source notes. Prefer facts corroborated by multiple paths.
A typical deep search takes about 8–10 tool calls total.
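Written out as direct tool calls, the pipeline looks roughly like this. The expanded wording, the top-3 walk cutoff, and the 5-note read limit are illustrative choices; in practice each call goes through your MCP client rather than plain Python:

```python
# Step 0-1: expand the query, then cast a wide net.
expanded = "Marc, the engineer who works on GraphQL and the API rewrite"
hits = semantic_search(query=expanded, top_k=10)

# Step 2: walk the graph from the top hits, deduplicating across walks.
seen = {h["id"] for h in hits}
neighbors = []
for hit in hits[:3]:
    hood = get_graph_neighborhood(proposition_id=hit["id"],
                                  max_hops=2, min_path_score=0.3)
    for n in hood["neighbors"]:
        if n["id"] not in seen:
            seen.add(n["id"])
            neighbors.append(n)

# Step 3: read the source notes behind the strongest evidence.
note_paths = {f["source_note"] for f in hits + neighbors}
notes = [read_full_note(note_path=p) for p in sorted(note_paths)[:5]]

# Steps 4-5: optionally call get_note_propositions() on an under-covered
# central note, then synthesize an answer citing source notes.
```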
Place a `.vaultex-ignore` file in your vault root to exclude files from indexing and watching. It uses `fnmatch` glob patterns, one per line:

```
# Personal notes — never index these
personal/**
diary/*.md
# Specific file types
*.excalidraw.md
*.canvas
# Temporary / draft files
daily/private-*.md
```

Blank lines are skipped and lines beginning with `#` are treated as comments. The watcher detects changes to `.vaultex-ignore` and reloads patterns live; no server restart is needed.
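The matching semantics can be sketched in a few lines; whether Vaultex normalizes paths exactly this way is an assumption:

```python
from fnmatch import fnmatch
from pathlib import Path

def load_ignore_patterns(vault: Path) -> list[str]:
    ignore = vault / ".vaultex-ignore"
    if not ignore.exists():
        return []
    # Skip blank lines and comments, mirroring the rules above.
    return [line.strip() for line in ignore.read_text().splitlines()
            if line.strip() and not line.lstrip().startswith("#")]

def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    # rel_path is vault-relative with forward slashes, e.g. "diary/march.md".
    return any(fnmatch(rel_path, p) for p in patterns)

is_ignored("personal/journal.md", ["personal/**"])  # True
```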
The `exclude_patterns` config option applies the same logic but is configured in `config.yaml` or environment variables, making it suitable for patterns shared across multiple vaults or enforced at a system level.
By default, all Vaultex data is written to `.vaultex/` inside the vault:

```
<vault>/.vaultex/
├── vaultex.db # SQLite: note registry, config store, ingestion log
├── lancedb/ # LanceDB vector database
├── graph.pkl # Serialized proposition graph (rustworkx, pickle)
└── config.yaml       # Per-vault config overrides (optional)
```
The SQLite database has three tables:
- `note_registry` — One row per ingested note: path, content hash, last-processed timestamp, proposition count.
- `config` — Key/value store for computed values: `soft_link_threshold`, `soft_link_prop_count`.
- `ingestion_log` — Append-only event log: `processed`, `skipped`, `error`, `deleted`, and `threshold_recalc` events with timestamps and details.
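Since it's plain SQLite, you can inspect it directly. The column names below are inferred from the descriptions above, not verified against the actual schema:

```python
import sqlite3

con = sqlite3.connect("/path/to/vault/.vaultex/vaultex.db")

# Tally ingestion events by type (event_type column name assumed).
for event_type, count in con.execute(
    "SELECT event_type, COUNT(*) FROM ingestion_log GROUP BY event_type"
):
    print(event_type, count)

# Read the stored soft-link threshold (key/value column names assumed).
row = con.execute(
    "SELECT value FROM config WHERE key = 'soft_link_threshold'"
).fetchone()
print(row)
```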
Override the data directory with `--data-dir` or the `data_dir` config option. This is particularly useful on WSL, where the vault may be on a Windows filesystem but LanceDB requires a native Linux path.