22% better retrieval than any single search strategy — a hybrid GraphRAG server that makes your second brain actually searchable.
If you've built a knowledge base in Obsidian — notes, projects, research, meeting logs — you've probably hit the same wall everyone hits: you know the answer is in there somewhere, but search can't find it. You spend more time looking for notes than using them. The system that was supposed to make you smarter starts feeling like a cluttered drawer.
vault-graphrag fixes this by running four search strategies in parallel and fusing the results. It's exposed as a single MCP tool — one call, best answer, regardless of which strategy found it.
The promise of a "second brain" is that you capture knowledge once and retrieve it when you need it. The reality is that retrieval is where these systems fail. You save hundreds of notes, build links between them, and then can't find the one you need because:
- You search for "motor settings" but the note is titled "VFD Configuration" — keyword search misses it
- You search semantically for "project planning" and get conceptually similar notes, but not the specific project note that's two links away from a related note
- You browse backlinks manually, but there's no way to rank which linked note is actually relevant to your question
Each search strategy works for some queries and fails for others. The problem isn't that your notes are disorganized — it's that no single search method can handle the variety of ways you need to find things.
This problem gets worse with AI agents. An agent searching your vault on your behalf has to guess which strategy to use, make multiple tool calls, and piece together results from different tools. Pick wrong and the answer doesn't come back at all.
Obsidian stores notes as plain Markdown files locally. What makes it powerful for knowledge management is linking — any note can reference another with [[WikiLinks]], and those links are bidirectional. Over time, your notes form a knowledge graph: a web of connections that you authored, where the links themselves carry meaning. A link from a project note to [[stepper motors]] is a claim: "this concept is relevant here."
This is the principle behind the Zettelkasten method — the idea that knowledge lives in the relationships between notes, not just in the notes themselves. vault-graphrag is built to exploit that structure.
vault_search runs four retrieval strategies in parallel, weights them based on what kind of answer the query needs, and returns one fused result set. The caller declares an intent — what kind of search this is — and the server handles the routing:
| Intent | When to use it | Strategies emphasized |
|---|---|---|
| `factual_lookup` | Looking for a specific fact, name, or setting | Keyword search + memory recall |
| `context_load` | Loading everything related to a project or topic | WikiLink graph traversal |
| `conceptual` | Exploring ideas or finding thematically related notes | Semantic similarity |
| `backlink` | Finding what links to a specific note | Reverse WikiLink scan |
| `serendipity` | Open-ended exploration, discovering unexpected connections | Semantic + graph equally |
If the caller doesn't specify an intent, a local LLM classifies the query automatically.
Hindsight is a long-term memory service for AI agents. It ingests conversations over time, extracts structured facts (decisions made, preferences stated, events that happened), and builds a persistent knowledge graph from them. vault-graphrag can optionally query Hindsight as a fourth retrieval channel, so a single search returns both your vault notes and facts the agent remembers from past conversations — even if those facts were never written down as notes.
Yes. Evaluated against 15 gold-standard queries across all 5 intent types on a ~1,500-note vault. Each query has known correct notes — the eval measures whether vault_search finds them and how high they rank.
Note on running evals yourself: the gold queries in `eval/gold.json` are tied to the specific vault structure used during development. Before running `eval/run_eval.py` against your own vault, replace the contents of `eval/gold.json` with queries and expected note paths that match your vault.
- Hit@5 = 93.3% — 14 out of 15 queries had the correct note in the top 5 results
- Hit@10 = 100% — every query found the correct note somewhere in the top 10
- MRR = 0.77 — "Mean Reciprocal Rank," a standard information retrieval metric. An MRR of 0.77 means the correct answer typically appears at rank 1 or 2. (MRR = 1.0 would mean every query returns the correct note first.)
The single miss at Hit@5 was a serendipity query — intentionally open-ended, where the expected notes were only weakly connected to the query terms.
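For readers unfamiliar with the metric, MRR is simple to compute: take the 1-based rank at which the first correct note appears for each query, average the reciprocals. A minimal sketch with made-up ranks (not the eval's actual per-query data):

```python
def mean_reciprocal_rank(ranks):
    """ranks[i] = 1-based rank of the first correct note for query i,
    or None if it never appeared (contributing a reciprocal rank of 0)."""
    reciprocal = [0.0 if r is None else 1.0 / r for r in ranks]
    return sum(reciprocal) / len(reciprocal)

# Illustrative: correct note at rank 1, rank 1, rank 2
mean_reciprocal_rank([1, 1, 2])  # (1 + 1 + 0.5) / 3 = 0.8333...
```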
If you committed to a single strategy for all queries, the best you could do is BM25 at MRR 0.63. Semantic search scores 0.56. Graph traversal scores 0.27. Fused retrieval scores 0.77 — 22% better than the best single strategy, because different query types need different strategies, and fusion handles the routing automatically.
```
        +----------------------------------+
        |          vault_search()          |
        |  intent routing -> weight vector |
        +----------------+-----------------+
                         |
    +--------------------+--------------------+
    | asyncio.gather (all channels parallel)  |
    +--+----------+-----------+-----------+---+
       |          |           |           |
  +----v---+ +----v----+ +----v----+ +---v------+
  |  BM25  | |Semantic | |  Graph  | |Hindsight |
  | Okapi  | | cosine  | |   BFS   | | recall   |
  +----+---+ +----+----+ +----+----+ +---+------+
       |          |           |           |
       +----------+-----+-----+-----------+
                        |
               +--------v--------+
               |   RRF Fusion    |
               |   weighted by   |
               |  intent vector  |
               +--------+--------+
                        |
               +--------v--------+
               |  Deduplicate,   |
               |   normalize,    |
               |   threshold,    |
               |   annotate      |
               +-----------------+
```
BM25 — Indexes all .md files using BM25Okapi with mtime-based cache invalidation. Handles keyword and exact-term queries. Returns normalized scores with excerpt extraction.
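The server uses the rank-bm25 package for this channel; the stdlib-only sketch below shows the same Okapi scoring formula plus the mtime-based invalidation idea, with all names (`TinyBM25`, `index_is_stale`) being illustrative:

```python
import math
import os
from collections import Counter

class TinyBM25:
    """Minimal Okapi BM25 over pre-tokenized docs. Illustrative only;
    the server delegates this to the rank-bm25 package."""
    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n = len(corpus)
        self.avgdl = sum(len(d) for d in corpus) / self.n
        self.df = Counter(t for d in corpus for t in set(d))  # document frequency

    def idf(self, term):
        return math.log((self.n - self.df[term] + 0.5) / (self.df[term] + 0.5) + 1)

    def score(self, query, doc):
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query if t in tf
        )

def index_is_stale(index_mtime: float, md_paths) -> bool:
    """mtime-based invalidation: rebuild if any .md changed after the index."""
    return any(os.path.getmtime(p) > index_mtime for p in md_paths)
```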
Semantic — Reads the pre-built embedding index from the Smart Connections Obsidian plugin. Embeds queries in-process via sentence-transformers (TaylorAI/bge-micro-v2). No separate vector database — piggybacks on the plugin's existing index.
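Once query and note embeddings exist, the channel reduces to cosine ranking. A pure-Python sketch, assuming a `{note_path: embedding}` mapping already loaded from disk (the real Smart Connections on-disk format differs and is parsed by the server):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_channel(query_vec, index, top_k=10):
    """Rank every indexed note by cosine similarity to the query embedding.
    `index` is an assumed {path: vector} shape, not the plugin's raw format."""
    scored = sorted(
        ((cosine(query_vec, vec), path) for path, vec in index.items()),
        reverse=True,
    )
    return [(path, score) for score, path in scored[:top_k]]
```

In production the server uses NumPy for this; the loop form above is just easier to read.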
WikiLink Graph — Parses [[WikiLinks]] across the vault and builds an adjacency graph. Forward BFS for context_load (relevance decays by hop depth), reverse scan for backlink. This is the channel that exploits the structure unique to linked-note systems.
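A sketch of the two building blocks, link extraction and depth-decayed BFS, under assumptions: the regex handles the common `[[Target]]`, `[[Target|alias]]`, and `[[Target#heading]]` forms, and the 0.5-per-hop decay factor is illustrative rather than the server's exact falloff:

```python
import re
from collections import deque

# Capture the target before any "|alias" or "#heading" suffix.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")

def parse_links(text):
    return [m.strip() for m in LINK_RE.findall(text)]

def bfs_with_decay(graph, root, hop_depth=2, decay=0.5):
    """Forward BFS from a root note; relevance halves per hop (assumed rate).
    `graph` maps note -> list of forward-linked notes."""
    scores = {root: 1.0}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hop_depth:
            continue  # don't expand beyond the requested depth
        for neighbor in graph.get(node, []):
            if neighbor not in scores:
                scores[neighbor] = decay ** (depth + 1)
                frontier.append((neighbor, depth + 1))
    return scores
```

The `backlink` intent is the same structure run in reverse: scan the adjacency map for notes whose forward links contain the root.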
Hindsight — Queries a Hindsight memory service for facts retained from prior AI conversations. Optional — disabled when HINDSIGHT_URL is unset.
All channels degrade gracefully. If a dependency is missing or a service is unreachable, that channel returns an empty list instead of an error. The server works with whatever is available.
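The parallel-fire-and-degrade pattern is a thin wrapper around `asyncio.gather`. A self-contained sketch with stand-in channel coroutines (the real channels are the four described above):

```python
import asyncio

async def safe_channel(coro):
    """Run one retrieval channel; a failure yields [] rather than an error."""
    try:
        return await coro
    except Exception:
        return []

async def run_channels(channels):
    # All channels fire in parallel; a broken one contributes nothing.
    return await asyncio.gather(*(safe_channel(c) for c in channels))

# Stand-in channels: one succeeds, one simulates an unreachable service.
async def bm25():
    return [("note.md", 0.9)]

async def hindsight():
    raise ConnectionError("service unreachable")

results = asyncio.run(run_channels([bm25(), hindsight()]))
# results == [[("note.md", 0.9)], []]
```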
Results from all channels are merged via Reciprocal Rank Fusion (k=60), weighted by the intent's channel vector. Scores are normalized so the top result is always 1.0, making the threshold parameter meaningful regardless of how many channels contributed.
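Weighted RRF is compact enough to show in full. A sketch of the fusion step described above, with the function name and input shapes assumed for illustration:

```python
def weighted_rrf(rankings, weights, k=60):
    """rankings: {channel: [path, ...]} in best-first order.
    weights: {channel: weight} from the intent's channel vector.
    Fused score = sum over channels of weight / (k + rank), then
    normalized so the top result is exactly 1.0."""
    fused = {}
    for channel, ranked in rankings.items():
        w = weights.get(channel, 0.0)
        for rank, path in enumerate(ranked, start=1):
            fused[path] = fused.get(path, 0.0) + w / (k + rank)
    if fused:
        top = max(fused.values())
        fused = {path: score / top for path, score in fused.items()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Because only ranks matter, channels with incomparable raw scores (BM25 vs. cosine vs. graph depth) fuse cleanly, and the final normalization is what makes a fixed `threshold` meaningful.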
Every result includes a match_reason explaining why it was returned — enforced at the schema level.
Intent classification uses a local Ollama model (gemma2:2b by default, ~800ms). This is only needed when the caller doesn't pass an explicit intent. Query embedding runs in-process and does not require Ollama.
If Ollama is unreachable, intent defaults to conceptual — semantic search still works, you just lose automatic routing.
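The classification call with its fallback can be sketched as follows. The server uses httpx; this stdlib version keeps the same shape, and the prompt wording is illustrative (the `/api/generate` request and `response` field match Ollama's public REST API):

```python
import json
import urllib.error
import urllib.request

VALID_INTENTS = {"factual_lookup", "context_load", "conceptual",
                 "backlink", "serendipity"}

def classify_intent(query, ollama_url="http://localhost:11434",
                    model="gemma2:2b", timeout=5.0):
    """Ask a local Ollama model for an intent label; fall back to
    'conceptual' if Ollama is unreachable or replies with junk."""
    prompt = ("Classify this vault query as one of: "
              + ", ".join(sorted(VALID_INTENTS))
              + f". Reply with the label only.\n{query}")
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(
        f"{ollama_url}/api/generate", data=body,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            label = json.load(resp)["response"].strip()
            return label if label in VALID_INTENTS else "conceptual"
    except (urllib.error.URLError, OSError, KeyError, ValueError):
        return "conceptual"
```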
```python
vault_search(
    query: str,             # Natural language query (required)
    intent: str | None,     # One of the 5 intents; auto-classified if omitted
    root_note: str | None,  # Vault-relative path; anchor for context_load
    max_results: int = 10,
    hop_depth: int = 2,     # WikiLink traversal depth
    threshold: float = 0.6, # Minimum relevance score (0-1)
)
```
Each result includes: path, title, channel, relevance, match_reason, excerpt, depth, connected_via.
Local development:

```shell
pip install -e ".[dev]"
cp .env.example .env
# Edit .env — set VAULT_PATH at minimum
python -m vault_graphrag.server
```

Docker:

```shell
docker build -t vault-graphrag .
docker run -d \
  -p 8765:8765 \
  -v /path/to/your/vault:/vault:ro \
  -e VAULT_PATH=/vault \
  vault-graphrag

curl http://localhost:8765/health
```

The vault is mounted read-only. When deployed alongside Hindsight, both services should share a Docker network. Set HINDSIGHT_URL to the container-internal address (e.g. http://hindsight:8000). Hindsight is optional — omit HINDSIGHT_URL to disable the channel.
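A Compose file makes the shared-network setup concrete. This is a hypothetical sketch: the service names, network name, and `hindsight:latest` image are placeholders; only `VAULT_PATH` and `HINDSIGHT_URL` come from this README.

```yaml
# docker-compose.yml — illustrative, not shipped with the repo
services:
  vault-graphrag:
    build: .
    ports: ["8765:8765"]
    volumes:
      - /path/to/your/vault:/vault:ro   # vault stays read-only
    environment:
      VAULT_PATH: /vault
      HINDSIGHT_URL: http://hindsight:8000  # container-internal address
    networks: [brain]
  hindsight:
    image: hindsight:latest   # placeholder image name
    networks: [brain]
networks:
  brain: {}
```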
SSE transport (Docker or remote):

```json
{
  "vault-graphrag": {
    "type": "sse",
    "url": "http://localhost:8765/sse"
  }
}
```

Stdio transport (local):
```json
{
  "vault-graphrag": {
    "type": "stdio",
    "command": "python",
    "args": ["-m", "vault_graphrag.server"]
  }
}
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `VAULT_PATH` | Yes | -- | Absolute path to the Obsidian vault |
| `OLLAMA_URL` | No | `http://localhost:11434` | Ollama base URL (intent classification only) |
| `OLLAMA_INTENT_MODEL` | No | `gemma2:2b` | Model for intent classification |
| `HINDSIGHT_URL` | No | (disabled) | Hindsight REST API base URL |
| `MCP_TRANSPORT` | No | `sse` | `sse` or `stdio` |
| `MCP_HOST` | No | `0.0.0.0` | Bind address |
| `MCP_PORT` | No | `8765` | Bind port |
`GET /health`

```json
{
  "status": "ok",
  "channels": {
    "bm25": true,
    "semantic": true,
    "graph": true,
    "hindsight": true
  },
  "vault_path": "/vault",
  "note_count": 1024
}
```

`"ok"` = all channels available. `"degraded"` = functional but partial.
vault_path is included in the response body to aid debugging of misconfigured deployments (e.g. wrong mount path in Docker). It is omitted if VAULT_PATH is unset.
- No re-indexing. Reads the Smart Connections plugin's existing embedding index instead of maintaining a separate vector store.
- Intent as the API surface. Internal routing can change without breaking callers.
- Hindsight as a retrieval channel. Vault notes and conversation memories come back in one fused response.
- `match_reason` is mandatory. Enforced at the Pydantic model level. No result is returned without an explanation.
- `serendipity` is a first-class intent. Discovery gets its own weight vector optimized for latent connections at the semantic + graph intersection.
- Graceful degradation. Any channel failure returns empty, never errors.
- GraphRAG (Microsoft, 2024) — hybrid vector + graph retrieval with Reciprocal Rank Fusion
- MIND-RAG — intent-aware routing dispatching to specialized retrieval agents
- Hybrid RAG — dense semantic + sparse lexical (BM25) fusion
- Luhmann (1981), "Communicating with Slip Boxes" — the zettelkasten as communication partner with emergent properties from link density
- Python 3.11+ / FastMCP / Pydantic
- rank-bm25 (lexical search)
- sentence-transformers (query embedding, in-process)
- NumPy (cosine similarity)
- httpx (async HTTP for Hindsight + Ollama)
- Ollama (intent classification only)
MIT


