
Proposal: Five-Layer Memory Enhancement (Semantic Search, Salience, Graph, Consolidation, Priming) #168

@blkdooGit

Description


📋 Pre-flight Checks

  • I have searched existing issues and this is not a duplicate
  • I understand this issue needs status:approved before a PR can be opened

🔍 Problem Description

Executive Summary

Engram solves the hardest problem in AI-assisted development: persistent context across sessions. Its local-first SQLite architecture, FTS5 search, and zero-dependency binary make it the most practical memory system available for coding agents today.

This proposal identifies five capabilities that would elevate Engram from a persistent note system to an associative, temporal, self-organizing knowledge layer—without compromising its core design principles. Each capability is opt-in, preserves the zero-dependency default, and builds on the existing schema rather than replacing it.

The five capabilities form a dependency chain. They are presented in implementation order.


Current State Analysis

What Engram does well

  • Persistence: Observations survive across sessions and context compactions
  • Topic-key upsert: Evolving knowledge updates in place instead of duplicating
  • Deduplication: Content-hash + time-window prevents redundant writes
  • FTS5 search: Fast keyword-based retrieval with BM25 ranking
  • Privacy: <private> tag stripping at the store layer
  • Zero dependencies: Single Go binary, pure SQLite, no CGO, no runtime

Where the gaps are

| Gap | Current behavior | Impact |
| --- | --- | --- |
| Lexical-only search | FTS5 matches exact words. "state management" won't find "Zustand store choice" | Relevant memories are invisible unless the agent guesses the exact phrasing used when saving |
| Isolated observations | Each observation is an independent row with no relationships to others | Cannot answer "what decisions led to this architecture?" or "what superseded this choice?" |
| Flat relevance | All observations have equal retrieval priority | When the context budget is limited, there is no mechanism to surface the most useful memories first |
| Linear accumulation | Observations pile up over time. Topic-key upsert helps single-topic evolution, but cross-topic overlap goes undetected | Memory bloat degrades search quality and consumes more tokens on retrieval |
| Chronological context | mem_context returns recent sessions/observations by time | Context injection is time-based, not relevance-based: starting a session about auth loads the last 5 sessions regardless of topic |

💡 Proposed Solution

Proposed Capabilities

Capability 1: Hybrid Search (Vector + FTS5)

Note: Issue #139 by @jtomaszon proposes a comprehensive implementation of this capability with pluggable providers, an observation_embeddings table, and Reciprocal Rank Fusion (RRF). This proposal endorses that approach and frames it within the broader enhancement roadmap. The implementation details below are compatible with #139's design.

What it solves

FTS5 is lexical — it matches words, not meaning. When an agent saves "we chose Zustand for client-side state management" and later searches "how do we handle frontend reactivity", FTS5 returns nothing. The concepts are semantically linked but share zero keywords.

How it works

Each observation gets an embedding vector generated at save time. Search runs BOTH FTS5 (keyword match) and vector similarity (semantic match) in parallel, then merges results using Reciprocal Rank Fusion (RRF):

score(doc) = 1/(k + rank_fts5) + 1/(k + rank_vector)

Where k is a tuning constant (typically 60) that controls how much weight goes to top-ranked results.
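As a sketch of the fusion step (in Go, matching Engram's implementation language; the function name and the use of raw observation IDs are illustrative, not Engram's actual API):

```go
package main

import "fmt"

// rrfMerge fuses two ranked result lists with Reciprocal Rank Fusion.
// Each list is ordered best-first; ranks are 1-based per the formula above.
func rrfMerge(ftsRanked, vecRanked []int64, k float64) map[int64]float64 {
	scores := make(map[int64]float64)
	for rank, id := range ftsRanked {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range vecRanked {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	return scores
}

func main() {
	fts := []int64{7, 3, 9}  // observation IDs ranked by BM25
	vec := []int64{3, 12, 7} // observation IDs ranked by cosine similarity
	scores := rrfMerge(fts, vec, 60)
	// Observation 3 ranks 2nd lexically and 1st semantically,
	// so it scores 1/62 + 1/61 and wins overall.
	fmt.Printf("%.6f\n", scores[3]) // prints 0.032522
}
```

A document appearing in only one list still gets a score from that list alone, which is what lets a pure-keyword or pure-semantic hit survive the merge.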

Schema addition

CREATE TABLE observation_embeddings (
    observation_id INTEGER PRIMARY KEY REFERENCES observations(id),
    provider       TEXT NOT NULL,           -- 'ollama', 'openai', 'local'
    model          TEXT NOT NULL,           -- 'nomic-embed-text', 'text-embedding-3-small'
    dimensions     INTEGER NOT NULL,
    embedding      BLOB NOT NULL,           -- float32 array
    created_at     TEXT NOT NULL DEFAULT (datetime('now'))
);
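One plausible encoding for the embedding BLOB is little-endian float32 values, with the count matching the dimensions column. A sketch only; Engram's (or #139's) actual wire format may differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs a float32 vector into a BLOB-ready byte slice
// (little-endian, 4 bytes per component).
func encodeEmbedding(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, v := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(v))
	}
	return buf
}

// decodeEmbedding is the inverse, for reading rows back at search time.
func decodeEmbedding(blob []byte) []float32 {
	vec := make([]float32, len(blob)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(blob[4*i:]))
	}
	return vec
}

func main() {
	v := []float32{0.1, -0.5, 2.0}
	fmt.Println(decodeEmbedding(encodeEmbedding(v))) // prints [0.1 -0.5 2]
}
```

Storing raw float32 keeps the column compatible with a future sqlite-vss index, which expects the same layout.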

Design principles (aligned with Engram's philosophy)

  • Opt-in: Embeddings are generated only if an embedding provider is configured. Without configuration, Engram behaves exactly as today — pure FTS5
  • Provider-agnostic: Pluggable interface supporting local models (Ollama), cloud APIs (OpenAI), or future sqlite-vss integration
  • Local-first preferred: Default recommendation is Ollama with nomic-embed-text — runs locally, no API keys, no data leaves the machine
  • Async generation: Embedding computation happens asynchronously after save confirmation. The agent is never blocked waiting for embeddings
  • Backfill CLI: engram backfill-embeddings processes all existing observations that lack embeddings. Fully retroactive
  • Graceful degradation: If the embedding provider is unavailable (Ollama not running, API key expired), save succeeds normally — the observation just won't have an embedding until next backfill

Impact

Engram goes from "find what I said" to "find what I meant." This is the foundational capability that enables Capabilities 3 and 5.


Capability 2: Salience Scoring

What it solves

All observations have equal retrieval weight. When an agent's context budget allows loading 10 memories, there's no way to determine which 10 are most valuable. Time-based ordering is a weak proxy — a critical architecture decision from 3 months ago is more important than yesterday's typo fix.

How it works

Each observation gets a salience score that reflects its real-world importance based on actual usage patterns:

  • Access boost: Every mem_get_observation call or search-result retrieval increments salience by a configurable delta (default: +0.1)
  • Time decay: A background process (or on-read lazy evaluation) applies exponential decay: salience = salience * decay_factor ^ days_since_last_access
  • Type weighting: Observations of type architecture or decision start with higher base salience than bugfix or discovery, reflecting their typical long-term value
  • Revision signal: Higher revision_count (already tracked) correlates with actively maintained knowledge — factor it into salience

Schema addition

ALTER TABLE observations ADD COLUMN salience REAL NOT NULL DEFAULT 1.0;
ALTER TABLE observations ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE observations ADD COLUMN last_accessed_at TEXT;

CREATE INDEX idx_obs_salience ON observations(salience DESC);

Integration with search

The merged search score becomes:

final_score(doc) = retrieval_score(doc) * salience_weight(doc)

Where retrieval_score comes from FTS5 (or RRF if embeddings are enabled), and salience_weight is a normalized salience value.

Design principles

  • Zero configuration required: Works out of the box with sensible defaults
  • Observable: mem_stats exposes salience distribution (min, max, median, P90) so users can tune decay parameters
  • No data loss: Salience affects ranking, never deletion. Low-salience observations are deprioritized, not removed
  • Retroactive: Existing observations start at 1.0 and begin differentiating immediately through natural usage

Impact

Engram learns which memories matter based on behavior, not declarations. The most accessed and actively maintained knowledge surfaces first.


Capability 3: Automatic Consolidation

Depends on: Capability 1 (Hybrid Search) for semantic similarity detection

What it solves

Over months of use, an agent accumulates hundreds of observations with significant overlap. A project might have 15 observations about authentication — the initial choice, three iterations, two bug fixes, a migration decision. The relevant knowledge is scattered across all of them. Loading all 15 wastes tokens; loading any one is incomplete.

How it works

A consolidation process (triggered manually via CLI or automatically at configurable intervals) identifies clusters of semantically similar observations and proposes merges:

  1. Cluster detection: Group observations by semantic similarity (cosine similarity > configurable threshold, e.g., 0.82) within the same project and scope
  2. Conflict detection: Within each cluster, identify contradictions (e.g., "we chose PostgreSQL" vs. "we migrated to SQLite"). Contradictions are flagged, not auto-resolved
  3. Merge proposal: For non-conflicting clusters, generate a consolidated observation that preserves all unique information from the cluster members
  4. Lineage tracking: Merged observations store references to their source observation IDs. The originals are soft-deleted (deleted_at set) but remain queryable for audit
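Step 1 can be as simple as a greedy single-pass grouping over embeddings. A sketch under that assumption (the threshold, vectors, and function names are illustrative; a production clusterer might use average-link instead of first-seed comparison):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two embedding vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// cluster assigns each embedding to the first cluster whose seed it
// resembles above the threshold, otherwise starts a new cluster.
func cluster(embeddings [][]float32, threshold float64) [][]int {
	var clusters [][]int
	var seeds [][]float32
	for i, e := range embeddings {
		placed := false
		for c, seed := range seeds {
			if cosine(e, seed) > threshold {
				clusters[c] = append(clusters[c], i)
				placed = true
				break
			}
		}
		if !placed {
			seeds = append(seeds, e)
			clusters = append(clusters, []int{i})
		}
	}
	return clusters
}

func main() {
	embs := [][]float32{
		{1, 0, 0},       // auth decision
		{0.95, 0.05, 0}, // auth iteration (near-duplicate)
		{0, 1, 0},       // unrelated CSS note
	}
	fmt.Println(cluster(embs, 0.85)) // prints [[0 1] [2]]
}
```

Clusters of size 1 are left alone; only multi-member clusters proceed to conflict detection and merge proposals.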

Schema addition

CREATE TABLE consolidation_lineage (
    consolidated_id INTEGER NOT NULL REFERENCES observations(id),
    source_id       INTEGER NOT NULL REFERENCES observations(id),
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (consolidated_id, source_id)
);

Merge strategies

| Scenario | Strategy |
| --- | --- |
| Same topic, no contradiction | Merge into a single observation with combined content |
| Same topic, temporal evolution | Keep the latest, reference history in the lineage table |
| Same topic, contradiction | Flag for human review; do not auto-merge |
| Cross-topic overlap | Do not merge; the observations serve different retrieval paths |

Design principles

  • Human-in-the-loop by default: First implementation should be engram consolidate --dry-run that shows proposed merges. Auto-merge is a future opt-in
  • Reversible: Source observations are soft-deleted, not destroyed. engram consolidate --undo <consolidation_id> restores them
  • Conservative threshold: Default similarity threshold should be high (0.85+) to avoid false merges. Better to under-merge than over-merge
  • Respects topic_key boundaries: Observations with different topic_keys are never merged, even if semantically similar — the agent assigned different keys for a reason

Impact

Engram stays lean over time. Instead of 500 observations with 60% overlap, you get 200 dense, comprehensive observations. More knowledge in fewer tokens.


Capability 4: Temporal Knowledge Graph

What it solves

Observations exist in isolation. There's no way to express or query relationships like:

  • "This SQLite decision superseded the PostgreSQL decision"
  • "This bug was caused by that architecture choice"
  • "These three observations relate to the auth module"

Without relationships, the agent can't navigate knowledge — it can only search for it.

How it works

A lightweight relationship layer on top of existing observations. Relationships are typed, directed, and temporally scoped:

[Observation A] --relationship_type--> [Observation B]
                  valid_from: 2026-03-01
                  valid_to: null (still active)

Schema addition

CREATE TABLE observation_links (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    from_id         INTEGER NOT NULL REFERENCES observations(id),
    to_id           INTEGER NOT NULL REFERENCES observations(id),
    relationship    TEXT NOT NULL,  -- 'supersedes', 'caused_by', 'relates_to', 'implements', 'reverts'
    valid_from      TEXT NOT NULL DEFAULT (datetime('now')),
    valid_to        TEXT,           -- NULL = still active
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(from_id, to_id, relationship)
);

CREATE INDEX idx_links_from ON observation_links(from_id, relationship);
CREATE INDEX idx_links_to ON observation_links(to_id, relationship);
CREATE INDEX idx_links_validity ON observation_links(valid_from, valid_to);

Relationship types

| Type | Semantics | Example |
| --- | --- | --- |
| supersedes | A replaces B as the current truth | "Migrate to SQLite" supersedes "Choose PostgreSQL" |
| caused_by | A was a consequence of B | "N+1 query bug" caused_by "ORM lazy loading decision" |
| relates_to | A and B cover related topics | "Auth middleware" relates_to "JWT library choice" |
| implements | A is the execution of B | "Add rate limiting code" implements "Rate limiting decision" |
| reverts | A undoes B | "Rollback feature flag" reverts "Enable feature flag" |

New MCP tool

mem_link(from_id, to_id, relationship, valid_from?, valid_to?)
mem_traverse(observation_id, relationship?, direction?, depth?)

mem_traverse returns connected observations up to N hops away, optionally filtered by relationship type. This enables queries like "show me everything that led to this decision" or "what did this architecture choice cause?"
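The traversal needs no graph engine. A sketch of the query behind a hypothetical mem_traverse, assuming the observation_links schema above (parameter names are placeholders):

```sql
-- Outgoing traversal from :start_id, up to :max_depth hops,
-- following only links that are still active (valid_to IS NULL).
WITH RECURSIVE reachable(id, depth) AS (
    SELECT :start_id, 0
    UNION
    SELECT l.to_id, r.depth + 1
    FROM observation_links l
    JOIN reachable r ON l.from_id = r.id
    WHERE r.depth < :max_depth
      AND l.valid_to IS NULL
)
SELECT o.*, r.depth
FROM reachable r
JOIN observations o ON o.id = r.id
WHERE r.depth > 0
ORDER BY r.depth;
```

An optional `AND l.relationship = :relationship` filter and a symmetric variant joining on `l.to_id` would cover the relationship and direction parameters.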

Design principles

  • Agent-driven linking: The agent creates links explicitly during mem_save or as a separate operation. No automatic entity extraction in v1 — that's complex, error-prone, and adds NLP dependencies
  • Temporal validity: Links have valid_from / valid_to so the graph reflects what's CURRENT, not just what was ever true
  • Lightweight traversal: Depth-limited BFS on a small graph (hundreds to low thousands of nodes). No need for a graph database — SQLite handles this with recursive CTEs
  • Existing Obsidian export synergy: The Obsidian Brain export already generates a graph visualization. Links would make that graph actually meaningful instead of just proximity-based

Impact

Engram becomes navigable. Instead of searching in the dark, the agent can follow relationship chains to build a complete picture of how knowledge connects. "What decisions led to the current auth system?" becomes a traversal query, not a prayer to FTS5.


Capability 5: Contextual Priming

Depends on: Capability 1 (Hybrid Search), enhanced by Capability 2 (Salience Scoring)

What it solves

mem_context returns recent sessions and observations chronologically. If an agent starts a session about database optimization, it gets the last 5 sessions — which might be about CSS, auth, and deployment. The context injection is time-based, not relevance-based.

How it works

A new tool mem_prime (or an enhanced mem_context mode) takes the current task context as input and returns the most relevant observations regardless of recency:

  1. Input: The agent passes its current context (user's first message, current file, active task description)
  2. Embedding: The input is embedded using the same provider as observations
  3. Retrieval: Vector similarity search against all observation embeddings, weighted by salience score
  4. Diversification: Results are diversified to avoid returning 5 observations about the same sub-topic. MMR (Maximal Marginal Relevance) ensures breadth
  5. Output: Top-N most relevant observations, ordered by combined relevance + salience score
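The diversification step (4) can be sketched as greedy MMR selection. The Candidate struct, lambda value, and scores below are illustrative assumptions, not Engram's actual types:

```go
package main

import (
	"fmt"
	"math"
)

type Candidate struct {
	ID        int64
	Relevance float64   // combined relevance * salience from retrieval
	Vec       []float32 // embedding, for pairwise similarity
}

func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// mmrSelect greedily picks n results, trading relevance against
// similarity to what is already selected (lambda near 1 favors relevance).
func mmrSelect(cands []Candidate, n int, lambda float64) []int64 {
	var out []int64
	var chosen []Candidate
	used := make([]bool, len(cands))
	for len(out) < n && len(out) < len(cands) {
		best, bestScore := -1, math.Inf(-1)
		for i, c := range cands {
			if used[i] {
				continue
			}
			maxSim := 0.0
			for _, p := range chosen {
				if s := cosine(c.Vec, p.Vec); s > maxSim {
					maxSim = s
				}
			}
			if score := lambda*c.Relevance - (1-lambda)*maxSim; score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		chosen = append(chosen, cands[best])
		out = append(out, cands[best].ID)
	}
	return out
}

func main() {
	cands := []Candidate{
		{1, 0.90, []float32{1, 0}},      // auth: top hit
		{2, 0.89, []float32{0.99, 0.1}}, // auth: near-duplicate of 1
		{3, 0.70, []float32{0, 1}},      // deployment: different topic
	}
	fmt.Println(mmrSelect(cands, 2, 0.7)) // prints [1 3]
}
```

With lambda = 0.7, the near-duplicate is penalized enough that the lower-relevance but novel observation wins the second slot, which is exactly the breadth the step above asks for.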

New MCP tool

mem_prime(context, project?, scope?, limit?)
  • context: Free-text description of what the agent is about to work on
  • Returns: Ranked list of observations most relevant to that context

How it differs from mem_search

| Aspect | mem_search | mem_prime |
| --- | --- | --- |
| Input | A search query (keywords) | A task context (paragraph) |
| Intent | "Find this specific thing" | "What should I know before starting?" |
| Ranking | BM25 / RRF relevance | Relevance × salience × diversity |
| Diversity | None; may return 5 hits about the same thing | MMR ensures topic coverage |
| Typical use | Mid-task lookup | Session start / task start |

Design principles

  • Complementary, not replacing: mem_context (chronological) and mem_search (keyword) remain unchanged. mem_prime is a new retrieval mode for a different use case
  • Requires embeddings: Only available when an embedding provider is configured. Without embeddings, agents continue using mem_context and mem_search as today
  • Agent protocol integration: The CLAUDE.md protocol currently instructs agents to call mem_context at session start. With mem_prime, the protocol would add: "After mem_context, call mem_prime with the user's first message to surface relevant prior knowledge"
  • Token-budget aware: Accepts a limit parameter so the agent controls how much context budget to allocate to priming

Impact

Engram becomes proactive. Instead of waiting for the agent to ask the right question with the right keywords, it surfaces relevant knowledge before it's needed. Like a senior colleague who says "before you start on that, you should know we already tried X and it didn't work because Y."


Dependency Graph

                    ┌───────────────────┐
                    │ 1. Hybrid Search  │  ← foundational, enables 3 and 5
                    │ (Vector + FTS5)   │
                    └─────────┬─────────┘
                              │
            ┌─────────────────┼─────────────────┐
            │                 │                 │
            ▼                 ▼                 ▼
┌────────────────────┐ ┌──────────────┐ ┌────────────────┐
│ 3. Consolidation   │ │ 2. Salience  │ │ 4. Knowledge   │
│ (needs vectors     │ │ (independent,│ │    Graph       │
│  for clustering)   │ │  no deps)    │ │ (independent,  │
└────────────────────┘ └──────┬───────┘ │  no deps)      │
                              │         └────────────────┘
                              ▼
                    ┌───────────────────┐
                    │ 5. Contextual     │  ← needs vectors (1),
                    │    Priming        │    benefits from salience (2)
                    │ (needs 1, uses 2) │
                    └───────────────────┘

Recommended implementation order:

| Phase | Capability | Rationale |
| --- | --- | --- |
| Phase 1 | Hybrid Search | Unlocks semantic understanding. Enables Phases 4 and 5. Aligns with existing Issue #139 |
| Phase 2 | Salience Scoring | Independent, low complexity, immediate value. Three new columns + one index |
| Phase 3 | Knowledge Graph | Independent, moderate complexity. New table + 2 MCP tools. Enriches the Obsidian export |
| Phase 4 | Consolidation | Requires Phase 1. Moderate complexity. New table + CLI command |
| Phase 5 | Contextual Priming | Requires Phase 1, benefits from Phase 2. New MCP tool |

Phases 2 and 3 can be implemented in parallel with Phase 1, as they have no dependencies on it.


Retroactive Compatibility

All five capabilities work with existing observations:

| Capability | Retroactive path |
| --- | --- |
| Hybrid Search | engram backfill-embeddings generates vectors for all existing observations |
| Salience | All existing observations start at salience = 1.0 and differentiate through natural usage |
| Consolidation | First run scans all existing observations for merge candidates |
| Knowledge Graph | Starts empty; agents create links going forward. Optional: engram infer-links could propose links based on topic_key overlap and temporal proximity |
| Contextual Priming | Works immediately once embeddings exist (after backfill) |

No migration breaks existing behavior. Every capability degrades gracefully to current behavior when not configured.


Alignment with Engram's Design Philosophy

| Engram principle | How this proposal respects it |
| --- | --- |
| Local-first SQLite | All data stays in SQLite. Embeddings are BLOB columns, not an external vector DB. The graph is a SQLite table with recursive CTEs, not Neo4j |
| Zero dependencies | Every capability is opt-in. Without configuration, Engram behaves identically to today. No new required dependencies |
| Single binary | The embedding provider interface is pluggable. Local Ollama is the recommended default; no cloud APIs required |
| Agent decides what matters | Links are agent-created, not auto-extracted. Consolidation proposes, a human approves. Salience is derived from agent behavior |
| FTS5 covers 95% of cases | FTS5 remains the primary search path. Vector search augments it, never replaces it |
| Privacy at two layers | Embeddings inherit the same <private> stripping. Consolidated observations go through the same sanitization pipeline |

Related Issues


Summary

These five capabilities transform Engram along five dimensions:

| Dimension | Before | After |
| --- | --- | --- |
| Search | Finds what you said | Finds what you meant |
| Relevance | All memories equal | Most useful memories surface first |
| Structure | Isolated notes | Connected knowledge graph |
| Scale | Grows linearly forever | Self-consolidates over time |
| Context | Loads recent history | Loads relevant knowledge |

The result is a memory system that behaves less like a search engine and more like an experienced colleague's memory — associative, prioritized, connected, and proactive.

Each capability is independently valuable, opt-in, and backwards-compatible. They can be implemented incrementally without architectural disruption.

📦 Affected Area

CLI (commands, flags)

🔄 Alternatives Considered

No response

📎 Additional Context

No response
