📋 Pre-flight Checks
🔍 Problem Description
Executive Summary
Engram solves the hardest problem in AI-assisted development: persistent context across sessions. Its local-first SQLite architecture, FTS5 search, and zero-dependency binary make it the most practical memory system available for coding agents today.
This proposal identifies five capabilities that would elevate Engram from a persistent note system to an associative, temporal, self-organizing knowledge layer—without compromising its core design principles. Each capability is opt-in, preserves the zero-dependency default, and builds on the existing schema rather than replacing it.
The five capabilities form a dependency chain. They are presented in implementation order.
Current State Analysis
What Engram does well
- Persistence: Observations survive across sessions and context compactions
- Topic-key upsert: Evolving knowledge updates in place instead of duplicating
- Deduplication: Content-hash + time-window prevents redundant writes
- FTS5 search: Fast keyword-based retrieval with BM25 ranking
- Privacy: `<private>` tag stripping at the store layer
- Zero dependencies: Single Go binary, pure SQLite, no CGO, no runtime
Where the gaps are
| Gap | Current behavior | Impact |
| --- | --- | --- |
| Lexical-only search | FTS5 matches exact words. "state management" won't find "Zustand store choice" | Relevant memories are invisible unless the agent guesses the exact phrasing used when saving |
| Isolated observations | Each observation is an independent row. No relationships between them | Cannot answer "what decisions led to this architecture?" or "what superseded this choice?" |
| Flat relevance | All observations have equal retrieval priority | When the context budget is limited, there is no mechanism to surface the most useful memories first |
| Linear accumulation | Observations pile up over time. Topic-key upsert helps for single-topic evolution, but cross-topic overlap goes undetected | Memory bloat degrades search quality and consumes more tokens on retrieval |
| Chronological context | `mem_context` returns recent sessions/observations by time | Context injection is time-based, not relevance-based. Starting a session about auth loads the last 5 sessions regardless of topic |
💡 Proposed Solution
Proposed Capabilities
Capability 1: Hybrid Search (Vector + FTS5)
Note: Issue #139 by @jtomaszon proposes a comprehensive implementation of this capability with pluggable providers, an `observation_embeddings` table, and Reciprocal Rank Fusion (RRF). This proposal endorses that approach and frames it within the broader enhancement roadmap. The implementation details below are compatible with #139's design.
What it solves
FTS5 is lexical — it matches words, not meaning. When an agent saves "we chose Zustand for client-side state management" and later searches "how do we handle frontend reactivity", FTS5 returns nothing. The concepts are semantically linked but share zero keywords.
How it works
Each observation gets an embedding vector generated at save time. Search runs both FTS5 (keyword match) and vector similarity (semantic match) in parallel, then merges the results using Reciprocal Rank Fusion (RRF):

```
score(doc) = 1/(k + rank_fts5) + 1/(k + rank_vector)
```

where `k` is a tuning constant (typically 60) that controls how much weight goes to top-ranked results.
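As a sketch, the fusion step fits in a few lines of Go. This is illustrative only — the function name and the slice-of-IDs representation are invented for the example, not Engram's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfMerge fuses two ranked result lists (observation IDs, best first)
// into a single ranking using Reciprocal Rank Fusion. k dampens the
// influence of top ranks; 60 is the conventional default.
func rrfMerge(fts5, vector []int64, k float64) []int64 {
	scores := map[int64]float64{}
	for rank, id := range fts5 {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range vector {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	ids := make([]int64, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; break ties by ID for determinism.
	sort.Slice(ids, func(i, j int) bool {
		si, sj := scores[ids[i]], scores[ids[j]]
		if si == sj {
			return ids[i] < ids[j]
		}
		return si > sj
	})
	return ids
}

func main() {
	fts5 := []int64{7, 3, 9}   // keyword ranking
	vector := []int64{3, 5, 7} // semantic ranking
	fmt.Println(rrfMerge(fts5, vector, 60)) // → [3 7 5 9]
}
```

Documents that appear in both lists accumulate two reciprocal-rank terms, so agreement between the lexical and semantic rankers naturally floats to the top.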
Schema addition
```sql
CREATE TABLE observation_embeddings (
    observation_id INTEGER PRIMARY KEY REFERENCES observations(id),
    provider       TEXT NOT NULL,    -- 'ollama', 'openai', 'local'
    model          TEXT NOT NULL,    -- 'nomic-embed-text', 'text-embedding-3-small'
    dimensions     INTEGER NOT NULL,
    embedding      BLOB NOT NULL,    -- float32 array
    created_at     TEXT NOT NULL DEFAULT (datetime('now'))
);
```
Design principles (aligned with Engram's philosophy)
- Opt-in: Embeddings are generated only if an embedding provider is configured. Without configuration, Engram behaves exactly as today — pure FTS5
- Provider-agnostic: Pluggable interface supporting local models (Ollama), cloud APIs (OpenAI), or future sqlite-vss integration
- Local-first preferred: Default recommendation is Ollama with `nomic-embed-text` — runs locally, no API keys, no data leaves the machine
- Async generation: Embedding computation happens asynchronously after save confirmation. The agent is never blocked waiting for embeddings
- Backfill CLI: `engram backfill-embeddings` processes all existing observations that lack embeddings. Fully retroactive
- Graceful degradation: If the embedding provider is unavailable (Ollama not running, API key expired), save succeeds normally — the observation just won't have an embedding until next backfill
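The pluggable provider boundary could be as small as a single Go interface. This is a hypothetical sketch — the interface, method, and type names are invented here, not taken from Engram or #139:

```go
package main

import (
	"context"
	"fmt"
)

// Embedder is one possible shape for the provider interface.
// Implementations would wrap Ollama, OpenAI, etc.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	Model() string
	Dimensions() int
}

// noopEmbedder stands in for a real provider so the sketch runs.
type noopEmbedder struct{}

func (noopEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	// A real provider would call the model; here we fold the runes
	// into a fixed-size vector just to return something shaped right.
	v := make([]float32, 4)
	for i, r := range text {
		v[i%4] += float32(r)
	}
	return v, nil
}
func (noopEmbedder) Model() string   { return "noop" }
func (noopEmbedder) Dimensions() int { return 4 }

func main() {
	var e Embedder = noopEmbedder{}
	vec, _ := e.Embed(context.Background(), "hello")
	fmt.Println(len(vec) == e.Dimensions()) // → true
}
```

Keeping the interface this narrow is what lets the zero-dependency default survive: when no provider is configured, the save path simply skips the `Embed` call.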
Impact
Engram goes from "find what I said" to "find what I meant." This is the foundational capability that enables Capabilities 3 and 5.
Capability 2: Salience Scoring
What it solves
All observations have equal retrieval weight. When an agent's context budget allows loading 10 memories, there's no way to determine which 10 are most valuable. Time-based ordering is a weak proxy — a critical architecture decision from 3 months ago is more important than yesterday's typo fix.
How it works
Each observation gets a `salience` score that reflects its real-world importance based on actual usage patterns:

- Access boost: Every `mem_get_observation` call or search-result click increments salience by a configurable delta (default: +0.1)
- Time decay: A background process (or on-read lazy evaluation) applies exponential decay: `salience = salience * decay_factor ^ days_since_last_access`
- Type weighting: Observations of type `architecture` or `decision` start with a higher base salience than `bugfix` or `discovery`, reflecting their typical long-term value
- Revision signal: A higher `revision_count` (already tracked) correlates with actively maintained knowledge — factor it into salience
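Combining the access boost and exponential decay, a lazy on-read update might look like the following Go sketch. The constants and function names are illustrative assumptions, not Engram's defaults:

```go
package main

import (
	"fmt"
	"math"
)

// Illustrative defaults; Engram would make these configurable.
const (
	accessBoost = 0.1   // added on each read or search hit
	decayFactor = 0.995 // per-day multiplicative decay
)

// decayed applies exponential time decay lazily at read time,
// avoiding a background job that touches every row.
func decayed(salience, daysSinceAccess float64) float64 {
	return salience * math.Pow(decayFactor, daysSinceAccess)
}

// onAccess refreshes salience: decay for the idle period, then boost.
func onAccess(salience, daysSinceAccess float64) float64 {
	return decayed(salience, daysSinceAccess) + accessBoost
}

func main() {
	s := 1.0
	s = onAccess(s, 30) // untouched for a month, then read once
	fmt.Printf("%.3f\n", s)
}
```

With these numbers a month of idleness roughly cancels one access boost, which is the kind of balance the `mem_stats` distribution would help users tune.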
Schema addition
```sql
ALTER TABLE observations ADD COLUMN salience REAL NOT NULL DEFAULT 1.0;
ALTER TABLE observations ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE observations ADD COLUMN last_accessed_at TEXT;

CREATE INDEX idx_obs_salience ON observations(salience DESC);
```
Integration with search
The merged search score becomes:
```
final_score(doc) = retrieval_score(doc) * salience_weight(doc)
```

where `retrieval_score` comes from FTS5 (or RRF if embeddings are enabled) and `salience_weight` is a normalized salience value.
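One possible normalization — an assumption of this sketch, not something the proposal specifies — squashes raw salience into (0, 1] before multiplying, so heavily accessed memories can never fully drown out retrieval relevance:

```go
package main

import "fmt"

// salienceWeight maps raw salience (unbounded above) into (0, 1).
// The x/(1+x) form is one simple monotone choice; any bounded,
// increasing function would do.
func salienceWeight(salience float64) float64 {
	return salience / (1.0 + salience)
}

func finalScore(retrieval, salience float64) float64 {
	return retrieval * salienceWeight(salience)
}

func main() {
	// Same retrieval score, different salience: the more-used memory wins.
	fmt.Println(finalScore(0.8, 3.0) > finalScore(0.8, 1.0)) // → true
}
```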
Design principles
- Zero configuration required: Works out of the box with sensible defaults
- Observable: `mem_stats` exposes salience distribution (min, max, median, P90) so users can tune decay parameters
- No data loss: Salience affects ranking, never deletion. Low-salience observations are deprioritized, not removed
- Retroactive: Existing observations start at 1.0 and begin differentiating immediately through natural usage
Impact
Engram learns which memories matter based on behavior, not declarations. The most accessed and actively maintained knowledge surfaces first.
Capability 3: Automatic Consolidation
Depends on: Capability 1 (Hybrid Search) for semantic similarity detection
What it solves
Over months of use, an agent accumulates hundreds of observations with significant overlap. A project might have 15 observations about authentication — the initial choice, three iterations, two bug fixes, a migration decision. The relevant knowledge is scattered across all of them. Loading all 15 wastes tokens; loading any one is incomplete.
How it works
A consolidation process (triggered manually via CLI or automatically at configurable intervals) identifies clusters of semantically similar observations and proposes merges:
- Cluster detection: Group observations by semantic similarity (cosine similarity > configurable threshold, e.g., 0.82) within the same project and scope
- Conflict detection: Within each cluster, identify contradictions (e.g., "we chose PostgreSQL" vs. "we migrated to SQLite"). Contradictions are flagged, not auto-resolved
- Merge proposal: For non-conflicting clusters, generate a consolidated observation that preserves all unique information from the cluster members
- Lineage tracking: Merged observations store references to their source observation IDs. The originals are soft-deleted (`deleted_at` set) but remain queryable for audit
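The cluster-detection step can be sketched as a deliberately naive greedy grouping in Go. Names are invented, and a real implementation would also respect project, scope, and `topic_key` boundaries before comparing:

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes cosine similarity between two embedding vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// clusters greedily assigns each observation to the first cluster whose
// seed is at least `threshold` similar. Fine for Engram's scale
// (hundreds of rows); real code might prefer union-find or average-link.
func clusters(embs [][]float32, threshold float64) [][]int {
	var out [][]int
	for i, e := range embs {
		placed := false
		for ci, c := range out {
			if cosine(embs[c[0]], e) >= threshold {
				out[ci] = append(out[ci], i)
				placed = true
				break
			}
		}
		if !placed {
			out = append(out, []int{i})
		}
	}
	return out
}

func main() {
	embs := [][]float32{{1, 0}, {0.99, 0.1}, {0, 1}}
	fmt.Println(clusters(embs, 0.85)) // → [[0 1] [2]]
}
```

Each resulting cluster would then go through conflict detection before any merge is proposed.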
Schema addition
```sql
CREATE TABLE consolidation_lineage (
    consolidated_id INTEGER NOT NULL REFERENCES observations(id),
    source_id       INTEGER NOT NULL REFERENCES observations(id),
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (consolidated_id, source_id)
);
```
Merge strategies
| Scenario | Strategy |
| --- | --- |
| Same topic, no contradiction | Merge into a single observation with combined content |
| Same topic, temporal evolution | Keep the latest, reference history in the lineage table |
| Same topic, contradiction | Flag for human review — do not auto-merge |
| Cross-topic overlap | Do not merge — observations serve different retrieval paths |
Design principles
- Human-in-the-loop by default: The first implementation should be `engram consolidate --dry-run`, which shows proposed merges. Auto-merge is a future opt-in
- Reversible: Source observations are soft-deleted, not destroyed. `engram consolidate --undo <consolidation_id>` restores them
- Conservative threshold: The default similarity threshold should be high (0.85+) to avoid false merges. Better to under-merge than over-merge
- Respects `topic_key` boundaries: Observations with different `topic_key`s are never merged, even if semantically similar — the agent assigned different keys for a reason
Impact
Engram stays lean over time. Instead of 500 observations with 60% overlap, you get 200 dense, comprehensive observations. More knowledge in fewer tokens.
Capability 4: Temporal Knowledge Graph
What it solves
Observations exist in isolation. There's no way to express or query relationships like:
- "This SQLite decision superseded the PostgreSQL decision"
- "This bug was caused by that architecture choice"
- "These three observations relate to the auth module"
Without relationships, the agent can't navigate knowledge — it can only search for it.
How it works
A lightweight relationship layer on top of existing observations. Relationships are typed, directed, and temporally scoped:
```
[Observation A] --relationship_type--> [Observation B]
                 valid_from: 2026-03-01
                 valid_to:   null (still active)
```
Schema addition
```sql
CREATE TABLE observation_links (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    from_id      INTEGER NOT NULL REFERENCES observations(id),
    to_id        INTEGER NOT NULL REFERENCES observations(id),
    relationship TEXT NOT NULL,   -- 'supersedes', 'caused_by', 'relates_to', 'implements', 'reverts'
    valid_from   TEXT NOT NULL DEFAULT (datetime('now')),
    valid_to     TEXT,            -- NULL = still active
    created_at   TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(from_id, to_id, relationship)
);

CREATE INDEX idx_links_from     ON observation_links(from_id, relationship);
CREATE INDEX idx_links_to       ON observation_links(to_id, relationship);
CREATE INDEX idx_links_validity ON observation_links(valid_from, valid_to);
```
Relationship types
| Type | Semantics | Example |
| --- | --- | --- |
| `supersedes` | A replaces B as the current truth | "Migrate to SQLite" supersedes "Choose PostgreSQL" |
| `caused_by` | A was a consequence of B | "N+1 query bug" caused_by "ORM lazy loading decision" |
| `relates_to` | A and B cover related topics | "Auth middleware" relates_to "JWT library choice" |
| `implements` | A is the execution of B | "Add rate limiting code" implements "Rate limiting decision" |
| `reverts` | A undoes B | "Rollback feature flag" reverts "Enable feature flag" |
New MCP tool
mem_link(from_id, to_id, relationship, valid_from?, valid_to?)
mem_traverse(observation_id, relationship?, direction?, depth?)
mem_traverse returns connected observations up to N hops away, optionally filtered by relationship type. This enables queries like "show me everything that led to this decision" or "what did this architecture choice cause?"
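In memory, the traversal reduces to a depth-limited BFS; a recursive CTE in SQLite would express the same walk server-side. A sketch with invented names, ignoring relationship filters and temporal validity for brevity:

```go
package main

import "fmt"

// link is one row of observation_links, simplified to active links only.
type link struct{ from, to int64 }

// traverse does a depth-limited BFS from start in the outgoing
// direction and returns reachable observation IDs (excluding start).
func traverse(links []link, start int64, maxDepth int) []int64 {
	adj := map[int64][]int64{}
	for _, l := range links {
		adj[l.from] = append(adj[l.from], l.to)
	}
	seen := map[int64]bool{start: true}
	frontier := []int64{start}
	var out []int64
	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []int64
		for _, id := range frontier {
			for _, to := range adj[id] {
				if !seen[to] {
					seen[to] = true
					out = append(out, to)
					next = append(next, to)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	links := []link{{1, 2}, {2, 3}, {3, 4}}
	fmt.Println(traverse(links, 1, 2)) // two hops from 1 → [2 3]
}
```

At the scale described (hundreds to low thousands of nodes), either the Go-side walk or the CTE comfortably fits in milliseconds.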
Design principles
- Agent-driven linking: The agent creates links explicitly during `mem_save` or as a separate operation. No automatic entity extraction in v1 — that's complex, error-prone, and adds NLP dependencies
- Temporal validity: Links have `valid_from` / `valid_to` so the graph reflects what's *current*, not just what was ever true
- Lightweight traversal: Depth-limited BFS on a small graph (hundreds to low thousands of nodes). No need for a graph database — SQLite handles this with recursive CTEs
- Existing Obsidian export synergy: The Obsidian Brain export already generates a graph visualization. Links would make that graph actually meaningful instead of just proximity-based
Impact
Engram becomes navigable. Instead of searching in the dark, the agent can follow relationship chains to build a complete picture of how knowledge connects. "What decisions led to the current auth system?" becomes a traversal query, not a prayer to FTS5.
Capability 5: Contextual Priming
Depends on: Capability 1 (Hybrid Search), enhanced by Capability 2 (Salience Scoring)
What it solves
`mem_context` returns recent sessions and observations chronologically. If an agent starts a session about database optimization, it gets the last 5 sessions — which might be about CSS, auth, and deployment. Context injection is time-based, not relevance-based.
How it works
A new tool `mem_prime` (or an enhanced `mem_context` mode) takes the current task context as input and returns the most relevant observations regardless of recency:
- Input: The agent passes its current context (user's first message, current file, active task description)
- Embedding: The input is embedded using the same provider as observations
- Retrieval: Vector similarity search against all observation embeddings, weighted by salience score
- Diversification: Results are diversified to avoid returning 5 observations about the same sub-topic. MMR (Maximal Marginal Relevance) ensures breadth
- Output: Top-N most relevant observations, ordered by combined relevance + salience score
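The MMR diversification step above can be sketched as a greedy loop that trades relevance against similarity to what's already been selected. The lambda weighting and all names below are illustrative:

```go
package main

import "fmt"

// mmr greedily picks k items maximizing
//   lambda*rel[i] - (1-lambda)*max_{j in selected} sim[i][j]
// so each new pick is relevant AND unlike what's already chosen.
func mmr(rel []float64, sim [][]float64, lambda float64, k int) []int {
	var selected []int
	used := make([]bool, len(rel))
	for len(selected) < k && len(selected) < len(rel) {
		best, bestScore := -1, 0.0
		for i := range rel {
			if used[i] {
				continue
			}
			maxSim := 0.0
			for _, j := range selected {
				if sim[i][j] > maxSim {
					maxSim = sim[i][j]
				}
			}
			score := lambda*rel[i] - (1-lambda)*maxSim
			if best == -1 || score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		selected = append(selected, best)
	}
	return selected
}

func main() {
	// Items 0 and 1 are near-duplicates; 2 is distinct but less relevant.
	rel := []float64{0.9, 0.88, 0.6}
	sim := [][]float64{{1, 0.95, 0.1}, {0.95, 1, 0.1}, {0.1, 0.1, 1}}
	fmt.Println(mmr(rel, sim, 0.5, 2)) // → [0 2]: skips the duplicate
}
```

With lambda near 1 this degenerates to plain relevance ranking; lowering it buys topic coverage at the cost of raw relevance.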
New MCP tool
```
mem_prime(context, project?, scope?, limit?)
```

- `context`: free-text description of what the agent is about to work on
- Returns: a ranked list of observations most relevant to that context
How it differs from `mem_search`
| Aspect | `mem_search` | `mem_prime` |
| --- | --- | --- |
| Input | A search query (keywords) | A task context (paragraph) |
| Intent | "Find this specific thing" | "What should I know before starting?" |
| Ranking | BM25 / RRF relevance | Relevance × salience × diversity |
| Diversity | None — may return 5 hits about the same thing | MMR ensures topic coverage |
| Typical use | Mid-task lookup | Session start / task start |
Design principles
- Complementary, not replacing: `mem_context` (chronological) and `mem_search` (keyword) remain unchanged. `mem_prime` is a new retrieval mode for a different use case
- Requires embeddings: Only available when an embedding provider is configured. Without embeddings, agents continue using `mem_context` and `mem_search` as today
- Agent protocol integration: The CLAUDE.md protocol currently instructs agents to call `mem_context` at session start. With `mem_prime`, the protocol would add: "After `mem_context`, call `mem_prime` with the user's first message to surface relevant prior knowledge"
- Token-budget aware: Accepts a `limit` parameter so the agent controls how much context budget to allocate to priming
Impact
Engram becomes proactive. Instead of waiting for the agent to ask the right question with the right keywords, it surfaces relevant knowledge before it's needed. Like a senior colleague who says "before you start on that, you should know we already tried X and it didn't work because Y."
Dependency Graph
```
                    ┌──────────────────┐
                    │ 1. Hybrid Search │ ← foundational, enables 3 and 5
                    │ (Vector + FTS5)  │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼──────────────────┐
         │                   │                  │
         ▼                   ▼                  ▼
┌──────────────────┐ ┌───────────────┐ ┌──────────────────┐
│ 3. Consolidation │ │ 2. Salience   │ │ 4. Knowledge     │
│ (needs vectors   │ │ (independent, │ │    Graph         │
│  for clustering) │ │  no deps)     │ │ (independent,    │
└──────────────────┘ └───────┬───────┘ │  no deps)        │
                             │         └──────────────────┘
                             ▼
                    ┌──────────────────┐
                    │ 5. Contextual    │
                    │    Priming       │ ← needs vectors + benefits from salience
                    │ (needs 1, uses 2)│
                    └──────────────────┘
```
Recommended implementation order:
| Phase | Capability | Rationale |
| --- | --- | --- |
| Phase 1 | Hybrid Search | Unlocks semantic understanding. Enables Capabilities 3 and 5. Aligns with existing Issue #139 |
| Phase 2 | Salience Scoring | Independent, low complexity, immediate value. Three new columns + one index |
| Phase 3 | Knowledge Graph | Independent, moderate complexity. New table + 2 MCP tools. Enriches the Obsidian export |
| Phase 4 | Consolidation | Requires Phase 1. Moderate complexity. New table + CLI command |
| Phase 5 | Contextual Priming | Requires Phase 1, benefits from Phase 2. New MCP tool |
Phases 2 and 3 can be implemented in parallel with Phase 1, as they have no dependencies on it.
Retroactive Compatibility
All five capabilities work with existing observations:
| Capability | Retroactive path |
| --- | --- |
| Hybrid Search | `engram backfill-embeddings` generates vectors for all existing observations |
| Salience | All existing observations start at `salience = 1.0` and differentiate through natural usage |
| Consolidation | The first run scans all existing observations for merge candidates |
| Knowledge Graph | Starts empty — agents create links going forward. Optional: `engram infer-links` could propose links based on `topic_key` overlap and temporal proximity |
| Contextual Priming | Works immediately once embeddings exist (after backfill) |
No migration breaks existing behavior. Every capability degrades gracefully to current behavior when not configured.
Alignment with Engram's Design Philosophy
| Engram principle | How this proposal respects it |
| --- | --- |
| Local-first SQLite | All data stays in SQLite. Embeddings are BLOB columns, not external vector DBs. The graph is a SQLite table with recursive CTEs, not Neo4j |
| Zero dependencies | Every capability is opt-in. Without configuration, Engram behaves identically to today. No new required dependencies |
| Single binary | The embedding provider interface is pluggable. Local Ollama is the recommended default — no cloud APIs required |
| Agent decides what matters | Links are agent-created, not auto-extracted. Consolidation proposes, the human approves. Salience is derived from agent behavior |
| FTS5 covers 95% of cases | FTS5 remains the primary search path. Vector search augments it, never replaces it |
| Privacy at two layers | Embeddings inherit the same `<private>` stripping. Consolidated observations go through the same sanitization pipeline |
Related Issues
Summary
These five capabilities transform Engram along five dimensions:
| Dimension | Before | After |
| --- | --- | --- |
| Search | Finds what you said | Finds what you meant |
| Relevance | All memories equal | Most useful memories surface first |
| Structure | Isolated notes | Connected knowledge graph |
| Scale | Grows linearly forever | Self-consolidates over time |
| Context | Loads recent history | Loads relevant knowledge |
The result is a memory system that behaves less like a search engine and more like an experienced colleague's memory — associative, prioritized, connected, and proactive.
Each capability is independently valuable, opt-in, and backwards-compatible. They can be implemented incrementally without architectural disruption.
📦 Affected Area
CLI (commands, flags)
🔄 Alternatives Considered
No response
📎 Additional Context
No response