
Proposal: Five-Layer Memory Enhancement (Semantic Search, Salience, Graph, Consolidation, Priming) #168

@blkdooGit

Description


📋 Pre-flight Checks

  • I have searched existing issues and this is not a duplicate
  • I understand this issue needs status:approved before a PR can be opened

🔍 Problem Description

Executive Summary

Engram solves the hardest problem in AI-assisted development: persistent context across sessions. Its local-first SQLite architecture, FTS5 search, and zero-dependency binary make it the most practical memory system available for coding agents today.

This proposal identifies five capabilities that would elevate Engram from a persistent note system to an associative, temporal, self-organizing knowledge layer—without compromising its core design principles. Each capability is opt-in, preserves the zero-dependency default, and builds on the existing schema rather than replacing it.

The five capabilities form a dependency chain. They are presented in implementation order.


Current State Analysis

What Engram does well

  • Persistence: Observations survive across sessions and context compactions
  • Topic-key upsert: Evolving knowledge updates in place instead of duplicating
  • Deduplication: Content-hash + time-window prevents redundant writes
  • FTS5 search: Fast keyword-based retrieval with BM25 ranking
  • Privacy: <private> tag stripping at the store layer
  • Zero dependencies: Single Go binary, pure SQLite, no CGO, no runtime

Where the gaps are

| Gap | Current behavior | Impact |
| --- | --- | --- |
| Lexical-only search | FTS5 matches exact words. "state management" won't find "Zustand store choice" | Relevant memories are invisible unless the agent guesses the exact phrasing used when saving |
| Isolated observations | Each observation is an independent row with no relationships to others | Cannot answer "what decisions led to this architecture?" or "what superseded this choice?" |
| Flat relevance | All observations have equal retrieval priority | When the context budget is limited, there is no mechanism to surface the most useful memories first |
| Linear accumulation | Observations pile up over time. Topic-key upsert helps single-topic evolution, but cross-topic overlap goes undetected | Memory bloat degrades search quality and consumes more tokens on retrieval |
| Chronological context | mem_context returns recent sessions/observations by time | Context injection is time-based, not relevance-based: starting a session about auth loads the last 5 sessions regardless of topic |

💡 Proposed Solution

Proposed Capabilities

Capability 1: Hybrid Search (Vector + FTS5)

Note: Issue #139 by @jtomaszon proposes a comprehensive implementation of this capability with pluggable providers, an observation_embeddings table, and Reciprocal Rank Fusion (RRF). This proposal endorses that approach and frames it within the broader enhancement roadmap. The implementation details below are compatible with #139's design.

What it solves

FTS5 is lexical — it matches words, not meaning. When an agent saves "we chose Zustand for client-side state management" and later searches "how do we handle frontend reactivity", FTS5 returns nothing. The concepts are semantically linked but share zero keywords.

How it works

Each observation gets an embedding vector generated at save time. Search runs BOTH FTS5 (keyword match) and vector similarity (semantic match) in parallel, then merges results using Reciprocal Rank Fusion (RRF):

score(doc) = 1/(k + rank_fts5) + 1/(k + rank_vector)

Where k is a tuning constant (typically 60) that controls how much weight goes to top-ranked results.
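As a sketch of the fusion step (in Go, matching Engram's implementation language; the function name and the use of raw observation IDs are illustrative, not Engram's actual API):

```go
package main

import "fmt"

// rrfMerge fuses two ranked result lists with Reciprocal Rank Fusion.
// Each list is ordered best-first; ranks are 1-based per the formula above.
func rrfMerge(ftsRanked, vecRanked []int64, k float64) map[int64]float64 {
	scores := make(map[int64]float64)
	for rank, id := range ftsRanked {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range vecRanked {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	return scores
}

func main() {
	fts := []int64{7, 3, 9}  // observation IDs ranked by BM25
	vec := []int64{3, 12, 7} // observation IDs ranked by cosine similarity
	scores := rrfMerge(fts, vec, 60)
	// Observation 3 ranks 2nd lexically and 1st semantically,
	// so it scores 1/62 + 1/61 and wins overall.
	fmt.Printf("%.6f\n", scores[3]) // prints 0.032522
}
```

A document appearing in only one list still gets a score from that list alone, which is what lets a pure-keyword or pure-semantic hit survive the merge.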

Schema addition

CREATE TABLE observation_embeddings (
    observation_id INTEGER PRIMARY KEY REFERENCES observations(id),
    provider       TEXT NOT NULL,           -- 'ollama', 'openai', 'local'
    model          TEXT NOT NULL,           -- 'nomic-embed-text', 'text-embedding-3-small'
    dimensions     INTEGER NOT NULL,
    embedding      BLOB NOT NULL,           -- float32 array
    created_at     TEXT NOT NULL DEFAULT (datetime('now'))
);
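One plausible encoding for the embedding BLOB is little-endian float32 values, with the count matching the dimensions column. A sketch only; Engram's (or #139's) actual wire format may differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs a float32 vector into a BLOB-ready byte slice
// (little-endian, 4 bytes per component).
func encodeEmbedding(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, v := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(v))
	}
	return buf
}

// decodeEmbedding is the inverse, for reading rows back at search time.
func decodeEmbedding(blob []byte) []float32 {
	vec := make([]float32, len(blob)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(blob[4*i:]))
	}
	return vec
}

func main() {
	v := []float32{0.1, -0.5, 2.0}
	fmt.Println(decodeEmbedding(encodeEmbedding(v))) // prints [0.1 -0.5 2]
}
```

Storing raw float32 keeps the column compatible with a future sqlite-vss index, which expects the same layout.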

Design principles (aligned with Engram's philosophy)

  • Opt-in: Embeddings are generated only if an embedding provider is configured. Without configuration, Engram behaves exactly as today — pure FTS5
  • Provider-agnostic: Pluggable interface supporting local models (Ollama), cloud APIs (OpenAI), or future sqlite-vss integration
  • Local-first preferred: Default recommendation is Ollama with nomic-embed-text — runs locally, no API keys, no data leaves the machine
  • Async generation: Embedding computation happens asynchronously after save confirmation. The agent is never blocked waiting for embeddings
  • Backfill CLI: engram backfill-embeddings processes all existing observations that lack embeddings. Fully retroactive
  • Graceful degradation: If the embedding provider is unavailable (Ollama not running, API key expired), save succeeds normally — the observation just won't have an embedding until next backfill

Impact

Engram goes from "find what I said" to "find what I meant." This is the foundational capability that enables Capabilities 3 and 5.


Capability 2: Salience Scoring

What it solves

All observations have equal retrieval weight. When an agent's context budget allows loading 10 memories, there's no way to determine which 10 are most valuable. Time-based ordering is a weak proxy — a critical architecture decision from 3 months ago is more important than yesterday's typo fix.

How it works

Each observation gets a salience score that reflects its real-world importance based on actual usage patterns:

  • Access boost: Every mem_get_observation call or search-result retrieval increments salience by a configurable delta (default: +0.1)
  • Time decay: A background process (or on-read lazy evaluation) applies exponential decay: salience = salience * decay_factor ^ days_since_last_access
  • Type weighting: Observations of type architecture or decision start with higher base salience than bugfix or discovery, reflecting their typical long-term value
  • Revision signal: Higher revision_count (already tracked) correlates with actively maintained knowledge — factor it into salience

Schema addition

ALTER TABLE observations ADD COLUMN salience REAL NOT NULL DEFAULT 1.0;
ALTER TABLE observations ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE observations ADD COLUMN last_accessed_at TEXT;

CREATE INDEX idx_obs_salience ON observations(salience DESC);

Integration with search

The merged search score becomes:

final_score(doc) = retrieval_score(doc) * salience_weight(doc)

Where retrieval_score comes from FTS5 (or RRF if embeddings are enabled), and salience_weight is a normalized salience value.

Design principles

  • Zero configuration required: Works out of the box with sensible defaults
  • Observable: mem_stats exposes salience distribution (min, max, median, P90) so users can tune decay parameters
  • No data loss: Salience affects ranking, never deletion. Low-salience observations are deprioritized, not removed
  • Retroactive: Existing observations start at 1.0 and begin differentiating immediately through natural usage

Impact

Engram learns which memories matter based on behavior, not declarations. The most accessed and actively maintained knowledge surfaces first.


Capability 3: Automatic Consolidation

Depends on: Capability 1 (Hybrid Search) for semantic similarity detection

What it solves

Over months of use, an agent accumulates hundreds of observations with significant overlap. A project might have 15 observations about authentication — the initial choice, three iterations, two bug fixes, a migration decision. The relevant knowledge is scattered across all of them. Loading all 15 wastes tokens; loading any one is incomplete.

How it works

A consolidation process (triggered manually via CLI or automatically at configurable intervals) identifies clusters of semantically similar observations and proposes merges:

  1. Cluster detection: Group observations by semantic similarity (cosine similarity > configurable threshold, e.g., 0.82) within the same project and scope
  2. Conflict detection: Within each cluster, identify contradictions (e.g., "we chose PostgreSQL" vs. "we migrated to SQLite"). Contradictions are flagged, not auto-resolved
  3. Merge proposal: For non-conflicting clusters, generate a consolidated observation that preserves all unique information from the cluster members
  4. Lineage tracking: Merged observations store references to their source observation IDs. The originals are soft-deleted (deleted_at set) but remain queryable for audit
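Step 1 can be as simple as a greedy single-pass grouping over embeddings. A sketch under that assumption (the threshold, vectors, and function names are illustrative; a production clusterer might use average-link instead of first-seed comparison):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two embedding vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// cluster assigns each embedding to the first cluster whose seed it
// resembles above the threshold, otherwise starts a new cluster.
func cluster(embeddings [][]float32, threshold float64) [][]int {
	var clusters [][]int
	var seeds [][]float32
	for i, e := range embeddings {
		placed := false
		for c, seed := range seeds {
			if cosine(e, seed) > threshold {
				clusters[c] = append(clusters[c], i)
				placed = true
				break
			}
		}
		if !placed {
			seeds = append(seeds, e)
			clusters = append(clusters, []int{i})
		}
	}
	return clusters
}

func main() {
	embs := [][]float32{
		{1, 0, 0},       // auth decision
		{0.95, 0.05, 0}, // auth iteration (near-duplicate)
		{0, 1, 0},       // unrelated CSS note
	}
	fmt.Println(cluster(embs, 0.85)) // prints [[0 1] [2]]
}
```

Clusters of size 1 are left alone; only multi-member clusters proceed to conflict detection and merge proposals.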

Schema addition

CREATE TABLE consolidation_lineage (
    consolidated_id INTEGER NOT NULL REFERENCES observations(id),
    source_id       INTEGER NOT NULL REFERENCES observations(id),
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (consolidated_id, source_id)
);

Merge strategies

| Scenario | Strategy |
| --- | --- |
| Same topic, no contradiction | Merge into a single observation with combined content |
| Same topic, temporal evolution | Keep the latest, reference history in the lineage table |
| Same topic, contradiction | Flag for human review; do not auto-merge |
| Cross-topic overlap | Do not merge; the observations serve different retrieval paths |

Design principles

  • Human-in-the-loop by default: First implementation should be engram consolidate --dry-run that shows proposed merges. Auto-merge is a future opt-in
  • Reversible: Source observations are soft-deleted, not destroyed. engram consolidate --undo <consolidation_id> restores them
  • Conservative threshold: Default similarity threshold should be high (0.85+) to avoid false merges. Better to under-merge than over-merge
  • Respects topic_key boundaries: Observations with different topic_keys are never merged, even if semantically similar — the agent assigned different keys for a reason

Impact

Engram stays lean over time. Instead of 500 observations with 60% overlap, you get 200 dense, comprehensive observations. More knowledge in fewer tokens.


Capability 4: Temporal Knowledge Graph

What it solves

Observations exist in isolation. There's no way to express or query relationships like:

  • "This SQLite decision superseded the PostgreSQL decision"
  • "This bug was caused by that architecture choice"
  • "These three observations relate to the auth module"

Without relationships, the agent can't navigate knowledge — it can only search for it.

How it works

A lightweight relationship layer on top of existing observations. Relationships are typed, directed, and temporally scoped:

[Observation A] --relationship_type--> [Observation B]
                  valid_from: 2026-03-01
                  valid_to: null (still active)

Schema addition

CREATE TABLE observation_links (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    from_id         INTEGER NOT NULL REFERENCES observations(id),
    to_id           INTEGER NOT NULL REFERENCES observations(id),
    relationship    TEXT NOT NULL,  -- 'supersedes', 'caused_by', 'relates_to', 'implements', 'reverts'
    valid_from      TEXT NOT NULL DEFAULT (datetime('now')),
    valid_to        TEXT,           -- NULL = still active
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(from_id, to_id, relationship)
);

CREATE INDEX idx_links_from ON observation_links(from_id, relationship);
CREATE INDEX idx_links_to ON observation_links(to_id, relationship);
CREATE INDEX idx_links_validity ON observation_links(valid_from, valid_to);

Relationship types

| Type | Semantics | Example |
| --- | --- | --- |
| supersedes | A replaces B as the current truth | "Migrate to SQLite" supersedes "Choose PostgreSQL" |
| caused_by | A was a consequence of B | "N+1 query bug" caused_by "ORM lazy loading decision" |
| relates_to | A and B cover related topics | "Auth middleware" relates_to "JWT library choice" |
| implements | A is the execution of B | "Add rate limiting code" implements "Rate limiting decision" |
| reverts | A undoes B | "Rollback feature flag" reverts "Enable feature flag" |

New MCP tool

mem_link(from_id, to_id, relationship, valid_from?, valid_to?)
mem_traverse(observation_id, relationship?, direction?, depth?)

mem_traverse returns connected observations up to N hops away, optionally filtered by relationship type. This enables queries like "show me everything that led to this decision" or "what did this architecture choice cause?"
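The traversal needs no graph engine. A sketch of the query behind a hypothetical mem_traverse, assuming the observation_links schema above (parameter names are placeholders):

```sql
-- Outgoing traversal from :start_id, up to :max_depth hops,
-- following only links that are still active (valid_to IS NULL).
WITH RECURSIVE reachable(id, depth) AS (
    SELECT :start_id, 0
    UNION
    SELECT l.to_id, r.depth + 1
    FROM observation_links l
    JOIN reachable r ON l.from_id = r.id
    WHERE r.depth < :max_depth
      AND l.valid_to IS NULL
)
SELECT o.*, r.depth
FROM reachable r
JOIN observations o ON o.id = r.id
WHERE r.depth > 0
ORDER BY r.depth;
```

An optional `AND l.relationship = :relationship` filter and a symmetric variant joining on `l.to_id` would cover the relationship and direction parameters.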

Design principles

  • Agent-driven linking: The agent creates links explicitly during mem_save or as a separate operation. No automatic entity extraction in v1 — that's complex, error-prone, and adds NLP dependencies
  • Temporal validity: Links have valid_from / valid_to so the graph reflects what's CURRENT, not just what was ever true
  • Lightweight traversal: Depth-limited BFS on a small graph (hundreds to low thousands of nodes). No need for a graph database — SQLite handles this with recursive CTEs
  • Existing Obsidian export synergy: The Obsidian Brain export already generates a graph visualization. Links would make that graph actually meaningful instead of just proximity-based

Impact

Engram becomes navigable. Instead of searching in the dark, the agent can follow relationship chains to build a complete picture of how knowledge connects. "What decisions led to the current auth system?" becomes a traversal query, not a prayer to FTS5.


Capability 5: Contextual Priming

Depends on: Capability 1 (Hybrid Search), enhanced by Capability 2 (Salience Scoring)

What it solves

mem_context returns recent sessions and observations chronologically. If an agent starts a session about database optimization, it gets the last 5 sessions — which might be about CSS, auth, and deployment. The context injection is time-based, not relevance-based.

How it works

A new tool mem_prime (or an enhanced mem_context mode) takes the current task context as input and returns the most relevant observations regardless of recency:

  1. Input: The agent passes its current context (user's first message, current file, active task description)
  2. Embedding: The input is embedded using the same provider as observations
  3. Retrieval: Vector similarity search against all observation embeddings, weighted by salience score
  4. Diversification: Results are diversified to avoid returning 5 observations about the same sub-topic. MMR (Maximal Marginal Relevance) ensures breadth
  5. Output: Top-N most relevant observations, ordered by combined relevance + salience score
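The diversification step (4) can be sketched as greedy MMR selection. The Candidate struct, lambda value, and scores below are illustrative assumptions, not Engram's actual types:

```go
package main

import (
	"fmt"
	"math"
)

type Candidate struct {
	ID        int64
	Relevance float64   // combined relevance * salience from retrieval
	Vec       []float32 // embedding, for pairwise similarity
}

func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// mmrSelect greedily picks n results, trading relevance against
// similarity to what is already selected (lambda near 1 favors relevance).
func mmrSelect(cands []Candidate, n int, lambda float64) []int64 {
	var out []int64
	var chosen []Candidate
	used := make([]bool, len(cands))
	for len(out) < n && len(out) < len(cands) {
		best, bestScore := -1, math.Inf(-1)
		for i, c := range cands {
			if used[i] {
				continue
			}
			maxSim := 0.0
			for _, p := range chosen {
				if s := cosine(c.Vec, p.Vec); s > maxSim {
					maxSim = s
				}
			}
			if score := lambda*c.Relevance - (1-lambda)*maxSim; score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		chosen = append(chosen, cands[best])
		out = append(out, cands[best].ID)
	}
	return out
}

func main() {
	cands := []Candidate{
		{1, 0.90, []float32{1, 0}},      // auth: top hit
		{2, 0.89, []float32{0.99, 0.1}}, // auth: near-duplicate of 1
		{3, 0.70, []float32{0, 1}},      // deployment: different topic
	}
	fmt.Println(mmrSelect(cands, 2, 0.7)) // prints [1 3]
}
```

With lambda = 0.7, the near-duplicate is penalized enough that the lower-relevance but novel observation wins the second slot, which is exactly the breadth the step above asks for.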

New MCP tool

mem_prime(context, project?, scope?, limit?)
  • context: Free-text description of what the agent is about to work on
  • Returns: Ranked list of observations most relevant to that context

How it differs from mem_search

| Aspect | mem_search | mem_prime |
| --- | --- | --- |
| Input | A search query (keywords) | A task context (paragraph) |
| Intent | "Find this specific thing" | "What should I know before starting?" |
| Ranking | BM25 / RRF relevance | Relevance × salience × diversity |
| Diversity | None; may return 5 hits about the same thing | MMR ensures topic coverage |
| Typical use | Mid-task lookup | Session start / task start |

Design principles

  • Complementary, not replacing: mem_context (chronological) and mem_search (keyword) remain unchanged. mem_prime is a new retrieval mode for a different use case
  • Requires embeddings: Only available when an embedding provider is configured. Without embeddings, agents continue using mem_context and mem_search as today
  • Agent protocol integration: The CLAUDE.md protocol currently instructs agents to call mem_context at session start. With mem_prime, the protocol would add: "After mem_context, call mem_prime with the user's first message to surface relevant prior knowledge"
  • Token-budget aware: Accepts a limit parameter so the agent controls how much context budget to allocate to priming

Impact

Engram becomes proactive. Instead of waiting for the agent to ask the right question with the right keywords, it surfaces relevant knowledge before it's needed. Like a senior colleague who says "before you start on that, you should know we already tried X and it didn't work because Y."


Dependency Graph

                    ┌───────────────────┐
                    │ 1. Hybrid Search  │  ← foundational, enables 3 and 5
                    │ (Vector + FTS5)   │
                    └─────────┬─────────┘
                              │
            ┌─────────────────┼─────────────────┐
            │                 │                 │
            ▼                 ▼                 ▼
┌────────────────────┐ ┌──────────────┐ ┌────────────────┐
│ 3. Consolidation   │ │ 2. Salience  │ │ 4. Knowledge   │
│ (needs vectors     │ │ (independent,│ │    Graph       │
│  for clustering)   │ │  no deps)    │ │ (independent,  │
└────────────────────┘ └──────┬───────┘ │  no deps)      │
                              │         └────────────────┘
                              ▼
                    ┌───────────────────┐
                    │ 5. Contextual     │  ← needs vectors (1),
                    │    Priming        │    benefits from salience (2)
                    │ (needs 1, uses 2) │
                    └───────────────────┘

Recommended implementation order:

| Phase | Capability | Rationale |
| --- | --- | --- |
| Phase 1 | Hybrid Search | Unlocks semantic understanding. Enables Phases 4 and 5. Aligns with existing Issue #139 |
| Phase 2 | Salience Scoring | Independent, low complexity, immediate value. Three new columns + one index |
| Phase 3 | Knowledge Graph | Independent, moderate complexity. New table + 2 MCP tools. Enriches the Obsidian export |
| Phase 4 | Consolidation | Requires Phase 1. Moderate complexity. New table + CLI command |
| Phase 5 | Contextual Priming | Requires Phase 1, benefits from Phase 2. New MCP tool |

Phases 2 and 3 can be implemented in parallel with Phase 1, as they have no dependencies on it.


Retroactive Compatibility

All five capabilities work with existing observations:

| Capability | Retroactive path |
| --- | --- |
| Hybrid Search | engram backfill-embeddings generates vectors for all existing observations |
| Salience | All existing observations start at salience = 1.0 and differentiate through natural usage |
| Consolidation | First run scans all existing observations for merge candidates |
| Knowledge Graph | Starts empty; agents create links going forward. Optional: engram infer-links could propose links based on topic_key overlap and temporal proximity |
| Contextual Priming | Works immediately once embeddings exist (after backfill) |

No migration breaks existing behavior. Every capability degrades gracefully to current behavior when not configured.


Alignment with Engram's Design Philosophy

| Engram principle | How this proposal respects it |
| --- | --- |
| Local-first SQLite | All data stays in SQLite. Embeddings are BLOB columns, not an external vector DB. The graph is a SQLite table with recursive CTEs, not Neo4j |
| Zero dependencies | Every capability is opt-in. Without configuration, Engram behaves identically to today. No new required dependencies |
| Single binary | The embedding provider interface is pluggable. Local Ollama is the recommended default; no cloud APIs required |
| Agent decides what matters | Links are agent-created, not auto-extracted. Consolidation proposes, a human approves. Salience is derived from agent behavior |
| FTS5 covers 95% of cases | FTS5 remains the primary search path. Vector search augments it, never replaces it |
| Privacy at two layers | Embeddings inherit the same <private> stripping. Consolidated observations go through the same sanitization pipeline |

Related Issues


Summary

These five capabilities transform Engram along five dimensions:

| Dimension | Before | After |
| --- | --- | --- |
| Search | Finds what you said | Finds what you meant |
| Relevance | All memories equal | Most useful memories surface first |
| Structure | Isolated notes | Connected knowledge graph |
| Scale | Grows linearly forever | Self-consolidates over time |
| Context | Loads recent history | Loads relevant knowledge |

The result is a memory system that behaves less like a search engine and more like an experienced colleague's memory — associative, prioritized, connected, and proactive.

Each capability is independently valuable, opt-in, and backwards-compatible. They can be implemented incrementally without architectural disruption.

📦 Affected Area

CLI (commands, flags)

🔄 Alternatives Considered

No response

📎 Additional Context

No response
