feat(search): pluggable vector/embedding search with hybrid FTS5+RRF by jtomaszon · Pull Request #139 · Gentleman-Programming/engram

jtomaszon · 2026-04-01T02:02:44Z

Summary

Adds semantic search capability alongside existing FTS5 keyword search, following the architecture endorsed in #21 and #24.

New internal/embedding/ package — Provider interface with Ollama and OpenAI implementations. Pure-Go cosine similarity, binary serialization, and Reciprocal Rank Fusion (RRF) merge. No new dependencies beyond net/http.
Hybrid search — When an embedding provider is configured, Search() runs FTS5 + vector cosine similarity and merges results via RRF (k=60). Falls back to FTS5-only when no provider is set — zero overhead for existing users.
Async embedding on save/update — AddObservation and UpdateObservation trigger background embedding generation. Content changes re-embed automatically.
observation_embeddings table — Created on migration, nullable. Separate from observations to keep FTS5 scans fast.
CLI flags — --embedding-provider=ollama|openai|none, --embedding-model, --embedding-url, plus ENGRAM_EMBEDDING_* env vars.
engram backfill-embeddings — Bulk-embeds existing observations that don't have embeddings yet.
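
As an illustration of the RRF merge described above, here is a minimal sketch, not the PR's actual code. The function name `rrfMerge` and the use of `int64` observation IDs are assumptions made for the example. Each input list is assumed to be ordered best-first and to contain each ID at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfMerge fuses several ranked ID lists with Reciprocal Rank Fusion.
// Every list adds 1/(k + rank) to an ID's score, where rank is 1-based.
// The name and signature are hypothetical, chosen for this sketch.
func rrfMerge(k float64, lists ...[]int64) []int64 {
	scores := make(map[int64]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]int64, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties are broken by ID so output is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		sa, sb := scores[ids[a]], scores[ids[b]]
		if sa != sb {
			return sa > sb
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	fts := []int64{1, 2, 3} // hypothetical FTS5 ranking, best first
	vec := []int64{3, 1, 4} // hypothetical vector ranking, best first
	fmt.Println(rrfMerge(60, fts, vec)) // prints [1 3 2 4]
}
```

Because only ranks enter the score, the BM25 and cosine values never need to be brought onto a common scale, which is the rationale the PR gives for choosing RRF.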

Context

This implements the approach discussed by @Gentleman-Programming in #21 and #24:

FTS5 stays default — zero overhead when no provider configured
Vector search as opt-in via pluggable providers
Hybrid search merging FTS5 + vector results

Design decisions

| Decision | Rationale |
| --- | --- |
| FTS5 stays default | Existing behavior unchanged when no provider configured |
| Brute-force cosine | Personal memory is typically <10K obs. Benchmark: 564ns/op on 768-dim (56ms for 100K obs) |
| RRF over weighted linear | Rank-based, no need to normalize BM25 and cosine to same scale |
| Separate embeddings table | Large BLOBs (~3KB each) would bloat observation table scans |
| Async embedding | Network calls (50-500ms) shouldn't block observation saves |
| No new Go deps | Providers use stdlib `net/http` only. CGO_ENABLED=0 preserved |
| Agent-agnostic providers | Users plug Ollama (local/free) or OpenAI (cloud) via config |
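
To make the brute-force cosine row concrete, the following is a rough sketch of a pure-Go cosine similarity over `float32` vectors. It is not taken from the PR: the function name, the `float64` accumulation, and the choice to return 0 for mismatched or zero-length inputs are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// Returning 0 for mismatched lengths or zero vectors is an assumption of
// this sketch; a real implementation might return an error instead.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	a := []float32{1, 0}
	fmt.Println(cosineSimilarity(a, []float32{1, 0}))  // identical vectors
	fmt.Println(cosineSimilarity(a, []float32{0, 1}))  // orthogonal vectors
	fmt.Println(cosineSimilarity(a, []float32{-1, 0})) // opposite vectors
}
```

A brute-force search simply calls this once per stored embedding and sorts by the result. At the quoted 564 ns per comparison, 100K observations cost about 56 ms, which matches the figure in the table.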

Stats

1,495 lines added across 9 files (6 new + 3 modified)
33 new tests (22 embedding + 11 store integration)
All 346 existing tests pass — zero regressions
Benchmark: BenchmarkCosineSimilarity768: 564.5 ns/op, 0 allocs

Refs #21, #24

Test plan

All existing 346 tests pass unchanged
New embedding provider tests with mock HTTP servers
Cosine similarity correctness (identical=1, orthogonal=0, opposite=-1)
RRF merge ordering verified with known inputs
Store integration: hybrid search returns results from both FTS5 and vector
Store integration: vector search respects project/scope/type filters
Backfill generates embeddings for all unembedded observations
No-provider path returns FTS5-only results (backward compatible)
Embedding table created on migration

🤖 Generated with Claude Code

Add semantic search capability alongside existing FTS5 keyword search. When an embedding provider is configured, observations are embedded on save/update and search results merge FTS5 and vector cosine similarity via Reciprocal Rank Fusion (k=60). Falls back to FTS5-only when no provider is configured, with zero overhead for existing users.

New internal/embedding package:
- Provider interface with Ollama and OpenAI implementations
- Pure-Go cosine similarity and binary serialization (no CGO)
- RRF merge for combining ranked result lists

Store changes:
- observation_embeddings table (created on migration, nullable)
- Async embedding generation on AddObservation/UpdateObservation
- Hybrid Search: FTS5 → vector scan → RRF merge → unified results
- BackfillEmbeddings for bulk embedding existing observations

CLI:
- --embedding-provider, --embedding-model, --embedding-url flags
- ENGRAM_EMBEDDING_PROVIDER/MODEL/URL/API_KEY env vars
- engram backfill-embeddings command

Refs: Gentleman-Programming#21, Gentleman-Programming#24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
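
The commit mentions binary serialization of embeddings without showing the layout. One plausible encoding, sketched below, packs each `float32` as 4 little-endian bytes. The function names and the byte order are assumptions, not details confirmed by the PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeVector packs a vector into a BLOB, 4 little-endian bytes per value.
// Little-endian is an assumption of this sketch.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVector reverses encodeVector and rejects blobs of invalid length.
func decodeVector(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, errors.New("blob length is not a multiple of 4")
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	blob := encodeVector(make([]float32, 768))
	fmt.Println(len(blob)) // prints 3072
}
```

With this layout a 768-dimensional vector occupies 3,072 bytes, consistent with the "~3KB each" estimate in the design decisions.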

cmdServe was missing the configureEmbeddings call, so embedding env vars (ENGRAM_EMBEDDING_PROVIDER etc.) were ignored when running `engram serve`. Now both serve and mcp commands honor embedding config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nomic-embed-text has an 8192 token context window (~6K chars of mixed prose/code). Observations exceeding this limit were silently failing. Now we truncate to 6000 chars and log a clear warning with the original and truncated sizes so users know to split large observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each provider now reports its own maximum text length via MaxChars(). Ollama uses empirically tested limits per model (e.g., nomic-embed-text 6000 chars for mixed markdown/code). OpenAI uses token-based estimates. Truncation logs a clear warning with model name, original and truncated sizes. This replaces the previous hardcoded 6000 char global constant with provider-aware limits, so larger-context models (Voyage 32K, Cohere 128K) won't have their input unnecessarily truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
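
The provider-aware truncation described in this commit might look roughly like the sketch below. Only `MaxChars()` is named in the commit message. The `Provider` interface shape, the `Model()` method, the `fixedLimit` type, and `truncateForProvider` are hypothetical, and counting characters as runes rather than bytes is also an assumption.

```go
package main

import (
	"fmt"
	"log"
)

// Provider is a hypothetical, reduced version of the PR's interface.
// Only MaxChars is mentioned in the commit; Model is assumed for logging.
type Provider interface {
	Model() string
	MaxChars() int
}

// fixedLimit is an illustrative provider with a constant character limit.
type fixedLimit struct {
	model string
	max   int
}

func (f fixedLimit) Model() string { return f.model }
func (f fixedLimit) MaxChars() int { return f.max }

// truncateForProvider cuts text to the provider's limit, counted in runes
// so that multi-byte UTF-8 characters are never split, and logs a warning
// with the model name plus the original and truncated sizes.
func truncateForProvider(p Provider, text string) string {
	limit := p.MaxChars()
	runes := []rune(text)
	if limit <= 0 || len(runes) <= limit {
		return text // a non-positive limit is treated as "no limit" here
	}
	log.Printf("embedding: truncating input for %s from %d to %d chars",
		p.Model(), len(runes), limit)
	return string(runes[:limit])
}

func main() {
	p := fixedLimit{model: "nomic-embed-text", max: 6000}
	fmt.Println(len(truncateForProvider(p, "short observation")))
}
```

Keeping the limit on the provider means a larger-context model only has to return a bigger number from `MaxChars()`, with no change to the calling code.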

Comprehensive guide covering: architecture (dual memory paths), installation, MCP server config, PostToolUse hook for reactive sync, embedding provider setup (Ollama/OpenAI), bulk seeding, and backfill. Includes copy-pasteable config snippets and launchd plist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Alan-TheGentleman

Hi @jtomaszon! 👋

First of all, I want to acknowledge that this is a huge amount of work. The idea of pluggable vector/embedding search with hybrid FTS5+RRF is ambitious and very well thought out. The approach of keeping FTS5 as the default with zero overhead, making vector search opt-in, and using RRF for the merge is exactly the direction we want. It's clear you put real thought into the design decisions (brute-force cosine for <10K obs, a separate embeddings table, async embedding, no new deps). Very solid.

The issue: the PR has merge conflicts with main that need to be resolved before we can move forward with the review. I need to ask you to rebase onto the current main and resolve the conflicts.

git fetch origin
git rebase origin/main
# resolve conflicts
git push --force-with-lease

Once the conflicts are resolved, we'll sit down and do a complete, detailed architectural review. There is a lot to look at (1,495 new lines, 9 files, 33 tests) and I want to give it the attention it deserves.

Go ahead with the rebase and let me know when it's ready. We're here for whatever you need.

Alan-TheGentleman · 2026-04-12T12:59:18Z

Hi! First of all, the work you put into this shows. It's a serious, well-built PR, and that earns a lot of respect.

But I have to be direct: engram's core philosophy is a single binary + a single SQLite database + zero external dependencies. It's that simple and that powerful. Requiring Ollama or OpenAI for search breaks that fundamental principle.

The whole point of engram is that you install it and it works. You don't have to spin up Ollama, you don't have to configure OpenAI API keys, you don't have to manage embedding models. One binary, one SQLite file, and that's it.

For reference, there is another PR (#170) that proposes SimHash-based search, a purely local approach with no external dependencies. That aligns better with the philosophy, although it also needs to go through discussion in an issue first.

If you're interested in continuing to contribute (and I hope you are), consider approaches that stay within that philosophy: fully local, fully self-contained, no external services.

Thank you sincerely for the effort, and apologies that we can't merge this. The door is open for future contributions!

Javier Zon and others added 5 commits March 31, 2026 21:58

blkdooGit mentioned this pull request Apr 9, 2026

Proposal: Five-Layer Memory Enhancement (Semantic Search, Salience, Graph, Consolidation, Priming) #168

Open

2 tasks

Alan-TheGentleman requested changes Apr 12, 2026


Alan-TheGentleman closed this Apr 12, 2026

jtomaszon wants to merge 5 commits into Gentleman-Programming:main from scaledb-io:feature/vector-search
