
feat(search): pluggable vector/embedding search with hybrid FTS5+RRF#139

Closed
jtomaszon wants to merge 5 commits into Gentleman-Programming:main from scaledb-io:feature/vector-search

Conversation

@jtomaszon

Summary

Adds semantic search capability alongside existing FTS5 keyword search, following the architecture endorsed in #21 and #24.

  • New internal/embedding/ package — Provider interface with Ollama and OpenAI implementations. Pure-Go cosine similarity, binary serialization, and Reciprocal Rank Fusion (RRF) merge. No new dependencies beyond net/http.
  • Hybrid search — When an embedding provider is configured, Search() runs FTS5 + vector cosine similarity and merges results via RRF (k=60). Falls back to FTS5-only when no provider is set — zero overhead for existing users.
  • Async embedding on save/update — AddObservation and UpdateObservation trigger background embedding generation. Content changes re-embed automatically.
  • observation_embeddings table — Created on migration, nullable. Separate from observations to keep FTS5 scans fast.
  • CLI flags — --embedding-provider=ollama|openai|none, --embedding-model, --embedding-url, plus ENGRAM_EMBEDDING_* env vars.
  • engram backfill-embeddings — Bulk-embeds existing observations that don't have embeddings yet.
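A usage sketch tying the flags, env vars, and backfill command together (the model name comes from later commits in this PR; the Ollama URL shown is the standard local default, not something this PR pins down):

```shell
# Opt in to semantic search via environment variables
export ENGRAM_EMBEDDING_PROVIDER=ollama
export ENGRAM_EMBEDDING_MODEL=nomic-embed-text
export ENGRAM_EMBEDDING_URL=http://localhost:11434

# Or via CLI flags
engram serve --embedding-provider=ollama --embedding-model=nomic-embed-text

# Bulk-embed observations that predate embedding support
engram backfill-embeddings
```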

Context

This implements the approach discussed by @Gentleman-Programming in #21 and #24:

  1. FTS5 stays default — zero overhead when no provider configured
  2. Vector search as opt-in via pluggable providers
  3. Hybrid search merging FTS5 + vector results

Design decisions

| Decision | Rationale |
| --- | --- |
| FTS5 stays default | Existing behavior unchanged when no provider configured |
| Brute-force cosine | Personal memory is typically <10K obs. Benchmark: 564 ns/op on 768-dim (56ms for 100K obs) |
| RRF over weighted linear | Rank-based, no need to normalize BM25 and cosine to same scale |
| Separate embeddings table | Large BLOBs (~3KB each) would bloat observation table scans |
| Async embedding | Network calls (50-500ms) shouldn't block observation saves |
| No new Go deps | Providers use stdlib net/http only. CGO_ENABLED=0 preserved |
| Agent-agnostic providers | Users plug Ollama (local/free) or OpenAI (cloud) via config |
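The brute-force path is a linear scan of one similarity computation per stored vector. A pure-Go sketch of the kernel (illustrative; the PR's actual code may differ), which allocates nothing in the loop and is why per-comparison cost stays in the hundreds of nanoseconds:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two
// equal-length vectors: dot(a,b) / (|a| * |b|). Accumulation is
// done in float64 to limit rounding error on long vectors.
func cosineSimilarity(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0 // degenerate zero vector
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{1, 0}))  // identical → 1
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{0, 1}))  // orthogonal → 0
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{-1, 0})) // opposite → -1
}
```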

Stats

  • 1,495 lines added across 9 files (6 new + 3 modified)
  • 33 new tests (22 embedding + 11 store integration)
  • All 346 existing tests pass — zero regressions
  • Benchmark: BenchmarkCosineSimilarity768: 564.5 ns/op, 0 allocs

Refs #21, #24

Test plan

  • All existing 346 tests pass unchanged
  • New embedding provider tests with mock HTTP servers
  • Cosine similarity correctness (identical=1, orthogonal=0, opposite=-1)
  • RRF merge ordering verified with known inputs
  • Store integration: hybrid search returns results from both FTS5 and vector
  • Store integration: vector search respects project/scope/type filters
  • Backfill generates embeddings for all unembedded observations
  • No-provider path returns FTS5-only results (backward compatible)
  • Embedding table created on migration
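The binary serialization mentioned in the summary is presumably a flat float32 encoding; a round-trip sketch under that assumption (little-endian, 4 bytes per dimension — not necessarily the PR's exact wire format):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVec packs a float32 vector into a little-endian BLOB at
// 4 bytes per dimension, so a 768-dim embedding is 3072 bytes —
// consistent with the "~3KB each" note in the design table.
func encodeVec(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVec reverses encodeVec.
func decodeVec(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v
}

func main() {
	v := []float32{0.1, -0.5, 2.25}
	fmt.Println(decodeVec(encodeVec(v))) // round-trips exactly
}
```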

🤖 Generated with Claude Code

Javier Zon and others added 5 commits March 31, 2026 21:58
Add semantic search capability alongside existing FTS5 keyword search.
When an embedding provider is configured, observations are embedded on
save/update and search results merge FTS5 and vector cosine similarity
via Reciprocal Rank Fusion (k=60). Falls back to FTS5-only when no
provider is configured — zero overhead for existing users.

New internal/embedding package:
- Provider interface with Ollama and OpenAI implementations
- Pure-Go cosine similarity and binary serialization (no CGO)
- RRF merge for combining ranked result lists

Store changes:
- observation_embeddings table (created on migration, nullable)
- Async embedding generation on AddObservation/UpdateObservation
- Hybrid Search: FTS5 → vector scan → RRF merge → unified results
- BackfillEmbeddings for bulk embedding existing observations

CLI:
- --embedding-provider, --embedding-model, --embedding-url flags
- ENGRAM_EMBEDDING_PROVIDER/MODEL/URL/API_KEY env vars
- engram backfill-embeddings command

Refs: Gentleman-Programming#21, Gentleman-Programming#24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmdServe was missing the configureEmbeddings call, so embedding
env vars (ENGRAM_EMBEDDING_PROVIDER etc.) were ignored when running
`engram serve`. Now both serve and mcp commands honor embedding config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nomic-embed-text has an 8192 token context window (~6K chars of mixed
prose/code). Observations exceeding this limit were silently failing.
Now we truncate to 6000 chars and log a clear warning with the original
and truncated sizes so users know to split large observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each provider now reports its own maximum text length via MaxChars().
Ollama uses empirically tested limits per model (e.g., nomic-embed-text
6000 chars for mixed markdown/code). OpenAI uses token-based estimates.
Truncation logs a clear warning with model name, original and truncated
sizes.

This replaces the previous hardcoded 6000 char global constant with
provider-aware limits, so larger-context models (Voyage 32K, Cohere
128K) won't have their input unnecessarily truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
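The provider-aware limit this commit describes can be sketched as an interface method plus a truncation helper (identifiers are illustrative, not the PR's actual code; 6000 chars is the nomic-embed-text value the commit cites):

```go
package main

import (
	"fmt"
	"log"
)

// Provider is a minimal sketch of an embedding provider that reports
// its own maximum input length, as this commit describes.
type Provider interface {
	MaxChars() int
}

type ollamaProvider struct{ model string }

// MaxChars returns an empirically tested per-model character limit.
func (p ollamaProvider) MaxChars() int {
	switch p.model {
	case "nomic-embed-text":
		return 6000 // ~8192-token context for mixed prose/code
	default:
		return 6000 // conservative fallback
	}
}

// truncateForProvider clips text to the provider's limit and logs a
// warning with the model name and the original and truncated sizes.
// (A real implementation would also avoid splitting a UTF-8 rune.)
func truncateForProvider(p Provider, model, text string) string {
	limit := p.MaxChars()
	if len(text) <= limit {
		return text
	}
	log.Printf("warning: truncating input for %s: %d -> %d chars", model, len(text), limit)
	return text[:limit]
}

func main() {
	p := ollamaProvider{model: "nomic-embed-text"}
	long := string(make([]byte, 10000))
	fmt.Println(len(truncateForProvider(p, p.model, long))) // 6000
}
```

Routing the limit through the interface is what lets larger-context providers (the Voyage/Cohere examples above) report bigger values without touching the truncation call site.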
Comprehensive guide covering: architecture (dual memory paths),
installation, MCP server config, PostToolUse hook for reactive sync,
embedding provider setup (Ollama/OpenAI), bulk seeding, and backfill.
Includes copy-pasteable config snippets and launchd plist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@Alan-TheGentleman Alan-TheGentleman left a comment


Hi @jtomaszon! 👋

First of all, I want to acknowledge that this is a serious piece of work. The idea of pluggable vector/embedding search with hybrid FTS5+RRF is ambitious and very well thought out. The approach of keeping FTS5 as the default with zero overhead, vector search as opt-in, and RRF for the merge is exactly the direction we want. It's clear you put real thought into the design decisions (brute-force cosine for <10K obs, separate embeddings table, async embedding, no new deps). Very solid.

The issue: the PR has merge conflicts with main that need to be resolved before we can move forward with the review. I have to ask you to rebase onto the current main and resolve the conflicts.

git fetch origin
git rebase origin/main
# resolve conflicts
git push --force-with-lease

Once the conflicts are resolved, we'll sit down and do a complete, detailed architectural review. There's a lot to look at (1,495 new lines, 9 files, 33 tests) and I want to give it the attention it deserves.

Go ahead with the rebase and let me know when it's ready. We're here for whatever you need.

@Alan-TheGentleman
Collaborator

Hi! First of all, the work you put into this really shows — it's a serious, well-built PR, and that deserves a lot of respect.

But I have to be direct: engram's core philosophy is a single binary + a single SQLite database + zero external dependencies. It's that simple, and that's what makes it powerful. Requiring Ollama or OpenAI for search breaks that fundamental principle.

The whole point of engram is that you install it and it just works. You don't have to run Ollama, you don't have to configure OpenAI API keys, you don't have to manage embedding models. One binary, one SQLite file, done.

For reference, there's another PR (#170) proposing SimHash-based search — a purely local approach with no external dependencies. That aligns better with the philosophy, although it also needs to go through the issue-discussion process first.

If you're interested in continuing to contribute (and I hope you are), consider approaches that stay within that philosophy: everything local, fully self-contained, no external services.

Thank you for the effort, and sorry we can't merge this. The door is open for future contributions!
