feat: semantic memory with LanceDB + Gemini embeddings #1

Closed
5queezer wants to merge 30 commits into main from feat/semantic-memory-and-swarm

Conversation

Owner

@5queezer 5queezer commented Mar 11, 2026

Summary

  • Semantic memory: Adds LanceDB-powered vector memory to container agents via 4 MCP tools (memory_store, memory_search, memory_delete, memory_count). Uses Gemini embedding-001 (3072-dim) for embeddings. Supports local embedded LanceDB or cloud via LANCEDB_URI env var.
  • Migration script: scripts/migrate-memories.mjs imports memories from OpenClaw JSONL backups with re-embedding.
  • Container runner: passes GEMINI_API_KEY, LANCEDB_URI, LANCEDB_API_KEY env vars to containers for MCP tool access.

Security fixes applied

  • Sanitized LanceDB filter inputs (category, ID) to prevent injection
  • crypto.randomUUID() for memory IDs instead of Math.random()
  • 30s fetch timeout on Gemini API calls
  • Race-safe singleton for LanceDB table initialization
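
A minimal TypeScript sketch of two of the fixes listed above: allow-list sanitization of filter inputs and a race-safe lazy singleton. The helper names (`sanitizeFilterValue`, `categoryFilter`, `lazySingleton`) and the allow-list pattern are assumptions for illustration, not the code in this PR.

```typescript
// Hypothetical allow-list check: reject any value that could break out of a
// quoted filter literal instead of trying to escape it.
function sanitizeFilterValue(value: string): string {
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(value)) {
    throw new Error(`Invalid filter value: ${JSON.stringify(value)}`);
  }
  return value;
}

// Builds a single-quoted equality filter from a validated category name.
function categoryFilter(category: string): string {
  return `category = '${sanitizeFilterValue(category)}'`;
}

// Race-safe lazy singleton: cache the in-flight promise, not the resolved
// value, so concurrent callers share one initialization.
function lazySingleton<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = init().catch((err) => {
        pending = undefined; // allow a later call to retry after a failure
        throw err;
      });
    }
    return pending;
  };
}
```

Caching the promise means two callers that arrive before initialization finishes both await the same work, which is the property a "race-safe singleton" needs.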

Review fixes applied

  • Configurable LanceDB path via MEMORY_LANCEDB_DIR env var (no more hardcoded path)
  • Gemini error body included in thrown error messages
  • LanceDB version aligned to ^0.26.2 (matches root)
  • Float32Array used consistently for vectors (matches schema + migration)
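
A minimal TypeScript sketch of the path and vector fixes above. The function names are hypothetical, and falling back to the previously hardcoded path when `MEMORY_LANCEDB_DIR` is unset is an assumption; the PR may choose a different default.

```typescript
// Dimension stated in the PR summary for Gemini embeddings.
const EMBEDDING_DIM = 3072;

// Assumed fallback: the path that was previously hardcoded.
const DEFAULT_LANCEDB_DIR = "/workspace/group/memory/lancedb";

// Resolve the storage directory from an environment-like object so the
// function is easy to test without touching the real process environment.
function resolveLanceDbDir(env: Record<string, string | undefined>): string {
  const dir = env.MEMORY_LANCEDB_DIR?.trim();
  return dir ? dir : DEFAULT_LANCEDB_DIR;
}

// Convert an embedding to Float32Array and check its length, so every
// caller hands the store the same vector type and dimension.
function toVector(values: ArrayLike<number>, dim: number = EMBEDDING_DIM): Float32Array {
  if (values.length !== dim) {
    throw new Error(`Expected ${dim} dimensions, got ${values.length}`);
  }
  return Float32Array.from(values);
}
```

In real code the first function would be called as `resolveLanceDbDir(process.env)`.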

Removed from PR

  • Telegram agent swarm (bot pool) — already available as an upstream skill at nanoclaw.dev/skills/telegram-swarm

Test plan

  • Ask agent to store and search memories
  • Verify memory_count returns correct total
  • Test with LANCEDB_URI set (cloud) and unset (local)

🤖 Generated with Claude Code

@5queezer 5queezer force-pushed the feat/semantic-memory-and-swarm branch from 9980b52 to a112592 Compare March 11, 2026 23:22
@5queezer
Owner Author

Issues by priority:

🔴 High
• Hardcoded /workspace/group/memory/lancedb path — use an env var instead
• Suppressed Gemini error body (err is read but never included in the thrown message) — makes debugging hard
• LanceDB version mismatch: root uses ^0.26.2, agent-runner uses ^0.17.0
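
For the error-suppression point, a hedged sketch of an HTTP helper that surfaces the response body in the thrown error and applies a request timeout. `postJson` and its parameters are hypothetical and not the function in memory.ts; the injectable `fetchImpl` exists only to make the helper testable.

```typescript
// Hypothetical helper: POST JSON, abort after timeoutMs, and include the
// server's error body in the thrown message so failures are debuggable.
async function postJson(
  url: string,
  body: unknown,
  headers: Record<string, string>,
  timeoutMs: number = 30_000,
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> {
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...headers },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs), // rejects the fetch on timeout
  });
  if (!res.ok) {
    // Read the body defensively; a failed read should not mask the status.
    const detail = await res.text().catch(() => "<unreadable body>");
    throw new Error(`Request failed with status ${res.status}: ${detail}`);
  }
  return res.json();
}
```

With this shape, a quota or authentication failure shows the provider's own explanation in the log instead of a bare status code.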

🟡 Medium
• memory.ts declares a Float32 vector schema but passes raw number[] values. The migration script already uses Float32Array correctly, so memory.ts should be changed to match
• Math.random() still used for IPC filenames/task IDs despite PR claiming crypto.randomUUID() was adopted
• Message splitting cuts at hard character boundaries — can break mid-word or split Markdown markup (e.g. bold)
• The hardcoded setTimeout(r, 2000) delay that waits for rename propagation is fragile
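
One way to address the message-splitting point, sketched in TypeScript under the assumption that messages are plain strings with a fixed length limit. `splitMessage` is a hypothetical name. It prefers newline and space boundaries and only falls back to a hard cut when a chunk contains neither. It does not try to keep Markdown markers paired, which would need a separate pass.

```typescript
// Split text into chunks of at most `limit` characters, cutting at the last
// newline (or, failing that, the last space) that fits in the chunk.
function splitMessage(text: string, limit: number): string[] {
  if (limit <= 0) throw new Error("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Look one character past the limit so a boundary exactly at the limit
    // still yields a full-length chunk.
    const window = rest.slice(0, limit + 1);
    const newline = window.lastIndexOf("\n");
    const space = window.lastIndexOf(" ");
    const boundary = newline > 0 ? newline : space;
    if (boundary > 0) {
      chunks.push(rest.slice(0, boundary));
      rest = rest.slice(boundary + 1); // drop the separator we split on
    } else {
      chunks.push(rest.slice(0, limit)); // no boundary: hard cut
      rest = rest.slice(limit);
    }
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```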

🟢 Low
• Multiple ctx.chat as any casts — grammY has proper discriminated union types
• npm test 2>/dev/null silences test errors in CI, making failures hard to diagnose
• Migration script has a hardcoded dated backup path and no retry/checkpoint logic — if it fails halfway, you lose progress
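
For the migration point, a rough TypeScript sketch of retry plus checkpointing. The `Checkpoint` interface, `withRetry`, and `migrate` are hypothetical names; a real script would persist the checkpoint to disk, which is left out here.

```typescript
// Hypothetical checkpoint store: remembers the index of the next record
// to import so a restarted run can resume instead of starting over.
interface Checkpoint {
  load(): number;
  save(nextIndex: number): void;
}

// Retry an async operation with exponential backoff between attempts.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Import records one at a time, saving progress after each success.
// Returns the index reached (equal to records.length on full success).
async function migrate<T>(
  records: T[],
  importOne: (record: T) => Promise<void>,
  checkpoint: Checkpoint,
  attempts = 3,
  baseDelayMs = 500,
): Promise<number> {
  let index = checkpoint.load();
  for (; index < records.length; index++) {
    const record = records[index];
    await withRetry(() => importOne(record), attempts, baseDelayMs);
    checkpoint.save(index + 1);
  }
  return index;
}
```

If a record still fails after all attempts, `migrate` throws, but the checkpoint already points at that record, so the next run resumes there rather than re-embedding everything.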


Overall the architecture is solid. The main things I'd fix before merging: the hardcoded path, the error suppression, the LanceDB version mismatch, and the Float32Array inconsistency. The rest are nice-to-haves. 👍

5queezer and others added 3 commits March 11, 2026 23:38
Adds persistent semantic memory to container agents via 4 MCP tools
(memory_store, memory_search, memory_delete, memory_count). Uses
LanceDB for vector storage (local or cloud via LANCEDB_URI) and
Gemini embedding-001 for 3072-dim embeddings. Includes migration
script for importing memories from OpenClaw JSONL backups.

Security: sanitized filter inputs, crypto.randomUUID for IDs,
30s fetch timeout, race-safe singleton initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each subagent in an agent team gets a dedicated pool bot identity
in Telegram groups. Bots are assigned round-robin and renamed via
setMyName to match agent roles. IPC messages with a sender field
route through the pool instead of the main bot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use MEMORY_LANCEDB_DIR env var instead of hardcoded path
- Include Gemini error body in thrown error message for debugging
- Align agent-runner LanceDB to ^0.26.2 (matches root package.json)
- Pass Float32Array to LanceDB add/search (matches schema + migration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@5queezer 5queezer force-pushed the feat/semantic-memory-and-swarm branch from 666f724 to a07aa08 Compare March 11, 2026 23:39
The Telegram agent swarm (bot pool) is already available as an
installable skill at nanoclaw.dev/skills/telegram-swarm. Removes
the bundled implementation to keep the PR focused on semantic memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@5queezer 5queezer changed the title feat: semantic memory + Telegram agent swarm feat: semantic memory with LanceDB + Gemini embeddings Mar 11, 2026
Restore .env.example, repo-tokens/badge.svg, src/channels/index.ts
to upstream main state. Remove .github/workflows/fork-sync-skills.yml
and telegram channel files that don't exist on upstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5queezer and others added 5 commits March 12, 2026 07:37
…y-and-swarm

# Conflicts:
#	package-lock.json
#	repo-tokens/badge.svg
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- package.json: removed duplicate apache-arrow entry
- index.ts: kept mcp__gmail__* in allowedTools from main
- ipc-mcp-stdio.ts: removed duplicate memory import
- memory.ts: used x-goog-api-key header (main) and single-quote SQL (main)
- migrate-memories.mjs: kept retry logic and header auth from main
- container-runner.ts: used .gmail-mcp mount (main) over himalaya

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@5queezer 5queezer closed this Mar 13, 2026
5queezer added a commit that referenced this pull request Mar 14, 2026
- migrate-memories.mjs: pass apiKey for LanceDB Cloud URIs (medium #1)
- Throw on old schema without scope column instead of silent warn (medium #2)
- Log hint when rerank API key is present but RERANK_PROVIDER unset (medium #3)
- Validate vectorDim early for custom providers (low #4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 participants