feat: voice, vision, search, todo extraction, dashboard metrics #1244
Closed
Mbda1 wants to merge 23 commits into
Conversation
- Add local_model field to AgentDefaults config (routes to Ollama/Mistral)
- Memory consolidation now uses the local model when configured, falling back to the cloud model — eliminates token spend on summarization
- Query enrichment: local model rewrites user messages (>=6 words) before sending to cloud, with a 10s timeout and graceful fallback to the original (see the sketch after this message)
- Supervisor daemon: replace the Anthropic dependency with local LiteLLM/Ollama for log analysis and autonomous fix detection — fully token-free
- Supervisor: add gateway watchdog (30s poll), lifecycle Telegram events (startup, crash, recovery), and an audit log in SUPERVISOR_LOG.md
- Add nanobot supervisor CLI command with persistent log file setup
- Add anthropic package as an optional dep (kept for future use)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
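A minimal sketch of the enrichment step's timeout-and-fallback shape. Only the 10s timeout and the 6-word floor come from the message above; `rewrite_locally` is a hypothetical stand-in for the real local-model call.

```python
import asyncio

ENRICHMENT_TIMEOUT = 10.0  # seconds (from the commit message)
MIN_WORDS = 6              # messages shorter than this skip enrichment

async def enrich_query(message: str, rewrite_locally) -> str:
    """Rewrite the user message with the local model; on timeout or error,
    fall back to the original text so the cloud call is never blocked."""
    if len(message.split()) < MIN_WORDS:
        return message
    try:
        return await asyncio.wait_for(rewrite_locally(message),
                                      timeout=ENRICHMENT_TIMEOUT)
    except Exception:  # includes asyncio.TimeoutError
        return message  # graceful fallback to the original message
```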
WSL2 cannot reach Windows localhost directly. Update the hardcoded api_base in query enrichment and supervisor log analysis to use host.docker.internal, which resolves stably to the Windows host from WSL2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds nanobot ps/stop/start/restart subcommands for managing gateway and supervisor agents. Targets individual agents or all at once:

    nanobot ps               # show running agents
    nanobot stop             # stop all (safe for code changes)
    nanobot stop supervisor  # stop one agent
    nanobot restart          # restart all
    nanobot start gateway    # start one agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The supervisor now only auto-applies restart_gateway; all other fix types (edit_config, patch_file, edit_workspace) are replaced with suggest — the issue is detected and reported via Telegram but never auto-applied. This prevents hallucinated config patches and bad file writes.

Also fix a NameError (shutil not in scope) in the _start_agent helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raw exception strings (e.g. litellm.AuthenticationError) were reaching users as Telegram messages via two paths:

1. litellm_provider returns finish_reason="error" with the exception as content — now raises RuntimeError instead of letting it flow through as final_content.
2. run()'s outer handler sent str(e) directly to the user — now sends a generic friendly message while still logging the full error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ywords

"Anthropic security statement" was miscategorized to longevity.md because "anthropic" wasn't in the keyword list. Adding major AI company/model names so tweets from @AnthropicAI and similar accounts land in knowledge/ai.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bling

_edit_config, _patch_file, and _edit_workspace were dead code — the allowlist only permits restart_gateway, so they could never be called. _edit_workspace in particular caused the doubled-path bug (/workspace/.nanobot/workspace/TODO.md) when Mistral hallucinated a workspace-relative path that already contained the workspace prefix. Removing the functions eliminates the hazard should the allowlist ever be loosened accidentally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Memory pipeline (memory.py):
- Replace monolithic consolidation with a Chunk→Collect→Assemble→Merge pipeline (see the sketch after this message)
- CHUNK: Python splits messages into groups of 8 (zero tokens)
- COLLECT: local Mistral summarizes each chunk (≤300 tokens, 20s timeout)
- ASSEMBLE: Python joins partial summaries (zero tokens)
- MERGE: one bounded save_memory tool call (≤2048 tokens)
- Graceful fallback: chunk timeout → raw truncated text, never skips

Architecture docs (CLAUDE.md):
- Add an Architecture Principles section documenting the pipeline pattern, memory consolidation design, write-file pattern, and config boundaries table

Session history repair (session/manager.py):
- get_history() now strips orphaned tool_results and dangling tool_calls that arise when the memory window bisects a tool-call sequence

Prompt cache fix (context.py):
- Timestamp format: %Y-%m-%d %H:%M → %Y-%m-%d (%A), so the system prompt is stable for 24h and the OpenRouter prompt cache actually hits

Web search (tools/web.py):
- Switch from the Brave Search API to DuckDuckGo via ddgs (no API key required)

Supervisor notifications (supervisor/daemon.py):
- Telegram alerts only on critical severity or auto-applied fixes; suggestions/warnings are logged locally to stop notification spam

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
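A minimal sketch of the pipeline's shape. The `summarize` and `save_memory` callables are injected stand-ins for the real local-model and tool calls; only the chunk size, timeout, and fallback behaviour come from the commit message.

```python
import asyncio
from typing import Awaitable, Callable

CHUNK_SIZE = 8        # messages per chunk
CHUNK_TIMEOUT = 20.0  # seconds per local summarization call

def chunk(messages: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """CHUNK: pure-Python split, costs zero tokens."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]

async def consolidate(
    messages: list[str],
    summarize: Callable[[str], Awaitable[str]],     # COLLECT: local model call
    save_memory: Callable[[str], Awaitable[None]],  # MERGE: one bounded tool call
) -> None:
    partials = []
    for group in chunk(messages):
        raw = "\n".join(group)
        try:
            partials.append(await asyncio.wait_for(summarize(raw),
                                                   timeout=CHUNK_TIMEOUT))
        except asyncio.TimeoutError:
            partials.append(raw[:1000])  # fallback: raw truncated text, never skip
    assembled = "\n\n".join(partials)    # ASSEMBLE: zero tokens
    await save_memory(assembled)         # MERGE
```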
… SQLite

- Add nanobot/config/constants.py as a single source of truth for all hardcoded values (models, timeouts, memory tuning, token limits)
- Wire constants into memory.py, context.py, loop.py, daemon.py, web.py
- Extract _enrich_query() from AgentLoop into nanobot/agent/enrichment.py; loop.py calls enrich_query() from the new module (-45 lines)
- Garage: create garage.db (SQLite), migrate.py, garage_query.py; update the car-garage SKILL.md to query SQLite instead of markdown files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add nanobot/agent/usage.py — lightweight JSONL logger, one line per LLM call
- Wire into loop.py (agent calls), memory.py (chunk + merge), enrichment.py
- Init'd at AgentLoop startup; no-ops gracefully if workspace not set
- Log: ~/.nanobot/workspace/memory/USAGE.jsonl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: Ollama's default keep_alive=5min → model unloads on idle → 30s+ cold start exceeds the 10s enrichment timeout → all local inference skipped.

Fix: on AgentLoop.run(), fire a background warmup via a direct Ollama /api/chat call with keep_alive=-1 (LiteLLM silently drops this param, so direct httpx is needed; sketched below). Result: the model stays pinned until Ollama itself restarts (expires_at ~year 2318).

Also add an OLLAMA_KEEP_ALIVE=-1 constant for documentation/future use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
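A minimal sketch of the warmup call, assuming Ollama's /api/chat endpoint; the base URL and model name are placeholders for whatever the config holds.

```python
import asyncio
import httpx

OLLAMA_BASE = "http://host.docker.internal:11434"  # assumption: WSL2 → Windows host

async def warmup_local_model(model: str = "mistral") -> None:
    """Send one throwaway chat turn with keep_alive=-1 so Ollama pins the
    model in memory instead of unloading it after the default 5 minutes."""
    async with httpx.AsyncClient(timeout=120.0) as client:  # cold-start headroom
        await client.post(f"{OLLAMA_BASE}/api/chat", json={
            "model": model,
            "messages": [{"role": "user", "content": "ping"}],
            "stream": False,
            "keep_alive": -1,  # LiteLLM drops this param, hence the direct call
        })

# At loop startup this can be fired as a background task or awaited directly:
# asyncio.create_task(warmup_local_model())
```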
- Add `nanobot dashboard` command — starts a live HTTP server at :8765
- Add "dashboard" to _AGENT_PATTERNS so nanobot start/stop/ps manages it
- Server serves an HTML shell + a /api/data JSON endpoint (zero tokens, local reads)
- Refresh button + auto-refresh every 30s via JavaScript fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ider

The LiteLLM provider always injects its OpenRouter api_base into every call, so provider.chat(model="ollama/mistral") was hitting OpenRouter instead of the local Ollama instance.

Fix:
- Add local_llm.py: a thin httpx wrapper that calls Ollama /api/chat directly, avoiding all LiteLLM routing (sketched below)
- enrichment.py: use ollama_chat() instead of litellm.acompletion()
- memory.py: chunk summaries stay on cloud (CPU too heavy for local); only enrichment runs on Mistral
- commands.py: pass local_model to the gateway AgentLoop (was missing — caused the entire local stack to silently no-op)
- loop.py: await warmup instead of fire-and-forget; warmup timeout 120s to handle Mistral cold start (~43s on CPU); memory consolidation uses the cloud model throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
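A hedged sketch of the local_llm.py wrapper. The endpoint, payload, and response shape follow Ollama's documented /api/chat contract; the function name comes from the commit message, the base URL is an assumption.

```python
import httpx

OLLAMA_BASE = "http://host.docker.internal:11434"  # assumption: WSL2 setup above

async def ollama_chat(model: str, messages: list[dict], timeout: float = 30.0) -> str:
    """Call Ollama directly over httpx, bypassing LiteLLM's api_base injection."""
    async with httpx.AsyncClient(timeout=timeout) as client:
        r = await client.post(f"{OLLAMA_BASE}/api/chat", json={
            "model": model,
            "messages": messages,
            "stream": False,
        })
        r.raise_for_status()
        # Non-streaming responses carry a single assistant message.
        return r.json()["message"]["content"]
```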
Merged HKUDS/nanobot upstream (35 commits, tag v0.1.4.post2).

Key upstream changes taken:
- fix: stabilize system prompt — timestamp injected into the user message via _inject_runtime_context() instead of the system prompt; better cache reuse
- fix(session): get_history uses a last_consolidated cursor
- fix(heartbeat): virtual tool call replaces the unreliable HEARTBEAT_OK token
- fix(memory): handle non-string tool call arguments
- fix(templates): tighter AGENTS.md tool call guidelines
- fix(mcp): remove the default httpx timeout for HTTP transport
- fix(web): @property api_key pattern for runtime resolution

Conflict resolutions:
- context.py: took upstream (superior cache approach)
- session/manager.py: took upstream
- web.py: kept DuckDuckGo (no Brave API key); adopted the @property api_key
- test_heartbeat_service.py: removed the obsolete HEARTBEAT_OK_TOKEN test, updated the constructor call to match the new HeartbeatService signature

All 79 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds WebBrowseTool for JS-rendered pages and sites that block plain httpx requests (403 Forbidden). Registers alongside web_fetch/web_search in AgentLoop.

Anti-detection: disabled AutomationControlled flag, navigator.webdriver override, realistic UA + Sec-Fetch headers (a hedged sketch follows).

System libs installed to ~/.local/playwright-deps without sudo; LD_LIBRARY_PATH injected at import time (self-contained).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
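A hedged sketch of the anti-detection setup using Playwright's async API. WebBrowseTool's real implementation may differ; the UA string and header set are placeholders.

```python
from playwright.async_api import async_playwright

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

async def browse(url: str) -> str:
    async with async_playwright() as p:
        # Disable the Blink flag that advertises automation to the page.
        browser = await p.chromium.launch(
            args=["--disable-blink-features=AutomationControlled"])
        context = await browser.new_context(
            user_agent=UA,
            extra_http_headers={"Sec-Fetch-Site": "none",
                                "Sec-Fetch-Mode": "navigate"})
        # Hide navigator.webdriver before any page script runs.
        await context.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
        page = await context.new_page()
        await page.goto(url, wait_until="networkidle")
        html = await page.content()
        await browser.close()
        return html

# asyncio.run(browse("https://example.com"))
```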
Two triggers per turn (both reset at the start of each user message):
- Per-tool limit: the same tool called > CIRCUIT_BREAKER_PER_TOOL (5) times
- Consecutive limit: an identical tool+arg pair repeated > CIRCUIT_BREAKER_CONSECUTIVE (2) times

On trip, the real tool.execute() is skipped and a synthetic error result is injected so the LLM can gracefully conclude without hitting the API again. Constants are tunable in config/constants.py. A sketch of the bookkeeping follows this message.

4 new tests cover both breakers, false-positive safety, and the under-limit baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
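A minimal sketch of the two-breaker bookkeeping. The constant names and limits come from the commit message; the class shape is illustrative.

```python
import json

CIRCUIT_BREAKER_PER_TOOL = 5      # max calls to one tool per turn
CIRCUIT_BREAKER_CONSECUTIVE = 2   # max identical tool+arg repeats in a row

class CircuitBreaker:
    def __init__(self):
        self.reset()

    def reset(self):
        """Called at the start of each user message."""
        self.per_tool: dict[str, int] = {}
        self.last_call: tuple[str, str] | None = None
        self.consecutive = 0

    def allow(self, tool: str, args: dict) -> bool:
        key = (tool, json.dumps(args, sort_keys=True))
        self.per_tool[tool] = self.per_tool.get(tool, 0) + 1
        self.consecutive = self.consecutive + 1 if key == self.last_call else 1
        self.last_call = key
        if self.per_tool[tool] > CIRCUIT_BREAKER_PER_TOOL:
            return False  # per-tool breaker trips
        if self.consecutive > CIRCUIT_BREAKER_CONSECUTIVE:
            return False  # consecutive-identical breaker trips
        return True

# On a False result the loop skips tool.execute() and injects a synthetic
# error result so the LLM can conclude without another API round-trip.
```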
…kens)
New module nanobot/agent/eval.py: judge_response(question, response,
criterion) sends (q, r, criterion) triples to local Mistral and returns
a PASS/FAIL verdict. Empty responses are short-circuited before the
LLM call to prevent hallucination.
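A hedged sketch of the judge's shape. Only the (question, response, criterion) → PASS/FAIL contract and the empty-response guard come from the commit message; the prompt wording and parsing are assumptions, and ollama_chat is the direct-Ollama helper sketched earlier.

```python
async def judge_response(question: str, response: str, criterion: str) -> bool:
    if not response.strip():
        return False  # empty-response guard: never ask the LLM to judge nothing
    verdict = await ollama_chat(
        model="mistral",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nResponse: {response}\n"
                f"Criterion: {criterion}\n"
                "Reply with exactly PASS or FAIL."
            ),
        }],
    )
    return verdict.strip().upper().startswith("PASS")
```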
11 parametrized eval cases: 5 PASS (good responses), 4 FAIL (bad
responses), 1 empty-response guard, 1 import smoke test. All cases
isolate objective criteria so Mistral 7B judges reliably.
Tests are marked @pytest.mark.llm and skipped unless Ollama is reachable:

    pytest -m llm        # run eval suite (~80s, needs Ollama)
    pytest -m 'not llm'  # fast unit tests only (default, 83 tests, ~2.5s)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Hot tier — MEMORY.md, always loaded, capped at MEMORY_HOT_MAX_LINES (200). When exceeded, the oldest ## section is evicted to the warm tier (see the eviction sketch after this message).
- Warm tier — memory/topics/*.md, created automatically on overflow. Keyword-matched against the user's message at call time and injected into the system prompt only for turns where relevant, so zero tokens are wasted on unrelated memories.
- Cold tier — HISTORY.md, grep-searchable event log (unchanged).
Changes:
- constants.py: MEMORY_HOT_MAX_LINES = 200
- memory.py: MemoryStore.trim_hot(), _load_warm_topics(), updated
write_long_term() (auto-trims) and get_memory_context(user_message)
- context.py: passes current user message to get_memory_context() for
warm-tier keyword matching; updated identity to mention both tiers
- 11 new unit tests covering overflow, idempotency, warm loading,
false-positive prevention, and edge cases
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
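A minimal sketch of hot-tier eviction, assuming MEMORY.md uses "## " section headers; trim_hot's real signature and topic-naming scheme in memory.py may differ.

```python
from pathlib import Path

MEMORY_HOT_MAX_LINES = 200

def trim_hot(memory_md: Path, topics_dir: Path) -> None:
    """While MEMORY.md exceeds the hot cap, move its oldest ## section to a
    warm-tier topic file under memory/topics/."""
    lines = memory_md.read_text().splitlines()
    while len(lines) > MEMORY_HOT_MAX_LINES:
        heads = [i for i, l in enumerate(lines) if l.startswith("## ")]
        if not heads:
            break  # no sections to evict; leave the file alone
        # The span from the first header to the second is the oldest section.
        start = heads[0]
        end = heads[1] if len(heads) > 1 else len(lines)
        section, lines = lines[start:end], lines[:start] + lines[end:]
        topic = section[0].lstrip("# ").strip().lower().replace(" ", "-") or "misc"
        topics_dir.mkdir(parents=True, exist_ok=True)
        with (topics_dir / f"{topic}.md").open("a") as f:
            f.write("\n".join(section) + "\n")
    memory_md.write_text("\n".join(lines) + "\n")
```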
DelegateTool (tools/delegate.py):
- Synchronous counterpart to spawn: runs a worker agent inline and returns the result in the same turn. The supervisor sees worker output immediately.
- 5 worker roles with specialized system prompts: researcher, writer, analyst, coder, general.
- Workers have access to web_search, web_fetch, web_browse (previously missing from the subagent tool registry), read/write/edit/exec.

SubagentManager (subagent.py):
- Extracted _execute_subagent() (run loop, return string) from _run_subagent() (background wrapper that announces via bus).
- run_direct(task, role) — synchronous delegate entry point.
- WebBrowseTool added to the subagent tool registry.
- Role prompts in a _ROLE_PROMPTS dict, keyed by role name.
- DELEGATE_MAX_ITERATIONS constant replaces the hardcoded 15.

AgentLoop (loop.py) — parallel batch execution (a sketch follows this message):
- Tool calls within a single LLM response now run concurrently via asyncio.gather (Phase 2), enabling true parallel delegation.
- Circuit breaker bookkeeping stays sequential (Phase 1) so per-tool limits and consecutive detection remain correct.
- Results are applied to messages in original order (Phase 3).

7 new tests: delegate unit tests, parallel timing proof, circuit breaker preservation, registration check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
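A hedged sketch of the three-phase batch execution. The phase split comes from the commit message; execute_tool() and the result shapes are illustrative stand-ins for the real loop.py internals.

```python
import asyncio

async def run_tool_batch(tool_calls: list[dict], breaker, execute_tool) -> list[dict]:
    # Phase 1 — sequential breaker bookkeeping, so per-tool counts and
    # consecutive-identical detection see calls in their original order.
    allowed = [breaker.allow(c["name"], c["args"]) for c in tool_calls]

    # Phase 2 — run permitted calls concurrently; tripped calls get a
    # synthetic error without touching the tool.
    async def run_one(call: dict, ok: bool) -> dict:
        if not ok:
            return {"tool": call["name"], "error": "circuit breaker tripped"}
        return {"tool": call["name"], "result": await execute_tool(call)}

    results = await asyncio.gather(
        *(run_one(c, ok) for c, ok in zip(tool_calls, allowed)))

    # Phase 3 — asyncio.gather preserves input order, so results can be
    # appended to the message history in the original call order.
    return list(results)
```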
Split local inference into speed-optimised (enrichment/memory) and quality-optimised (judge/supervisor) models:
- LOCAL_MODEL_DEFAULT: ollama/mistral → ollama/qwen2.5:7b (same speed, better quality)
- JUDGE_MODEL_DEFAULT: new constant → ollama/mistral-nemo (12B, ~16 tok/s)
- eval.py and daemon.py now use JUDGE_MODEL_DEFAULT
- conftest.py skips llm tests when the judge model isn't pulled yet

Models need to be pulled on Windows before taking effect:

    ollama pull qwen2.5:7b
    ollama pull mistral-nemo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mistral Nemo (12B) is stricter than Mistral 7B — required:
- TIMEOUT_JUDGE = 60s in constants (cold-start headroom for 12B)
- eval.py uses TIMEOUT_JUDGE instead of hardcoded 30s
- Two test cases made unambiguous: removed hedging language ("reliably")
that Nemo parsed as logical loopholes, simplified criteria
All 112 tests pass (11 llm tests now run against Nemo).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added explicit severity rules to the Nemo prompt:
- critical: only for runtime failures actively breaking the bot (crash, unrecoverable LLM error)
- warning: degraded behaviour, high token usage, recovered failures
- info: normal operational noise

Prevents architecture observations and context-loss notes from being labelled critical when they aren't immediately actionable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…art race condition

- Record latency_ms per LLM call in USAGE.jsonl (time.monotonic around acompletion); a sketch follows this message
- Move the typing indicator to the top of _on_message so voice/media downloads show feedback immediately
- Fix supervisor _restart_gateway: poll until the old PIDs die before spawning the new gateway, with a SIGKILL fallback after 10s — prevents the dual-gateway Telegram conflict on restart

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
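A minimal sketch of the latency measurement. The JSONL path comes from an earlier commit; log_usage() and the wrapper shape are illustrative.

```python
import json
import time
from pathlib import Path

USAGE_PATH = Path.home() / ".nanobot/workspace/memory/USAGE.jsonl"

def log_usage(**fields) -> None:
    """Append one JSON object per LLM call."""
    with USAGE_PATH.open("a") as f:
        f.write(json.dumps(fields) + "\n")

async def timed_completion(acompletion, **kwargs):
    start = time.monotonic()  # monotonic: immune to wall-clock jumps
    response = await acompletion(**kwargs)
    log_usage(model=kwargs.get("model"),
              latency_ms=int((time.monotonic() - start) * 1000))
    return response
```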
Author
Closing — this was opened against the wrong repo. Changes are personal-use features for a fork.
Summary
Seven new features for the personal nanobot assistant:
- Voice: `LocalWhisperProvider` uses the already-installed `openai-whisper` (tiny model, CPU, async). Tries Groq first if a key is present, falls back silently to local whisper. No new dependencies required.
- Vision: strip the `[image: /path]` text in the Telegram handler so Claude Haiku 4.5 vision actually receives clean input. The base64 pipeline in `context.py` was already wired — this just unblocked it.
- `/search` command: keyword pre-filter across all session JSONL files, then semantic re-rank using `nomic-embed-text` cosine similarity if Ollama is reachable (see the sketch after this list). Returns the top 5 results. Added to the Telegram command menu.
- Todo extraction: writes `- [ ]` items to `todo.md` automatically.
- Turn latency: `loop.py` records `turn_latency` per message to `METRICS.jsonl`. Dashboard shows p50/p95 panels.
- Warm-tier hit rate: `memory.py` records `warm_hit` to `METRICS.jsonl` when topic files are injected. Dashboard shows hit rate + lifetime hit/miss counts.
- `usage.py` metrics infrastructure: new `record_metric(event, **kwargs)` function writes non-token instrumentation events to `METRICS.jsonl` alongside the existing `USAGE.jsonl`.

Also bundles pre-existing uncommitted work: async context builder, supervisor enhancements, embeddings module, tracing utilities, and updated tests.
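A hedged sketch of the two-stage `/search` ranking. The endpoint and field names follow Ollama's /api/embeddings API; the session directory layout, JSONL field names, and top-k handling are assumptions.

```python
import json
import math
from pathlib import Path

import httpx

OLLAMA_BASE = "http://host.docker.internal:11434"  # same assumption as above

def embed(text: str) -> list[float]:
    r = httpx.post(f"{OLLAMA_BASE}/api/embeddings",
                   json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def search(query: str, sessions_dir: Path, top_k: int = 5) -> list[str]:
    # Stage 1 — cheap keyword pre-filter over all session JSONL lines.
    words = query.lower().split()
    candidates = [
        json.loads(line).get("content", "")
        for path in sessions_dir.glob("*.jsonl")
        for line in path.read_text().splitlines()
        if any(w in line.lower() for w in words)
    ]
    # Stage 2 — semantic re-rank by embedding cosine similarity.
    qv = embed(query)
    candidates.sort(key=lambda c: cosine(qv, embed(c)), reverse=True)
    return candidates[:top_k]
```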
Test plan
- `/search LS swap` → confirm session history results returned
- Check `todo.md` for auto-extracted tasks
- Open `http://localhost:8765` → confirm latency + warm hit rate panels visible
- `python3 ~/.nanobot/workspace/skills/budget-guard/check_budget.py` → confirm JSON output

🤖 Generated with Claude Code