feat: voice, vision, search, todo extraction, dashboard metrics #1244

Closed

Mbda1 wants to merge 23 commits into HKUDS:main from Mbda1:main

Conversation


Mbda1 commented Feb 26, 2026

Summary

Seven new features for the personal nanobot assistant:

  • Voice transcription — LocalWhisperProvider uses the already-installed openai-whisper (tiny model, CPU, async). Tries Groq first if a key is present, falls back silently to local Whisper. No new dependencies required.
  • Image understanding — Fixed redundant [image: /path] text in Telegram handler so Claude Haiku 4.5 vision actually receives clean input. The base64 pipeline in context.py was already wired — this just unblocked it.
  • /search command — Keyword pre-filter across all session JSONL files, then semantic re-rank using nomic-embed-text cosine similarity if Ollama is reachable (sketched below this list). Returns top 5 results. Added to Telegram command menu.
  • Todo auto-extraction — Second local LLM pass runs after every memory flush, scanning recent turns for commitments/tasks and appending - [ ] items to todo.md automatically.
  • Dashboard: response latency — loop.py records turn_latency per message to METRICS.jsonl. Dashboard shows p50/p95 panels.
  • Dashboard: warm tier hit rate — memory.py records warm_hit to METRICS.jsonl when topic files are injected. Dashboard shows hit rate + lifetime hit/miss counts.
  • usage.py metrics infrastructure — New record_metric(event, **kwargs) function writes non-token instrumentation events to METRICS.jsonl alongside the existing USAGE.jsonl.
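
A minimal sketch of the /search flow described above: keyword pre-filter, then cosine re-rank over Ollama embeddings, falling back to keyword order when Ollama is down. File layout, helper names, and the limit of 50 candidates are illustrative assumptions, not the PR's actual code.

```python
import json
import math
from pathlib import Path

import httpx

OLLAMA = "http://localhost:11434"  # assumed local Ollama endpoint

def keyword_prefilter(query: str, sessions_dir: Path, limit: int = 50) -> list[str]:
    """Cheap first pass: collect turns sharing any keyword with the query."""
    words = {w.lower() for w in query.split()}
    hits: list[str] = []
    for path in sessions_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                text = json.loads(line).get("content", "")
            except (json.JSONDecodeError, AttributeError):
                continue
            if isinstance(text, str) and words & set(text.lower().split()):
                hits.append(text)
                if len(hits) >= limit:
                    return hits
    return hits

def embed(text: str) -> list[float]:
    r = httpx.post(f"{OLLAMA}/api/embeddings",
                   json={"model": "nomic-embed-text", "prompt": text}, timeout=10)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb or 1e-9)

def search(query: str, sessions_dir: Path, k: int = 5) -> list[str]:
    candidates = keyword_prefilter(query, sessions_dir)
    try:
        qv = embed(query)
        candidates.sort(key=lambda t: cosine(qv, embed(t)), reverse=True)
    except httpx.HTTPError:
        pass  # Ollama unreachable: fall back to keyword order
    return candidates[:k]
```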

Also bundles pre-existing uncommitted work: async context builder, supervisor enhancements, embeddings module, tracing utilities, and updated tests.

Test plan

  • Send a voice message to the bot → confirm transcript appears in reply
  • Send a photo → confirm bot describes image content (not just echoes a path)
  • /search LS swap → confirm session history results returned
  • After 18+ messages, check todo.md for auto-extracted tasks
  • Open dashboard at http://localhost:8765 → confirm latency + warm hit rate panels visible
  • python3 ~/.nanobot/workspace/skills/budget-guard/check_budget.py → confirm JSON output

🤖 Generated with Claude Code

Mbda1 and others added 23 commits February 23, 2026 15:52
- Add local_model field to AgentDefaults config (routes to Ollama/Mistral)
- Memory consolidation now uses local model when configured, falling back
  to cloud model — eliminates token spend on summarization
- Query enrichment: local model rewrites user messages (>=6 words) before
  sending to cloud, with 10s timeout and graceful fallback to original
- Supervisor daemon: replace Anthropic dependency with local LiteLLM/Ollama
  for log analysis and autonomous fix detection — fully token-free
- Supervisor: add gateway watchdog (30s poll), lifecycle Telegram events
  (startup, crash, recovery), and audit log to SUPERVISOR_LOG.md
- Add nanobot supervisor CLI command with persistent log file setup
- Add anthropic package as optional dep (kept for future use)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
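
A minimal sketch of the enrichment pattern this commit describes: rewrite the message with a local model under a hard 10s budget, falling back to the original on any failure. The function name and prompt are assumptions, and it is shown as a direct Ollama call for self-containment (the commit routes local calls differently).

```python
import httpx

ENRICH_PROMPT = "Rewrite the user's message to be clearer and more specific. Reply with the rewrite only."

async def enrich_query(message: str, api_base: str = "http://localhost:11434") -> str:
    if len(message.split()) < 6:      # short messages pass through untouched
        return message
    payload = {
        "model": "mistral",
        "stream": False,
        "messages": [
            {"role": "system", "content": ENRICH_PROMPT},
            {"role": "user", "content": message},
        ],
    }
    try:
        async with httpx.AsyncClient(timeout=10) as client:   # 10s budget
            r = await client.post(f"{api_base}/api/chat", json=payload)
            r.raise_for_status()
            return r.json()["message"]["content"].strip() or message
    except (httpx.HTTPError, KeyError):
        return message                # graceful fallback to the original text
```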
WSL2 cannot reach Windows localhost directly. Update the hardcoded api_base
in query enrichment and supervisor log analysis to use host.docker.internal,
which resolves stably to the Windows host from WSL2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds nanobot ps/stop/start/restart subcommands for managing gateway
and supervisor agents. Targets individual agents or all at once:

  nanobot ps                  # show running agents
  nanobot stop                # stop all (safe for code changes)
  nanobot stop supervisor     # stop one agent
  nanobot restart             # restart all
  nanobot start gateway       # start one agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Supervisor now only auto-applies restart_gateway; all other fix types
(edit_config, patch_file, edit_workspace) are replaced with suggest —
the issue is detected and notified via Telegram but never auto-applied.
This prevents hallucinated config patches and bad file writes.

Also fix NameError: shutil not in scope in _start_agent helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
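
An illustrative sketch of the allowlist gate this commit describes; the function signature and injected callables are assumptions, not the supervisor's actual code.

```python
AUTO_APPLY_ALLOWLIST = {"restart_gateway"}

def handle_fix(fix: dict, apply_fn, notify_fn) -> str:
    """Auto-apply only allowlisted fix types; downgrade everything else to a suggestion."""
    if fix.get("type") in AUTO_APPLY_ALLOWLIST:
        apply_fn(fix)
        return "applied"
    notify_fn(f"Suggested fix (not auto-applied): {fix.get('type')}")
    return "suggested"

# Usage: handle_fix({"type": "edit_config"}, apply_fn=print, notify_fn=print) -> "suggested"
```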
Raw exception strings (e.g. litellm.AuthenticationError) were reaching
users as Telegram messages via two paths:

1. litellm_provider returns finish_reason="error" with the exception as
   content — now raises RuntimeError instead of letting it flow through
   as final_content.

2. run() outer handler sent str(e) directly to the user — now sends a
   generic friendly message while still logging the full error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
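
A sketch of the two fixes under assumed names: the provider raises instead of letting the exception text flow through as content, and the outer handler keeps the detail in the log while sending the user a generic message.

```python
import logging

log = logging.getLogger("nanobot")

def check_provider_response(finish_reason: str, content: str) -> str:
    """Path 1: raise instead of returning the exception text as final_content."""
    if finish_reason == "error":
        raise RuntimeError(f"LLM call failed: {content}")
    return content

async def run_turn_safely(handle_turn, send_to_user) -> None:
    """Path 2: log the full error, send the user a friendly generic message."""
    try:
        await handle_turn()
    except Exception:
        log.exception("turn failed")   # full traceback stays local
        await send_to_user("Sorry, something went wrong on my end. Please try again.")
```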
…ywords

"Anthropic security statement" was miscategorized to longevity.md because
"anthropic" wasn't in the keyword list. Adding major AI company/model names
so tweets from @AnthropicAI and similar accounts land in knowledge/ai.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bling

_edit_config, _patch_file, and _edit_workspace were dead code — the
allowlist only permits restart_gateway, so they could never be called.
_edit_workspace in particular caused the doubled-path bug
(/workspace/.nanobot/workspace/TODO.md) when Mistral hallucinated a
workspace-relative path that already contained the workspace prefix.

Removing the functions eliminates the hazard if the allowlist is ever
loosened accidentally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Memory pipeline (memory.py):
- Replace monolithic consolidation with Chunk→Collect→Assemble→Merge pipeline
- CHUNK: Python splits messages into groups of 8 (zero tokens)
- COLLECT: local Mistral summarizes each chunk (≤300 tokens, 20s timeout)
- ASSEMBLE: Python joins partial summaries (zero tokens)
- MERGE: one bounded save_memory tool call (≤2048 tokens)
- Graceful fallback: chunk timeout → raw truncated text, never skips

Architecture docs (CLAUDE.md):
- Add Architecture Principles section documenting the pipeline pattern,
  memory consolidation design, write-file pattern, and config boundaries table

Session history repair (session/manager.py):
- get_history() now strips orphaned tool_results and dangling tool_calls
  that arise when the memory window bisects a tool-call sequence

Prompt cache fix (context.py):
- Timestamp format: %Y-%m-%d %H:%M → %Y-%m-%d (%A) so system prompt is
  stable for 24h and OpenRouter prompt cache actually hits

Web search (tools/web.py):
- Switch from Brave Search API to DuckDuckGo via ddgs (no API key required)

Supervisor notifications (supervisor/daemon.py):
- Telegram alerts only on critical severity or auto-applied fixes;
  suggestions/warnings logged locally to stop notification spam

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
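
A minimal sketch of the Chunk→Collect→Assemble→Merge shape described above; the function names and the injected summarize/merge callables are assumptions, not memory.py's actual interfaces.

```python
import asyncio

CHUNK_SIZE = 8          # messages per chunk, per the commit message
CHUNK_TIMEOUT = 20      # seconds per local summary call

def chunk(messages: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """CHUNK: pure Python split, zero tokens."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]

async def collect(chunks: list[list[str]], summarize) -> list[str]:
    """COLLECT: one local-model summary per chunk, raw-text fallback on timeout."""
    partials = []
    for c in chunks:
        try:
            partials.append(await asyncio.wait_for(summarize("\n".join(c)), CHUNK_TIMEOUT))
        except asyncio.TimeoutError:
            partials.append("\n".join(c)[:500])   # truncated raw text, never skip
    return partials

def assemble(partials: list[str]) -> str:
    """ASSEMBLE: pure Python join, zero tokens."""
    return "\n\n".join(partials)

async def consolidate(messages, summarize, merge_with_save_memory):
    text = assemble(await collect(chunk(messages), summarize))
    return await merge_with_save_memory(text)     # MERGE: one bounded tool call
```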
… SQLite

- Add nanobot/config/constants.py as single source of truth for all
  hardcoded values (models, timeouts, memory tuning, token limits)
- Wire constants into memory.py, context.py, loop.py, daemon.py, web.py
- Extract _enrich_query() from AgentLoop into nanobot/agent/enrichment.py;
  loop.py calls enrich_query() from the new module (-45 lines)
- Garage: create garage.db (SQLite), migrate.py, garage_query.py;
  update car-garage SKILL.md to query SQLite instead of markdown files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add nanobot/agent/usage.py — lightweight JSONL logger, one line per LLM call
- Wire into loop.py (agent calls), memory.py (chunk + merge), enrichment.py
- Init'd at AgentLoop startup; no-ops gracefully if workspace not set
- Log: ~/.nanobot/workspace/memory/USAGE.jsonl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
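
A plausible shape for the logger (a sketch; the real usage.py may differ): one JSON line per LLM call, appended to USAGE.jsonl, and a silent no-op when no workspace has been set.

```python
import json
import time
from pathlib import Path

_usage_path: Path | None = None

def init_usage(workspace: Path) -> None:
    global _usage_path
    _usage_path = workspace / "memory" / "USAGE.jsonl"
    _usage_path.parent.mkdir(parents=True, exist_ok=True)

def record_usage(model: str, prompt_tokens: int, completion_tokens: int, **extra) -> None:
    if _usage_path is None:          # workspace not set: no-op gracefully
        return
    row = {"ts": time.time(), "model": model,
           "prompt_tokens": prompt_tokens,
           "completion_tokens": completion_tokens, **extra}
    with _usage_path.open("a") as f:
        f.write(json.dumps(row) + "\n")
```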
Root cause: Ollama default keep_alive=5min → model unloads on idle →
30s+ cold-start exceeds 10s enrichment timeout → all local inference skipped.

Fix: on AgentLoop.run(), fire a background warmup via direct Ollama /api/chat
with keep_alive=-1 (LiteLLM silently drops this param, so direct httpx needed).
Result: model pinned until Ollama itself restarts (expires_at ~year 2318).

Also add OLLAMA_KEEP_ALIVE=-1 constant for documentation/future use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
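
A sketch of the warmup this commit describes: a direct /api/chat call carrying keep_alive=-1, which LiteLLM would drop, hence raw httpx. The function name and endpoint default are assumptions.

```python
import httpx

async def warmup_ollama(api_base: str = "http://host.docker.internal:11434",
                        model: str = "mistral") -> None:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "keep_alive": -1,        # pin the model until Ollama itself restarts
        "stream": False,
    }
    try:
        async with httpx.AsyncClient(timeout=120) as client:  # cold-start headroom
            await client.post(f"{api_base}/api/chat", json=payload)
    except httpx.HTTPError:
        pass                     # warmup is best-effort; never block the turn
```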
- Add `nanobot dashboard` command — starts live HTTP server at :8765
- Add "dashboard" to _AGENT_PATTERNS so nanobot start/stop/ps manages it
- Server serves HTML shell + /api/data JSON endpoint (zero tokens, local reads)
- Refresh button + auto-refresh every 30s via JavaScript fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
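
A minimal sketch of the zero-token server shape described above (stdlib only; the metrics path and HTML shell are assumptions, not the actual dashboard code).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

METRICS = Path.home() / ".nanobot/workspace/memory/METRICS.jsonl"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/data":
            lines = METRICS.read_text().splitlines() if METRICS.exists() else []
            body = json.dumps([json.loads(l) for l in lines if l.strip()]).encode()
            ctype = "application/json"
        else:
            body = b"<html><body><h1>nanobot dashboard</h1></body></html>"
            ctype = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8765), Handler).serve_forever()
```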
…ider

The LiteLLM provider always injects its OpenRouter api_base into every
call, so provider.chat(model="ollama/mistral") was hitting OpenRouter
instead of the local Ollama instance. Fix:

- Add local_llm.py: thin httpx wrapper that calls Ollama /api/chat
  directly, avoiding all LiteLLM routing
- enrichment.py: use ollama_chat() instead of litellm.acompletion()
- memory.py: chunk summaries stay on cloud (CPU too heavy for local);
  only enrichment runs on Mistral
- commands.py: pass local_model to gateway AgentLoop (was missing —
  caused entire local stack to silently no-op)
- loop.py: await warmup instead of fire-and-forget; warmup timeout 120s
  to handle Mistral cold-start (~43s on CPU); memory consolidation uses
  cloud model throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged HKUDS/nanobot upstream (35 commits, tag v0.1.4.post2).

Key upstream changes taken:
- fix: stabilize system prompt — timestamp injected into user message
  via _inject_runtime_context() instead of system prompt; better cache reuse
- fix(session): get_history uses last_consolidated cursor
- fix(heartbeat): virtual tool-call replaces unreliable HEARTBEAT_OK token
- fix(memory): handle non-string tool call arguments
- fix(templates): tighter AGENTS.md tool call guidelines
- fix(mcp): remove default httpx timeout for HTTP transport
- fix(web): @property api_key pattern for runtime resolution

Conflict resolutions:
- context.py: took upstream (superior cache approach)
- session/manager.py: took upstream
- web.py: kept DuckDuckGo (no Brave API key); adopted @property api_key
- test_heartbeat_service.py: removed obsolete HEARTBEAT_OK_TOKEN test,
  updated constructor call to match new HeartbeatService signature

All 79 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds WebBrowseTool for JS-rendered pages and sites that block plain
httpx requests (403 Forbidden). Registers alongside web_fetch/web_search
in AgentLoop. Anti-detection: disabled AutomationControlled flag,
navigator.webdriver override, realistic UA + Sec-Fetch headers.

System libs installed to ~/.local/playwright-deps without sudo;
LD_LIBRARY_PATH injected at import time (self-contained).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
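
A hedged sketch of the anti-detection setup using Playwright's async API; the flag, UA string, and header values shown are commonly used ones, not necessarily the tool's exact configuration.

```python
import asyncio
from playwright.async_api import async_playwright

async def browse(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(
            user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/124.0.0.0 Safari/537.36"),
            extra_http_headers={"Sec-Fetch-Mode": "navigate", "Sec-Fetch-Site": "none"},
        )
        # Hide the webdriver flag before any page script runs.
        await context.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        )
        page = await context.new_page()
        await page.goto(url, wait_until="networkidle")
        text = await page.inner_text("body")
        await browser.close()
        return text

# Usage: asyncio.run(browse("https://example.com"))
```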
Two triggers per turn (both reset at the start of each user message):
- Per-tool limit: same tool called > CIRCUIT_BREAKER_PER_TOOL (5) times
- Consecutive limit: identical tool+arg repeated > CIRCUIT_BREAKER_CONSECUTIVE (2) times

On trip, the real tool.execute() is skipped and a synthetic error result is
injected so the LLM can gracefully conclude without hitting the API again.
Constants are tunable in config/constants.py. 4 new tests cover both
breakers, false-positive safety, and under-limit baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
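
A sketch of the two breakers; the constants mirror the commit, while the class and attribute names are assumptions.

```python
from collections import Counter

CIRCUIT_BREAKER_PER_TOOL = 5
CIRCUIT_BREAKER_CONSECUTIVE = 2

class CircuitBreaker:
    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:                 # called at the start of each user message
        self.per_tool: Counter = Counter()
        self.last_call: tuple | None = None
        self.repeats = 0

    def should_trip(self, tool: str, args: str) -> bool:
        self.per_tool[tool] += 1
        call = (tool, args)
        self.repeats = self.repeats + 1 if call == self.last_call else 1
        self.last_call = call
        return (self.per_tool[tool] > CIRCUIT_BREAKER_PER_TOOL
                or self.repeats > CIRCUIT_BREAKER_CONSECUTIVE)

# On trip, the loop skips tool.execute() and injects a synthetic error result
# so the model can conclude instead of looping.
```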
…kens)

New module nanobot/agent/eval.py: judge_response(question, response,
criterion) sends (q, r, criterion) triples to local Mistral and returns
a PASS/FAIL verdict. Empty responses are short-circuited before the
LLM call to prevent hallucination.

11 parametrized eval cases: 5 PASS (good responses), 4 FAIL (bad
responses), 1 empty-response guard, 1 import smoke test. All cases
isolate objective criteria so Mistral 7B judges reliably.

Tests are marked @pytest.mark.llm and skipped unless Ollama is reachable:
    pytest -m llm       # run eval suite (~80s, needs Ollama)
    pytest -m 'not llm' # fast unit tests only (default, 83 tests, ~2.5s)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
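
A rough shape for judge_response (a sketch under assumed names; the actual eval.py may build the prompt and parse the verdict differently).

```python
import httpx

TIMEOUT_JUDGE = 30  # the value this commit hardcodes; raised later for Nemo

def judge_response(question: str, response: str, criterion: str,
                   api_base: str = "http://localhost:11434") -> bool:
    if not response.strip():
        return False   # empty responses short-circuit before any LLM call
    prompt = (f"Question: {question}\nResponse: {response}\n"
              f"Criterion: {criterion}\nAnswer with PASS or FAIL only.")
    r = httpx.post(f"{api_base}/api/chat",
                   json={"model": "mistral", "stream": False,
                         "messages": [{"role": "user", "content": prompt}]},
                   timeout=TIMEOUT_JUDGE)
    r.raise_for_status()
    return r.json()["message"]["content"].strip().upper().startswith("PASS")
```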
Hot tier  — MEMORY.md, always loaded, capped at MEMORY_HOT_MAX_LINES (200).
            When exceeded, the oldest ## section is evicted to the warm tier.

Warm tier — memory/topics/*.md, created automatically on overflow.
            Keyword-matched against the user's message at call time and
            injected into the system prompt only for turns where relevant.
            Zero tokens wasted on unrelated memories.

Cold tier — HISTORY.md, grep-searchable event log (unchanged).

Changes:
- constants.py: MEMORY_HOT_MAX_LINES = 200
- memory.py: MemoryStore.trim_hot(), _load_warm_topics(), updated
  write_long_term() (auto-trims) and get_memory_context(user_message)
- context.py: passes current user message to get_memory_context() for
  warm-tier keyword matching; updated identity to mention both tiers
- 11 new unit tests covering overflow, idempotency, warm loading,
  false-positive prevention, and edge cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
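
A sketch of the hot-tier eviction under assumed file layout and names: when MEMORY.md exceeds the cap, the oldest ## section moves to a warm-tier topic file.

```python
from pathlib import Path

MEMORY_HOT_MAX_LINES = 200

def trim_hot(memory_md: Path, topics_dir: Path) -> None:
    """Evict the oldest '## ' section to a warm-tier topic file while over cap."""
    lines = memory_md.read_text().splitlines()
    while len(lines) > MEMORY_HOT_MAX_LINES:
        heads = [i for i, l in enumerate(lines) if l.startswith("## ")]
        if len(heads) < 2:
            break                                    # nothing evictable
        start, end = heads[0], heads[1]              # oldest section's span
        section, lines = lines[start:end], lines[:start] + lines[end:]
        topic = section[0].removeprefix("## ").strip().lower().replace(" ", "-")
        topics_dir.mkdir(parents=True, exist_ok=True)
        with (topics_dir / f"{topic}.md").open("a") as f:
            f.write("\n".join(section) + "\n")
    memory_md.write_text("\n".join(lines) + "\n")
```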
DelegateTool (tools/delegate.py):
- Synchronous counterpart to spawn: runs a worker agent inline and returns
  the result in the same turn. Supervisor sees worker output immediately.
- 5 worker roles with specialized system prompts: researcher, writer,
  analyst, coder, general.
- Workers have access to web_search, web_fetch, web_browse (previously
  missing from subagent tool registry), read/write/edit/exec.

SubagentManager (subagent.py):
- Extracted _execute_subagent() (run loop, return string) from
  _run_subagent() (background wrapper that announces via bus).
- run_direct(task, role) — synchronous delegate entry point.
- WebBrowseTool added to subagent tool registry.
- Role prompts in _ROLE_PROMPTS dict, keyed by role name.
- DELEGATE_MAX_ITERATIONS constant replaces hardcoded 15.

AgentLoop (loop.py) — parallel batch execution:
- Tool calls within a single LLM response now run concurrently via
  asyncio.gather (Phase 2), enabling true parallel delegation.
- Circuit breaker bookkeeping stays sequential (Phase 1) so per-tool
  limits and consecutive detection remain correct.
- Results applied to messages in original order (Phase 3).

7 new tests: delegate unit tests, parallel timing proof, circuit breaker
preservation, registration check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
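
A sketch of the three-phase batch execution described above; the tool_calls dict shape and the injected breaker/execute callables are assumptions.

```python
import asyncio

async def run_tool_batch(tool_calls: list[dict], breaker, execute) -> list[tuple]:
    # Phase 1: sequential bookkeeping so per-tool and consecutive limits stay exact.
    tripped = [breaker.should_trip(c["tool"], c["args"]) for c in tool_calls]

    # Phase 2: run the non-tripped calls concurrently.
    async def run_one(call: dict, skip: bool):
        if skip:
            return {"error": "circuit breaker tripped; stop retrying this tool"}
        return await execute(call)

    results = await asyncio.gather(
        *(run_one(c, t) for c, t in zip(tool_calls, tripped)))

    # Phase 3: gather preserves input order, so results pair with the
    # original calls and can be appended to the transcript in order.
    return list(zip(tool_calls, results))
```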
Split local inference into speed-optimised (enrichment/memory) and
quality-optimised (judge/supervisor) models:
- LOCAL_MODEL_DEFAULT: ollama/mistral → ollama/qwen2.5:7b (same speed, better quality)
- JUDGE_MODEL_DEFAULT: new constant → ollama/mistral-nemo (12B, ~16 tok/s)
- eval.py and daemon.py now use JUDGE_MODEL_DEFAULT
- conftest.py skips llm tests when judge model isn't pulled yet

Models need to be pulled on Windows before taking effect:
  ollama pull qwen2.5:7b
  ollama pull mistral-nemo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mistral Nemo (12B) is stricter than Mistral 7B — required:
- TIMEOUT_JUDGE = 60s in constants (cold-start headroom for 12B)
- eval.py uses TIMEOUT_JUDGE instead of hardcoded 30s
- Two test cases made unambiguous: removed hedging language ("reliably")
  that Nemo parsed as logical loopholes, simplified criteria

All 112 tests pass (11 llm tests now run against Nemo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added explicit severity rules to the Nemo prompt:
- critical: only for runtime failures actively breaking the bot (crash, unrecoverable LLM error)
- warning: degraded behaviour, high token usage, recovered failures
- info: normal operational noise

Prevents architecture observations and context-loss notes from being
labelled critical when they aren't immediately actionable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…art race condition

- Record latency_ms per LLM call in USAGE.jsonl (time.monotonic around acompletion)
- Move typing indicator to top of _on_message so voice/media downloads show feedback immediately
- Fix supervisor _restart_gateway: poll until old PIDs die before spawning new gateway,
  with SIGKILL fallback after 10s — prevents dual-gateway Telegram conflict on restart

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
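
A sketch of the restart fix (helper names assumed): signal the old gateway PIDs, poll until they exit, and escalate to SIGKILL after the 10s grace period before spawning the replacement.

```python
import os
import signal
import time

def _is_running(pid: int) -> bool:
    try:
        os.kill(pid, 0)              # signal 0: existence check only
        return True
    except ProcessLookupError:
        return False

def stop_and_wait(pids: list[int], timeout: float = 10.0) -> None:
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and any(_is_running(p) for p in pids):
        time.sleep(0.2)
    for pid in pids:                 # SIGKILL fallback after the grace period
        if _is_running(pid):
            os.kill(pid, signal.SIGKILL)
    # Only now is it safe to spawn the new gateway: no two processes ever
    # poll Telegram at once.
```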

Mbda1 commented Feb 27, 2026

Closing — this was opened against the wrong repo. Changes are personal-use features for a fork.

Mbda1 closed this Feb 27, 2026