feat: voice, vision, search, todo extraction, dashboard metrics #1244

Closed

Mbda1 wants to merge 23 commits into HKUDS:main from Mbda1:main

Conversation


Mbda1 commented Feb 26, 2026

Summary

Seven new features for the personal nanobot assistant:

  • Voice transcription — LocalWhisperProvider uses the already-installed openai-whisper (tiny model, CPU, async). Tries Groq first if a key is present, falls back silently to local Whisper. No new dependencies required.
  • Image understanding — Fixed redundant [image: /path] text in Telegram handler so Claude Haiku 4.5 vision actually receives clean input. The base64 pipeline in context.py was already wired — this just unblocked it.
  • /search command — Keyword pre-filter across all session JSONL files, then semantic re-rank using nomic-embed-text cosine similarity if Ollama is reachable (sketched below this list). Returns top 5 results. Added to Telegram command menu.
  • Todo auto-extraction — Second local LLM pass runs after every memory flush, scanning recent turns for commitments/tasks and appending - [ ] items to todo.md automatically.
  • Dashboard: response latency — loop.py records turn_latency per message to METRICS.jsonl. Dashboard shows p50/p95 panels.
  • Dashboard: warm tier hit rate — memory.py records warm_hit to METRICS.jsonl when topic files are injected. Dashboard shows hit rate + lifetime hit/miss counts.
  • usage.py metrics infrastructure — New record_metric(event, **kwargs) function writes non-token instrumentation events to METRICS.jsonl alongside the existing USAGE.jsonl.
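
A minimal sketch of the /search flow described above: keyword pre-filter, then cosine re-rank over Ollama embeddings, falling back to keyword order when Ollama is down. File layout, helper names, and the limit of 50 candidates are illustrative assumptions, not the PR's actual code.

```python
import json
import math
from pathlib import Path

import httpx

OLLAMA = "http://localhost:11434"  # assumed local Ollama endpoint

def keyword_prefilter(query: str, sessions_dir: Path, limit: int = 50) -> list[str]:
    """Cheap first pass: collect turns sharing any keyword with the query."""
    words = {w.lower() for w in query.split()}
    hits: list[str] = []
    for path in sessions_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                text = json.loads(line).get("content", "")
            except (json.JSONDecodeError, AttributeError):
                continue
            if isinstance(text, str) and words & set(text.lower().split()):
                hits.append(text)
                if len(hits) >= limit:
                    return hits
    return hits

def embed(text: str) -> list[float]:
    r = httpx.post(f"{OLLAMA}/api/embeddings",
                   json={"model": "nomic-embed-text", "prompt": text}, timeout=10)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb or 1e-9)

def search(query: str, sessions_dir: Path, k: int = 5) -> list[str]:
    candidates = keyword_prefilter(query, sessions_dir)
    try:
        qv = embed(query)
        candidates.sort(key=lambda t: cosine(qv, embed(t)), reverse=True)
    except httpx.HTTPError:
        pass  # Ollama unreachable: fall back to keyword order
    return candidates[:k]
```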

Also bundles pre-existing uncommitted work: async context builder, supervisor enhancements, embeddings module, tracing utilities, and updated tests.

Test plan

  • Send a voice message to the bot → confirm transcript appears in reply
  • Send a photo → confirm bot describes image content (not just echoes a path)
  • /search LS swap → confirm session history results returned
  • After 18+ messages, check todo.md for auto-extracted tasks
  • Open dashboard at http://localhost:8765 → confirm latency + warm hit rate panels visible
  • python3 ~/.nanobot/workspace/skills/budget-guard/check_budget.py → confirm JSON output

🤖 Generated with Claude Code

Mbda1 and others added 23 commits February 23, 2026 15:52
- Add local_model field to AgentDefaults config (routes to Ollama/Mistral)
- Memory consolidation now uses local model when configured, falling back
  to cloud model — eliminates token spend on summarization
- Query enrichment: local model rewrites user messages (>=6 words) before
  sending to cloud, with 10s timeout and graceful fallback to original
- Supervisor daemon: replace Anthropic dependency with local LiteLLM/Ollama
  for log analysis and autonomous fix detection — fully token-free
- Supervisor: add gateway watchdog (30s poll), lifecycle Telegram events
  (startup, crash, recovery), and audit log to SUPERVISOR_LOG.md
- Add nanobot supervisor CLI command with persistent log file setup
- Add anthropic package as optional dep (kept for future use)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
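
A minimal sketch of the enrichment pattern this commit describes: rewrite the message with a local model under a hard 10s budget, falling back to the original on any failure. The function name and prompt are assumptions, and it is shown as a direct Ollama call for self-containment (the commit routes local calls differently).

```python
import httpx

ENRICH_PROMPT = "Rewrite the user's message to be clearer and more specific. Reply with the rewrite only."

async def enrich_query(message: str, api_base: str = "http://localhost:11434") -> str:
    if len(message.split()) < 6:      # short messages pass through untouched
        return message
    payload = {
        "model": "mistral",
        "stream": False,
        "messages": [
            {"role": "system", "content": ENRICH_PROMPT},
            {"role": "user", "content": message},
        ],
    }
    try:
        async with httpx.AsyncClient(timeout=10) as client:   # 10s budget
            r = await client.post(f"{api_base}/api/chat", json=payload)
            r.raise_for_status()
            return r.json()["message"]["content"].strip() or message
    except (httpx.HTTPError, KeyError):
        return message                # graceful fallback to the original text
```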
WSL2 cannot reach Windows localhost directly. Update the hardcoded api_base
in query enrichment and supervisor log analysis to use host.docker.internal,
which resolves stably to the Windows host from WSL2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds nanobot ps/stop/start/restart subcommands for managing gateway
and supervisor agents. Targets individual agents or all at once:

  nanobot ps                  # show running agents
  nanobot stop                # stop all (safe for code changes)
  nanobot stop supervisor     # stop one agent
  nanobot restart             # restart all
  nanobot start gateway       # start one agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Supervisor now only auto-applies restart_gateway; all other fix types
(edit_config, patch_file, edit_workspace) are replaced with suggest —
the issue is detected and notified via Telegram but never auto-applied.
This prevents hallucinated config patches and bad file writes.

Also fix NameError: shutil not in scope in _start_agent helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
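
An illustrative sketch of the allowlist gate this commit describes; the function signature and injected callables are assumptions, not the supervisor's actual code.

```python
AUTO_APPLY_ALLOWLIST = {"restart_gateway"}

def handle_fix(fix: dict, apply_fn, notify_fn) -> str:
    """Auto-apply only allowlisted fix types; downgrade everything else to a suggestion."""
    if fix.get("type") in AUTO_APPLY_ALLOWLIST:
        apply_fn(fix)
        return "applied"
    notify_fn(f"Suggested fix (not auto-applied): {fix.get('type')}")
    return "suggested"

# Usage: handle_fix({"type": "edit_config"}, apply_fn=print, notify_fn=print) -> "suggested"
```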
Raw exception strings (e.g. litellm.AuthenticationError) were reaching
users as Telegram messages via two paths:

1. litellm_provider returns finish_reason="error" with the exception as
   content — now raises RuntimeError instead of letting it flow through
   as final_content.

2. run() outer handler sent str(e) directly to the user — now sends a
   generic friendly message while still logging the full error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
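
A sketch of the two fixes under assumed names: the provider raises instead of letting the exception text flow through as content, and the outer handler keeps the detail in the log while sending the user a generic message.

```python
import logging

log = logging.getLogger("nanobot")

def check_provider_response(finish_reason: str, content: str) -> str:
    """Path 1: raise instead of returning the exception text as final_content."""
    if finish_reason == "error":
        raise RuntimeError(f"LLM call failed: {content}")
    return content

async def run_turn_safely(handle_turn, send_to_user) -> None:
    """Path 2: log the full error, send the user a friendly generic message."""
    try:
        await handle_turn()
    except Exception:
        log.exception("turn failed")   # full traceback stays local
        await send_to_user("Sorry, something went wrong on my end. Please try again.")
```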
…ywords

"Anthropic security statement" was miscategorized to longevity.md because
"anthropic" wasn't in the keyword list. Adding major AI company/model names
so tweets from @AnthropicAI and similar accounts land in knowledge/ai.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bling

_edit_config, _patch_file, and _edit_workspace were dead code — the
allowlist only permits restart_gateway, so they could never be called.
_edit_workspace in particular caused the doubled-path bug
(/workspace/.nanobot/workspace/TODO.md) when Mistral hallucinated a
workspace-relative path that already contained the workspace prefix.

Removing the functions eliminates the hazard if the allowlist is ever
loosened accidentally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Memory pipeline (memory.py):
- Replace monolithic consolidation with Chunk→Collect→Assemble→Merge pipeline
- CHUNK: Python splits messages into groups of 8 (zero tokens)
- COLLECT: local Mistral summarizes each chunk (≤300 tokens, 20s timeout)
- ASSEMBLE: Python joins partial summaries (zero tokens)
- MERGE: one bounded save_memory tool call (≤2048 tokens)
- Graceful fallback: chunk timeout → raw truncated text, never skips

Architecture docs (CLAUDE.md):
- Add Architecture Principles section documenting the pipeline pattern,
  memory consolidation design, write-file pattern, and config boundaries table

Session history repair (session/manager.py):
- get_history() now strips orphaned tool_results and dangling tool_calls
  that arise when the memory window bisects a tool-call sequence

Prompt cache fix (context.py):
- Timestamp format: %Y-%m-%d %H:%M → %Y-%m-%d (%A) so system prompt is
  stable for 24h and OpenRouter prompt cache actually hits

Web search (tools/web.py):
- Switch from Brave Search API to DuckDuckGo via ddgs (no API key required)

Supervisor notifications (supervisor/daemon.py):
- Telegram alerts only on critical severity or auto-applied fixes;
  suggestions/warnings logged locally to stop notification spam

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
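
A minimal sketch of the Chunk→Collect→Assemble→Merge shape described above; the function names and the injected summarize/merge callables are assumptions, not memory.py's actual interfaces.

```python
import asyncio

CHUNK_SIZE = 8          # messages per chunk, per the commit message
CHUNK_TIMEOUT = 20      # seconds per local summary call

def chunk(messages: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """CHUNK: pure Python split, zero tokens."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]

async def collect(chunks: list[list[str]], summarize) -> list[str]:
    """COLLECT: one local-model summary per chunk, raw-text fallback on timeout."""
    partials = []
    for c in chunks:
        try:
            partials.append(await asyncio.wait_for(summarize("\n".join(c)), CHUNK_TIMEOUT))
        except asyncio.TimeoutError:
            partials.append("\n".join(c)[:500])   # truncated raw text, never skip
    return partials

def assemble(partials: list[str]) -> str:
    """ASSEMBLE: pure Python join, zero tokens."""
    return "\n\n".join(partials)

async def consolidate(messages, summarize, merge_with_save_memory):
    text = assemble(await collect(chunk(messages), summarize))
    return await merge_with_save_memory(text)     # MERGE: one bounded tool call
```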
… SQLite

- Add nanobot/config/constants.py as single source of truth for all
  hardcoded values (models, timeouts, memory tuning, token limits)
- Wire constants into memory.py, context.py, loop.py, daemon.py, web.py
- Extract _enrich_query() from AgentLoop into nanobot/agent/enrichment.py;
  loop.py calls enrich_query() from the new module (-45 lines)
- Garage: create garage.db (SQLite), migrate.py, garage_query.py;
  update car-garage SKILL.md to query SQLite instead of markdown files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add nanobot/agent/usage.py — lightweight JSONL logger, one line per LLM call
- Wire into loop.py (agent calls), memory.py (chunk + merge), enrichment.py
- Init'd at AgentLoop startup; no-ops gracefully if workspace not set
- Log: ~/.nanobot/workspace/memory/USAGE.jsonl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
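
A plausible shape for the logger (a sketch; the real usage.py may differ): one JSON line per LLM call, appended to USAGE.jsonl, and a silent no-op when no workspace has been set.

```python
import json
import time
from pathlib import Path

_usage_path: Path | None = None

def init_usage(workspace: Path) -> None:
    global _usage_path
    _usage_path = workspace / "memory" / "USAGE.jsonl"
    _usage_path.parent.mkdir(parents=True, exist_ok=True)

def record_usage(model: str, prompt_tokens: int, completion_tokens: int, **extra) -> None:
    if _usage_path is None:          # workspace not set: no-op gracefully
        return
    row = {"ts": time.time(), "model": model,
           "prompt_tokens": prompt_tokens,
           "completion_tokens": completion_tokens, **extra}
    with _usage_path.open("a") as f:
        f.write(json.dumps(row) + "\n")
```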
Root cause: Ollama default keep_alive=5min → model unloads on idle →
30s+ cold-start exceeds 10s enrichment timeout → all local inference skipped.

Fix: on AgentLoop.run(), fire a background warmup via direct Ollama /api/chat
with keep_alive=-1 (LiteLLM silently drops this param, so direct httpx needed).
Result: model pinned until Ollama itself restarts (expires_at ~year 2318).

Also add OLLAMA_KEEP_ALIVE=-1 constant for documentation/future use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
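
A sketch of the warmup this commit describes: a direct /api/chat call carrying keep_alive=-1, which LiteLLM would drop, hence raw httpx. The function name and endpoint default are assumptions.

```python
import httpx

async def warmup_ollama(api_base: str = "http://host.docker.internal:11434",
                        model: str = "mistral") -> None:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "keep_alive": -1,        # pin the model until Ollama itself restarts
        "stream": False,
    }
    try:
        async with httpx.AsyncClient(timeout=120) as client:  # cold-start headroom
            await client.post(f"{api_base}/api/chat", json=payload)
    except httpx.HTTPError:
        pass                     # warmup is best-effort; never block the turn
```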
- Add `nanobot dashboard` command — starts live HTTP server at :8765
- Add "dashboard" to _AGENT_PATTERNS so nanobot start/stop/ps manages it
- Server serves HTML shell + /api/data JSON endpoint (zero tokens, local reads)
- Refresh button + auto-refresh every 30s via JavaScript fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
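
A minimal sketch of the zero-token server shape described above (stdlib only; the metrics path and HTML shell are assumptions, not the actual dashboard code).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

METRICS = Path.home() / ".nanobot/workspace/memory/METRICS.jsonl"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/data":
            lines = METRICS.read_text().splitlines() if METRICS.exists() else []
            body = json.dumps([json.loads(l) for l in lines if l.strip()]).encode()
            ctype = "application/json"
        else:
            body = b"<html><body><h1>nanobot dashboard</h1></body></html>"
            ctype = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8765), Handler).serve_forever()
```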
…ider

The LiteLLM provider always injects its OpenRouter api_base into every
call, so provider.chat(model="ollama/mistral") was hitting OpenRouter
instead of the local Ollama instance. Fix:

- Add local_llm.py: thin httpx wrapper that calls Ollama /api/chat
  directly, avoiding all LiteLLM routing
- enrichment.py: use ollama_chat() instead of litellm.acompletion()
- memory.py: chunk summaries stay on cloud (CPU too heavy for local);
  only enrichment runs on Mistral
- commands.py: pass local_model to gateway AgentLoop (was missing —
  caused entire local stack to silently no-op)
- loop.py: await warmup instead of fire-and-forget; warmup timeout 120s
  to handle Mistral cold-start (~43s on CPU); memory consolidation uses
  cloud model throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged HKUDS/nanobot upstream (35 commits, tag v0.1.4.post2).

Key upstream changes taken:
- fix: stabilize system prompt — timestamp injected into user message
  via _inject_runtime_context() instead of system prompt; better cache reuse
- fix(session): get_history uses last_consolidated cursor
- fix(heartbeat): virtual tool-call replaces unreliable HEARTBEAT_OK token
- fix(memory): handle non-string tool call arguments
- fix(templates): tighter AGENTS.md tool call guidelines
- fix(mcp): remove default httpx timeout for HTTP transport
- fix(web): @property api_key pattern for runtime resolution

Conflict resolutions:
- context.py: took upstream (superior cache approach)
- session/manager.py: took upstream
- web.py: kept DuckDuckGo (no Brave API key); adopted @property api_key
- test_heartbeat_service.py: removed obsolete HEARTBEAT_OK_TOKEN test,
  updated constructor call to match new HeartbeatService signature

All 79 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds WebBrowseTool for JS-rendered pages and sites that block plain
httpx requests (403 Forbidden). Registers alongside web_fetch/web_search
in AgentLoop. Anti-detection: disabled AutomationControlled flag,
navigator.webdriver override, realistic UA + Sec-Fetch headers.

System libs installed to ~/.local/playwright-deps without sudo;
LD_LIBRARY_PATH injected at import time (self-contained).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
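
A hedged sketch of the anti-detection setup using Playwright's async API; the flag, UA string, and header values shown are commonly used ones, not necessarily the tool's exact configuration.

```python
import asyncio
from playwright.async_api import async_playwright

async def browse(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(
            user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/124.0.0.0 Safari/537.36"),
            extra_http_headers={"Sec-Fetch-Mode": "navigate", "Sec-Fetch-Site": "none"},
        )
        # Hide the webdriver flag before any page script runs.
        await context.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        )
        page = await context.new_page()
        await page.goto(url, wait_until="networkidle")
        text = await page.inner_text("body")
        await browser.close()
        return text

# Usage: asyncio.run(browse("https://example.com"))
```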
Two triggers per turn (both reset at the start of each user message):
- Per-tool limit: same tool called > CIRCUIT_BREAKER_PER_TOOL (5) times
- Consecutive limit: identical tool+arg repeated > CIRCUIT_BREAKER_CONSECUTIVE (2) times

On trip, the real tool.execute() is skipped and a synthetic error result is
injected so the LLM can gracefully conclude without hitting the API again.
Constants are tunable in config/constants.py. 4 new tests cover both
breakers, false-positive safety, and under-limit baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
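
A sketch of the two breakers; the constants mirror the commit, while the class and attribute names are assumptions.

```python
from collections import Counter

CIRCUIT_BREAKER_PER_TOOL = 5
CIRCUIT_BREAKER_CONSECUTIVE = 2

class CircuitBreaker:
    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:                 # called at the start of each user message
        self.per_tool: Counter = Counter()
        self.last_call: tuple | None = None
        self.repeats = 0

    def should_trip(self, tool: str, args: str) -> bool:
        self.per_tool[tool] += 1
        call = (tool, args)
        self.repeats = self.repeats + 1 if call == self.last_call else 1
        self.last_call = call
        return (self.per_tool[tool] > CIRCUIT_BREAKER_PER_TOOL
                or self.repeats > CIRCUIT_BREAKER_CONSECUTIVE)

# On trip, the loop skips tool.execute() and injects a synthetic error result
# so the model can conclude instead of looping.
```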
…kens)

New module nanobot/agent/eval.py: judge_response(question, response,
criterion) sends (q, r, criterion) triples to local Mistral and returns
a PASS/FAIL verdict. Empty responses are short-circuited before the
LLM call to prevent hallucination.

11 parametrized eval cases: 5 PASS (good responses), 4 FAIL (bad
responses), 1 empty-response guard, 1 import smoke test. All cases
isolate objective criteria so Mistral 7B judges reliably.

Tests are marked @pytest.mark.llm and skipped unless Ollama is reachable:
    pytest -m llm       # run eval suite (~80s, needs Ollama)
    pytest -m 'not llm' # fast unit tests only (default, 83 tests, ~2.5s)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
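
A rough shape for judge_response (a sketch under assumed names; the actual eval.py may build the prompt and parse the verdict differently).

```python
import httpx

TIMEOUT_JUDGE = 30  # the value this commit hardcodes; raised later for Nemo

def judge_response(question: str, response: str, criterion: str,
                   api_base: str = "http://localhost:11434") -> bool:
    if not response.strip():
        return False   # empty responses short-circuit before any LLM call
    prompt = (f"Question: {question}\nResponse: {response}\n"
              f"Criterion: {criterion}\nAnswer with PASS or FAIL only.")
    r = httpx.post(f"{api_base}/api/chat",
                   json={"model": "mistral", "stream": False,
                         "messages": [{"role": "user", "content": prompt}]},
                   timeout=TIMEOUT_JUDGE)
    r.raise_for_status()
    return r.json()["message"]["content"].strip().upper().startswith("PASS")
```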
Hot tier  — MEMORY.md, always loaded, capped at MEMORY_HOT_MAX_LINES (200).
            When exceeded, the oldest ## section is evicted to the warm tier.

Warm tier — memory/topics/*.md, created automatically on overflow.
            Keyword-matched against the user's message at call time and
            injected into the system prompt only for turns where relevant.
            Zero tokens wasted on unrelated memories.

Cold tier — HISTORY.md, grep-searchable event log (unchanged).

Changes:
- constants.py: MEMORY_HOT_MAX_LINES = 200
- memory.py: MemoryStore.trim_hot(), _load_warm_topics(), updated
  write_long_term() (auto-trims) and get_memory_context(user_message)
- context.py: passes current user message to get_memory_context() for
  warm-tier keyword matching; updated identity to mention both tiers
- 11 new unit tests covering overflow, idempotency, warm loading,
  false-positive prevention, and edge cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
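
A sketch of the hot-tier eviction under assumed file layout and names: when MEMORY.md exceeds the cap, the oldest ## section moves to a warm-tier topic file.

```python
from pathlib import Path

MEMORY_HOT_MAX_LINES = 200

def trim_hot(memory_md: Path, topics_dir: Path) -> None:
    """Evict the oldest '## ' section to a warm-tier topic file while over cap."""
    lines = memory_md.read_text().splitlines()
    while len(lines) > MEMORY_HOT_MAX_LINES:
        heads = [i for i, l in enumerate(lines) if l.startswith("## ")]
        if len(heads) < 2:
            break                                    # nothing evictable
        start, end = heads[0], heads[1]              # oldest section's span
        section, lines = lines[start:end], lines[:start] + lines[end:]
        topic = section[0].removeprefix("## ").strip().lower().replace(" ", "-")
        topics_dir.mkdir(parents=True, exist_ok=True)
        with (topics_dir / f"{topic}.md").open("a") as f:
            f.write("\n".join(section) + "\n")
    memory_md.write_text("\n".join(lines) + "\n")
```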
DelegateTool (tools/delegate.py):
- Synchronous counterpart to spawn: runs a worker agent inline and returns
  the result in the same turn. Supervisor sees worker output immediately.
- 5 worker roles with specialized system prompts: researcher, writer,
  analyst, coder, general.
- Workers have access to web_search, web_fetch, web_browse (previously
  missing from subagent tool registry), read/write/edit/exec.

SubagentManager (subagent.py):
- Extracted _execute_subagent() (run loop, return string) from
  _run_subagent() (background wrapper that announces via bus).
- run_direct(task, role) — synchronous delegate entry point.
- WebBrowseTool added to subagent tool registry.
- Role prompts in _ROLE_PROMPTS dict, keyed by role name.
- DELEGATE_MAX_ITERATIONS constant replaces hardcoded 15.

AgentLoop (loop.py) — parallel batch execution:
- Tool calls within a single LLM response now run concurrently via
  asyncio.gather (Phase 2), enabling true parallel delegation.
- Circuit breaker bookkeeping stays sequential (Phase 1) so per-tool
  limits and consecutive detection remain correct.
- Results applied to messages in original order (Phase 3).

7 new tests: delegate unit tests, parallel timing proof, circuit breaker
preservation, registration check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
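
A sketch of the three-phase batch execution described above; the tool_calls dict shape and the injected breaker/execute callables are assumptions.

```python
import asyncio

async def run_tool_batch(tool_calls: list[dict], breaker, execute) -> list[tuple]:
    # Phase 1: sequential bookkeeping so per-tool and consecutive limits stay exact.
    tripped = [breaker.should_trip(c["tool"], c["args"]) for c in tool_calls]

    # Phase 2: run the non-tripped calls concurrently.
    async def run_one(call: dict, skip: bool):
        if skip:
            return {"error": "circuit breaker tripped; stop retrying this tool"}
        return await execute(call)

    results = await asyncio.gather(
        *(run_one(c, t) for c, t in zip(tool_calls, tripped)))

    # Phase 3: gather preserves input order, so results pair with the
    # original calls and can be appended to the transcript in order.
    return list(zip(tool_calls, results))
```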
Split local inference into speed-optimised (enrichment/memory) and
quality-optimised (judge/supervisor) models:
- LOCAL_MODEL_DEFAULT: ollama/mistral → ollama/qwen2.5:7b (same speed, better quality)
- JUDGE_MODEL_DEFAULT: new constant → ollama/mistral-nemo (12B, ~16 tok/s)
- eval.py and daemon.py now use JUDGE_MODEL_DEFAULT
- conftest.py skips llm tests when judge model isn't pulled yet

Models need to be pulled on Windows before taking effect:
  ollama pull qwen2.5:7b
  ollama pull mistral-nemo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mistral Nemo (12B) is stricter than Mistral 7B — required:
- TIMEOUT_JUDGE = 60s in constants (cold-start headroom for 12B)
- eval.py uses TIMEOUT_JUDGE instead of hardcoded 30s
- Two test cases made unambiguous: removed hedging language ("reliably")
  that Nemo parsed as logical loopholes, simplified criteria

All 112 tests pass (11 llm tests now run against Nemo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added explicit severity rules to the Nemo prompt:
- critical: only for runtime failures actively breaking the bot (crash, unrecoverable LLM error)
- warning: degraded behaviour, high token usage, recovered failures
- info: normal operational noise

Prevents architecture observations and context-loss notes from being
labelled critical when they aren't immediately actionable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…art race condition

- Record latency_ms per LLM call in USAGE.jsonl (time.monotonic around acompletion)
- Move typing indicator to top of _on_message so voice/media downloads show feedback immediately
- Fix supervisor _restart_gateway: poll until old PIDs die before spawning new gateway,
  with SIGKILL fallback after 10s — prevents dual-gateway Telegram conflict on restart

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
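
A sketch of the restart fix (helper names assumed): signal the old gateway PIDs, poll until they exit, and escalate to SIGKILL after the 10s grace period before spawning the replacement.

```python
import os
import signal
import time

def _is_running(pid: int) -> bool:
    try:
        os.kill(pid, 0)              # signal 0: existence check only
        return True
    except ProcessLookupError:
        return False

def stop_and_wait(pids: list[int], timeout: float = 10.0) -> None:
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and any(_is_running(p) for p in pids):
        time.sleep(0.2)
    for pid in pids:                 # SIGKILL fallback after the grace period
        if _is_running(pid):
            os.kill(pid, signal.SIGKILL)
    # Only now is it safe to spawn the new gateway: no two processes ever
    # poll Telegram at once.
```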

Mbda1 commented Feb 27, 2026

Closing — this was opened against the wrong repo. Changes are personal-use features for a fork.

Mbda1 closed this Feb 27, 2026