
feat: add skill discovery — auto-extract reusable skills from conversation history #3039

Closed
wanghesong2019 wants to merge 2 commits into HKUDS:main from wanghesong2019:skill-discover

Conversation

@wanghesong2019 (Contributor) commented Apr 11, 2026

PR Description

Summary

Closes #2927
This PR introduces Skill Discovery, a feature that automatically analyzes conversation history to identify reusable behavioral patterns and extract them into standalone SKILL.md files. The system uses a two-phase LLM pipeline, supports three trigger modes, and includes a human-in-the-loop approval workflow.

Motivation

nanobot already supports manually created skills (via skills/<name>/SKILL.md), but users must recognize and codify their own repetitive workflows. Skill Discovery closes this loop by automatically detecting when the agent performs a task repeatedly and proactively suggesting it as a reusable skill — reducing manual configuration and improving the agent's self-improvement capability.

Architecture & Design Philosophy

Two-Phase LLM Pipeline

Rather than asking a single LLM call to both analyze patterns and generate skill files, the pipeline is split into two focused phases:

| Phase | Role | Mechanism | Output |
|---|---|---|---|
| Phase 1 | Pattern Analysis | Single LLM call with structured JSON output | Array of candidate patterns with name, description, frequency, evidence, recommendation strength |
| Phase 2 | Skill Generation | AgentRunner with read_file + write_file tools | SKILL.md files written to the .pending_skills/ directory |

Why two phases?

  • Separation of concerns: Analysis requires judgment (is this pattern worth extracting?), generation requires tool use (read existing skills for reference, write new files). Mixing both in one call degrades quality.
  • Quality gate insertion: Weak recommendations are filtered between phases, saving LLM tokens on generation.
  • Tool isolation: Phase 2's write_file is scoped to .pending_skills/ only — it cannot overwrite existing skills or arbitrary workspace files.
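The split can be sketched as follows. This is illustrative only: `analyze_patterns` and `generate_skill` are stand-ins for the real Phase 1 LLM call and the Phase 2 AgentRunner, and the actual `SkillDiscoverer` API differs.

```python
# Illustrative sketch of the two-phase split; the callables are
# hypothetical stand-ins for the real Phase 1 and Phase 2 mechanisms.

def discover(history: str, analyze_patterns, generate_skill) -> list:
    # Phase 1: judgment only, one structured-output LLM call.
    candidates = analyze_patterns(history)
    # Quality gate between phases: drop weak recommendations before
    # spending generation tokens on them.
    strong = [c for c in candidates if c.get("recommendation") != "weak"]
    # Phase 2: tool use only, each surviving candidate becomes a SKILL.md.
    return [generate_skill(c) for c in strong]
```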

Incremental Cursor-Based Processing

Skill Discovery reads only unprocessed history entries using a persistent cursor (.skill_discovery_cursor):

  • After each successful run, the cursor advances to the last processed entry.
  • If Phase 1 fails, the cursor is not advanced — preserving the retry opportunity.
  • This avoids re-analyzing the same history and ensures at-least-once processing semantics.
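The cursor rules above can be sketched as a small function. The signature is hypothetical; the real code uses `Memory.get_skill_discovery_cursor()` / `set_skill_discovery_cursor()` rather than bare callables.

```python
def run_discovery(get_cursor, set_cursor, read_history, analyze):
    """Cursor-handling sketch (hypothetical signature)."""
    entries = read_history(since=get_cursor())
    if not entries:
        return []                      # nothing new: no LLM call at all
    try:
        patterns = analyze(entries)
    except Exception:
        return []                      # Phase 1 failed: cursor NOT advanced
    set_cursor(entries[-1]["id"])      # success (even if empty): advance
    return patterns
```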

Three Trigger Modes

| Mode | Mechanism | Config | Use Case |
|---|---|---|---|
| Manual | /discover-skills command | Always available when enabled | On-demand analysis |
| Post-turn | Turn counter per session | interval_turns (default: 20) | Automatic background discovery during active use |
| Cron | System cron job | cron expression or min_interval_s | Scheduled discovery in daemon mode |

Post-turn and cron triggers run discovery in the background — they never block the user's conversation.
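The post-turn mode's per-session counter might look like the sketch below; the real logic lives in loop.py's `_maybe_trigger_skill_discovery()` and this class name is an assumption.

```python
class PostTurnTrigger:
    """Per-session turn counter (sketch; hypothetical class name)."""

    def __init__(self, interval_turns: int = 20):
        self.interval = interval_turns
        self.counters: dict[str, int] = {}

    def bump(self, session_key: str) -> bool:
        """Returns True when background discovery should fire."""
        if self.interval <= 0:
            return False                     # interval of 0 disables the mode
        count = self.counters.get(session_key, 0) + 1
        if count >= self.interval:
            self.counters[session_key] = 0   # reset after firing
            return True
        self.counters[session_key] = count
        return False
```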

Human-in-the-Loop Approval

By default (auto_approve: false), discovered skills are saved to .pending_skills/ and await explicit user approval:

/discover-skills          → discovers and lists candidates
/skill-approve <name>     → installs one candidate
/skill-approve all        → installs all candidates
/skill-approve            → lists pending candidates

With auto_approve: true, skills are installed immediately — suitable for trusted environments.

Processing Flow

┌─────────────────────────────────────────────────────────────────┐
│                    Trigger (manual / post-turn / cron)          │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Read unprocessed history entries (since last cursor)           │
│  → if empty: return []                                          │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: LLM Pattern Analysis                                 │
│  Input: history text + context (Memory, User, existing skills)  │
│  Output: JSON array of candidate patterns                       │
│  → if fails: return [] (cursor NOT advanced)                    │
│  → if empty/weak-only: advance cursor, return []                │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 2: AgentRunner Skill Generation                          │
│  Tools: read_file (workspace) + write_file (.pending_skills/)   │
│  → generates SKILL.md files into .pending_skills/<name>/        │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Quality Gates                                                  │
│  - Name conflict: skip if skill already exists in skills/       │
│  - Deduplication: skip duplicate candidate names                │
│  - max_candidates cap (default: 5)                              │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Advance cursor → return candidates                             │
│                                                                 │
│  Manual trigger:                                                │
│    auto_approve=true  → approve_all() → install to skills/      │
│    auto_approve=false → save_pending() → await /skill-approve   │
│  Post-turn / Cron:                                              │
│    → save_pending() (always, for later approval)                │
└─────────────────────────────────────────────────────────────────┘
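The quality-gate step in the flow above can be sketched as follows (hypothetical signature; the real filtering is a method on `SkillDiscoverer`, and candidates are shown here as plain dicts with a `name` key):

```python
def filter_candidates(candidates, existing_skills, max_candidates=5):
    """Quality-gate sketch: name-conflict skip, dedup, and cap."""
    seen, kept = set(), []
    for c in candidates:
        if c["name"] in existing_skills:   # conflicts with skills/: skip
            continue
        if c["name"] in seen:              # duplicate name: keep first only
            continue
        seen.add(c["name"])
        kept.append(c)
    return kept[:max_candidates]           # cap per discovery run
```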

Files Changed

| File | Change |
|---|---|
| nanobot/agent/skill_discovery.py | New — Core SkillDiscoverer class with two-phase pipeline, candidate filtering, pending management, approval, and post-turn trigger logic |
| nanobot/agent/loop.py | Modified — Import SkillDiscoverer, add skill_discoverer attribute, call _maybe_trigger_skill_discovery() after each turn |
| nanobot/agent/memory.py | Modified — Add get_skill_discovery_cursor() / set_skill_discovery_cursor() for persistent cursor tracking |
| nanobot/command/builtin.py | Modified — Add cmd_discover_skills and cmd_skill_approve command handlers, register /discover-skills and /skill-approve routes |
| nanobot/config/schema.py | Modified — Add SkillDiscoveryConfig with enabled, model_override, max_history_entries, max_candidates, auto_approve, interval_turns, min_interval_s, cron fields |
| nanobot/cli/commands.py | Modified — Initialize SkillDiscoverer in both daemon and CLI modes; register cron job for scheduled discovery; handle skill-discovery system_event in cron handler |
| nanobot/templates/agent/skill_discovery_phase1.md | New — Phase 1 prompt template (pattern analysis with structured JSON output) |
| nanobot/templates/agent/skill_discovery_phase2.md | New — Phase 2 prompt template (skill generation with quality standards) |
| tests/agent/test_skill_discovery.py | New — 64 test cases across 15 test classes |

Configuration Example

skillDiscovery:
  enabled: true
  modelOverride: null          # uses default agent model if not set
  maxHistoryEntries: 50        # max history entries per analysis
  maxCandidates: 5             # max candidates per discovery run
  autoApprove: false           # require /skill-approve confirmation
  intervalTurns: 20            # auto-trigger every 20 turns
  minIntervalS: 7200           # minimum 2h between auto-triggers
  cron: null                   # optional: "0 */4 * * *" for every 4 hours
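A corresponding config object might be sketched like this. The real schema lives in nanobot/config/schema.py; the defaults below are assumptions taken from the example's comments, not confirmed library defaults.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SkillDiscoveryConfig:
    """Field sketch mirroring the YAML example (defaults are assumptions)."""
    enabled: bool = False
    model_override: Optional[str] = None   # falls back to the agent model
    max_history_entries: int = 50          # history entries per analysis
    max_candidates: int = 5                # candidates per discovery run
    auto_approve: bool = False             # require /skill-approve
    interval_turns: int = 20               # post-turn trigger interval
    min_interval_s: int = 7200             # minimum gap between auto-runs
    cron: Optional[str] = None             # e.g. "0 */4 * * *"
```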

Test Coverage

64 test cases across 15 test classes, covering all core components and trigger modes:

Unit Tests — Core Components

| Class | Tests | Coverage |
|---|---|---|
| TestSkillCandidate | 1 | format_preview() output format |
| TestParsePhase1Analysis | 5 | JSON in code block, bare JSON array, empty array, invalid JSON, JSON object rejection |
| TestExtractDescription | 4 | First non-heading line, skip empty lines, truncation, fallback |
| TestFilterCandidates | 3 | Duplicate names, existing skill conflict, max_candidates cap |
| TestPendingManagement | 4 | Save & list, remove, clear, empty directory |
| TestApproval | 3 | Install single, install all, git auto-commit |
| TestPostTurnTrigger | 5 | Bump counter, trigger at interval, counter reset, zero-interval disable, independent sessions |

Integration Tests — Discover Flow

| Class | Tests | Coverage |
|---|---|---|
| TestDiscover | 4 | No new history, Phase 1 failure (cursor not advanced), no patterns (cursor advanced), weak filtering |
| TestBuildSchedule | 6 | Default every-schedule, cron override, timezone, min_interval_s, null cron, empty cron |
| TestCronJobRegistration | 3 | CronJob field validation, no job when cron unset, every-schedule job |
| TestCronDiscoveryExecution | 4 | Discover & save pending, no candidates, auto-approve install, Phase 1 failure (cursor preserved) |

End-to-End Tests — Command & Trigger Modes

| Class | Tests | Coverage |
|---|---|---|
| TestManualTrigger | 5 | Disabled state, immediate ack, no candidates, pending save (autoApprove=false), direct install (autoApprove=true) |
| TestSkillApproveCommand | 6 | Disabled, no pending, approve by name, approve all, nonexistent name, no-args listing |
| TestPostTurnTrigger (loop) | 7 | Discoverer None, config disabled, trigger at interval, no trigger before interval, counter reset, independent sessions, discover failure tolerance |
| TestCronTriggerHandler | 4 | Discover & save flow, no discoverer skip, failure caught, empty result no-op |

Key Test Scenarios

  • Cursor safety: Phase 1 failure does not advance cursor (preserves retry); successful empty-result runs do advance cursor (avoids re-processing)
  • Tool isolation: Phase 2 write_file is scoped to .pending_skills/ only — cannot write outside
  • Name conflict prevention: Candidates matching existing skill names are silently skipped
  • Weak recommendation filtering: Patterns marked "weak" are dropped between Phase 1 and Phase 2
  • Per-session turn counters: Post-turn triggers are isolated per session key; one session hitting the threshold does not affect another
  • Async non-blocking: Manual trigger returns "Discovering skills..." immediately; actual discovery runs as a background asyncio.Task

Edge Cases Handled

  1. No unprocessed history → returns empty list, no LLM call
  2. Phase 1 LLM failure → returns empty, cursor not advanced (retry on next trigger)
  3. Phase 1 returns non-JSON → _parse_phase1_analysis falls back gracefully, returns []
  4. All patterns are "weak" → filtered out, cursor advanced, no Phase 2
  5. Phase 2 AgentRunner failure → candidates collected from pending dir (partial results), cursor still advanced
  6. Duplicate candidate names → first occurrence kept, rest deduplicated
  7. Candidate name conflicts with existing skill → silently skipped
  8. Pending directory doesn't exist → list_pending() returns []
  9. /skill-approve with nonexistent name → shows available pending skill names
  10. /skill-approve with no args → lists pending skills with usage hint
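The tolerant Phase 1 parsing described above (JSON in a code block, bare arrays, invalid JSON, rejection of JSON objects) can be sketched as follows; this is illustrative and the real _parse_phase1_analysis may differ.

```python
import json
import re

def parse_phase1_analysis(text: str) -> list:
    """Tolerant parse of Phase 1 output (illustrative sketch)."""
    # Prefer a JSON array inside a fenced code block, else the raw text.
    m = re.search(r"```(?:json)?\s*(\[.*?\])\s*```", text, re.DOTALL)
    raw = m.group(1) if m else text.strip()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []                        # non-JSON: fall back gracefully
    return data if isinstance(data, list) else []  # reject JSON objects
```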

@Re-bin (Collaborator) commented Apr 11, 2026

Nice feature, gonna review it soon.

@wanghesong2019 (Contributor, Author) replied:

Thanks, looking forward to your review! Happy to address any feedback.

@chengyongru (Collaborator) commented:

Skill discovery is a good idea, but this PR looks a little bit over-engineered.

chengyongru added a commit that referenced this pull request Apr 11, 2026
Instead of a separate skill discovery system, extend Dream's two-phase
pipeline to also detect reusable behavioral patterns from conversation
history and generate SKILL.md files.

Phase 1 gains a [SKILL] output type for pattern detection.
Phase 2 gains write_file (scoped to skills/) and read access to builtin
skills, enabling it to check for duplicates and follow skill-creator's
format conventions before creating new skills.

Inspired by PR #3039 by whs.

Co-authored-by: whs <whs@xdd.ai>
chengyongru added a commit that referenced this pull request Apr 11, 2026
Instead of a separate skill discovery system, extend Dream's two-phase
pipeline to also detect reusable behavioral patterns from conversation
history and generate SKILL.md files.

Phase 1 gains a [SKILL] output type for pattern detection.
Phase 2 gains write_file (scoped to skills/) and read access to builtin
skills, enabling it to check for duplicates and follow skill-creator's
format conventions before creating new skills.

Inspired by PR #3039 by @wanghesong2019.

Co-authored-by: wanghesong2019 <wanghesong2019@users.noreply.github.com>
@chengyongru (Collaborator) commented:

@wanghesong2019 How about this? see: #3048

f"- **{c.name}** → `{p}`" for c, p in zip(pending, paths)
)
elif args:
matched = [c for c in pending if c.name == args]
@jeremyjh commented Apr 11, 2026

I think this needs to include a fuzzy match on the description. The generator is not deterministic and there are always multiple candidate names. This isn't theoretical: I've seen Hermes generate 6 different skills for the same thing with slightly different names. The problem there was mostly it was re-reading the whole session every 10 turns so it was literally discovering the same thing over and over, but the names were all different despite being descriptive.

@wanghesong2019 (Contributor, Author) replied:

@jeremyjh That's a really valid concern, and the "6 different names for the same skill" scenario is exactly the kind of real-world failure mode that's easy to miss in design.

Worth noting that #3048 already addresses the "re-reading the whole session every N turns" part — it uses a cursor-based incremental approach (read_unprocessed_history(since_cursor=last_cursor)), so Dream only processes new entries since the last run. That eliminates the repeated-discovery loop you described.

The fuzzy description matching is a separate gap though, and you're right that it's not covered in either PR. #3048's
Phase 2 does inject the existing skills list and has read_file access to check for semantic overlap, but that relies
entirely on the LLM's judgment — there's no hard dedup guard. A lightweight similarity check on descriptions (e.g. edit distance or embedding cosine similarity) before writing would make this much more robust.
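For reference, a minimal version of such a guard using the stdlib's edit-distance ratio might look like this. It is illustrative only and not part of either PR; difflib's `SequenceMatcher` is a cheap stand-in, and embedding similarity would catch paraphrases that pure edit distance misses.

```python
from difflib import SequenceMatcher

def is_near_duplicate(desc: str, existing_descs, threshold: float = 0.8) -> bool:
    """Flag a candidate whose description nearly matches an installed one.
    Illustrative dedup guard (hypothetical helper, not in either PR)."""
    return any(
        SequenceMatcher(None, desc.lower(), d.lower()).ratio() >= threshold
        for d in existing_descs
    )
```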

I'd suggest raising this directly on #3048 — it's a concrete improvement that fits naturally into that approach, and the maintainer @chengyongru there would be the right person to decide how to handle it.

@wanghesong2019 (Contributor, Author) commented:

@chengyongru Hi, thanks for the thoughtful feedback and for building #3048 — I think you're right that #3039 is over-engineered for what this feature needs to be at this stage.

After reviewing #3048's implementation, I think the "extend Dream" approach is genuinely better as a starting point:

• The insight that "a Skill is just another kind of memory" is elegant. Dream already does the hard work of analyzing
history and extracting structured knowledge — adding [SKILL] as a third output type is a natural extension, not a
bolt-on.
• Scoping write_file to skills/ in Phase 2, and having the agent read_file existing skills for dedup, is smarter than
my code-level name matching — it catches semantic duplicates, not just name collisions.
• Zero new config fields and zero new commands means zero new cognitive load for users. That matters.

The main thing #3039 has that #3048 doesn't is a user-facing approval step — skills are written directly to skills/
without a pending/review buffer. I'm curious whether you see that as a concern, or whether the git commit trail is
considered sufficient for auditability?

If #3048 gets merged, I'd be happy to follow up with a small incremental PR adding a /discover-skills command that
manually triggers the Dream skill-discovery phase — for users who want on-demand control without waiting for the
automatic Dream cycle. That could be a clean addition on top of #3048's foundation.

Happy to close #3039 in favor of #3048 if that's the direction. Thanks again for the cleaner solution.

Re-bin pushed a commit that referenced this pull request Apr 12, 2026
(same commit message as the Apr 11 commit above)


Development

Successfully merging this pull request may close these issues.

Feature Request: Automatic Skill Discovery and Generation
