
feat: add skill discovery — auto-extract reusable skills from conversation history #3039

Closed
wanghesong2019 wants to merge 2 commits into HKUDS:main from wanghesong2019:skill-discover

Conversation

@wanghesong2019 (Contributor) commented Apr 11, 2026

PR Description

Summary

Closes #2927
This PR introduces Skill Discovery, a feature that automatically analyzes conversation history to identify reusable behavioral patterns and extract them into standalone SKILL.md files. The system uses a two-phase LLM pipeline, supports three trigger modes, and includes a human-in-the-loop approval workflow.

Motivation

nanobot already supports manually created skills (via skills/<name>/SKILL.md), but users must recognize and codify their own repetitive workflows. Skill Discovery closes this loop by automatically detecting when the agent performs a task repeatedly and proactively suggesting it as a reusable skill — reducing manual configuration and improving the agent's self-improvement capability.

Architecture & Design Philosophy

Two-Phase LLM Pipeline

Rather than asking a single LLM call to both analyze patterns and generate skill files, the pipeline is split into two focused phases:

| Phase | Role | Mechanism | Output |
|---|---|---|---|
| Phase 1 | Pattern Analysis | Single LLM call with structured JSON output | Array of candidate patterns with name, description, frequency, evidence, recommendation strength |
| Phase 2 | Skill Generation | AgentRunner with read_file + write_file tools | SKILL.md files written to the .pending_skills/ directory |

Why two phases?

  • Separation of concerns: Analysis requires judgment (is this pattern worth extracting?), generation requires tool use (read existing skills for reference, write new files). Mixing both in one call degrades quality.
  • Quality gate insertion: Weak recommendations are filtered between phases, saving LLM tokens on generation.
  • Tool isolation: Phase 2's write_file is scoped to .pending_skills/ only — it cannot overwrite existing skills or arbitrary workspace files.
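The split can be sketched as follows. This is illustrative only: `analyze_patterns` and `generate_skill` are stand-ins for the real Phase 1 LLM call and the Phase 2 AgentRunner, and the actual `SkillDiscoverer` API differs.

```python
# Illustrative sketch of the two-phase split; the callables are
# hypothetical stand-ins for the real Phase 1 and Phase 2 mechanisms.

def discover(history: str, analyze_patterns, generate_skill) -> list:
    # Phase 1: judgment only, one structured-output LLM call.
    candidates = analyze_patterns(history)
    # Quality gate between phases: drop weak recommendations before
    # spending generation tokens on them.
    strong = [c for c in candidates if c.get("recommendation") != "weak"]
    # Phase 2: tool use only, each surviving candidate becomes a SKILL.md.
    return [generate_skill(c) for c in strong]
```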

Incremental Cursor-Based Processing

Skill Discovery reads only unprocessed history entries using a persistent cursor (.skill_discovery_cursor):

  • After each successful run, the cursor advances to the last processed entry.
  • If Phase 1 fails, the cursor is not advanced — preserving the retry opportunity.
  • This avoids re-analyzing the same history and ensures at-least-once processing semantics.
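The cursor rules above can be sketched as a small function. The signature is hypothetical; the real code uses `Memory.get_skill_discovery_cursor()` / `set_skill_discovery_cursor()` rather than bare callables.

```python
def run_discovery(get_cursor, set_cursor, read_history, analyze):
    """Cursor-handling sketch (hypothetical signature)."""
    entries = read_history(since=get_cursor())
    if not entries:
        return []                      # nothing new: no LLM call at all
    try:
        patterns = analyze(entries)
    except Exception:
        return []                      # Phase 1 failed: cursor NOT advanced
    set_cursor(entries[-1]["id"])      # success (even if empty): advance
    return patterns
```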

Three Trigger Modes

| Mode | Mechanism | Config | Use Case |
|---|---|---|---|
| Manual | /discover-skills command | Always available when enabled | On-demand analysis |
| Post-turn | Turn counter per session | interval_turns (default: 20) | Automatic background discovery during active use |
| Cron | System cron job | cron expression or min_interval_s | Scheduled discovery in daemon mode |

Post-turn and cron triggers run discovery in the background — they never block the user's conversation.
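The post-turn mode's per-session counter might look like the sketch below; the real logic lives in loop.py's `_maybe_trigger_skill_discovery()` and this class name is an assumption.

```python
class PostTurnTrigger:
    """Per-session turn counter (sketch; hypothetical class name)."""

    def __init__(self, interval_turns: int = 20):
        self.interval = interval_turns
        self.counters: dict[str, int] = {}

    def bump(self, session_key: str) -> bool:
        """Returns True when background discovery should fire."""
        if self.interval <= 0:
            return False                     # interval of 0 disables the mode
        count = self.counters.get(session_key, 0) + 1
        if count >= self.interval:
            self.counters[session_key] = 0   # reset after firing
            return True
        self.counters[session_key] = count
        return False
```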

Human-in-the-Loop Approval

By default (auto_approve: false), discovered skills are saved to .pending_skills/ and await explicit user approval:

/discover-skills          → discovers and lists candidates
/skill-approve <name>     → installs one candidate
/skill-approve all        → installs all candidates
/skill-approve            → lists pending candidates

With auto_approve: true, skills are installed immediately — suitable for trusted environments.

Processing Flow

┌─────────────────────────────────────────────────────────────────┐
│                    Trigger (manual / post-turn / cron)          │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Read unprocessed history entries (since last cursor)           │
│  → if empty: return []                                          │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: LLM Pattern Analysis                                 │
│  Input: history text + context (Memory, User, existing skills)  │
│  Output: JSON array of candidate patterns                       │
│  → if fails: return [] (cursor NOT advanced)                    │
│  → if empty/weak-only: advance cursor, return []                │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Phase 2: AgentRunner Skill Generation                          │
│  Tools: read_file (workspace) + write_file (.pending_skills/)   │
│  → generates SKILL.md files into .pending_skills/<name>/        │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Quality Gates                                                  │
│  - Name conflict: skip if skill already exists in skills/       │
│  - Deduplication: skip duplicate candidate names                │
│  - max_candidates cap (default: 5)                              │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Advance cursor → return candidates                             │
│                                                                 │
│  Manual trigger:                                                │
│    auto_approve=true  → approve_all() → install to skills/      │
│    auto_approve=false → save_pending() → await /skill-approve   │
│  Post-turn / Cron:                                              │
│    → save_pending() (always, for later approval)                │
└─────────────────────────────────────────────────────────────────┘
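The quality-gate step in the flow above can be sketched as follows (hypothetical signature; the real filtering is a method on `SkillDiscoverer`, and candidates are shown here as plain dicts with a `name` key):

```python
def filter_candidates(candidates, existing_skills, max_candidates=5):
    """Quality-gate sketch: name-conflict skip, dedup, and cap."""
    seen, kept = set(), []
    for c in candidates:
        if c["name"] in existing_skills:   # conflicts with skills/: skip
            continue
        if c["name"] in seen:              # duplicate name: keep first only
            continue
        seen.add(c["name"])
        kept.append(c)
    return kept[:max_candidates]           # cap per discovery run
```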

Files Changed

| File | Change |
|---|---|
| nanobot/agent/skill_discovery.py | New — Core SkillDiscoverer class with two-phase pipeline, candidate filtering, pending management, approval, and post-turn trigger logic |
| nanobot/agent/loop.py | Modified — Import SkillDiscoverer, add skill_discoverer attribute, call _maybe_trigger_skill_discovery() after each turn |
| nanobot/agent/memory.py | Modified — Add get_skill_discovery_cursor() / set_skill_discovery_cursor() for persistent cursor tracking |
| nanobot/command/builtin.py | Modified — Add cmd_discover_skills and cmd_skill_approve command handlers, register /discover-skills and /skill-approve routes |
| nanobot/config/schema.py | Modified — Add SkillDiscoveryConfig with enabled, model_override, max_history_entries, max_candidates, auto_approve, interval_turns, min_interval_s, cron fields |
| nanobot/cli/commands.py | Modified — Initialize SkillDiscoverer in both daemon and CLI modes; register cron job for scheduled discovery; handle skill-discovery system_event in cron handler |
| nanobot/templates/agent/skill_discovery_phase1.md | New — Phase 1 prompt template (pattern analysis with structured JSON output) |
| nanobot/templates/agent/skill_discovery_phase2.md | New — Phase 2 prompt template (skill generation with quality standards) |
| tests/agent/test_skill_discovery.py | New — 64 test cases across 15 test classes |

Configuration Example

skillDiscovery:
  enabled: true
  modelOverride: null          # uses default agent model if not set
  maxHistoryEntries: 50        # max history entries per analysis
  maxCandidates: 5             # max candidates per discovery run
  autoApprove: false           # require /skill-approve confirmation
  intervalTurns: 20            # auto-trigger every 20 turns
  minIntervalS: 7200           # minimum 2h between auto-triggers
  cron: null                   # optional: "0 */4 * * *" for every 4 hours
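A corresponding config object might be sketched like this. The real schema lives in nanobot/config/schema.py; the defaults below are assumptions taken from the example's comments, not confirmed library defaults.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SkillDiscoveryConfig:
    """Field sketch mirroring the YAML example (defaults are assumptions)."""
    enabled: bool = False
    model_override: Optional[str] = None   # falls back to the agent model
    max_history_entries: int = 50          # history entries per analysis
    max_candidates: int = 5                # candidates per discovery run
    auto_approve: bool = False             # require /skill-approve
    interval_turns: int = 20               # post-turn trigger interval
    min_interval_s: int = 7200             # minimum gap between auto-runs
    cron: Optional[str] = None             # e.g. "0 */4 * * *"
```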

Test Coverage

64 test cases across 15 test classes, covering all core components and trigger modes:

Unit Tests — Core Components

| Class | Tests | Coverage |
|---|---|---|
| TestSkillCandidate | 1 | format_preview() output format |
| TestParsePhase1Analysis | 5 | JSON in code block, bare JSON array, empty array, invalid JSON, JSON object rejection |
| TestExtractDescription | 4 | First non-heading line, skip empty lines, truncation, fallback |
| TestFilterCandidates | 3 | Duplicate names, existing skill conflict, max_candidates cap |
| TestPendingManagement | 4 | Save & list, remove, clear, empty directory |
| TestApproval | 3 | Install single, install all, git auto-commit |
| TestPostTurnTrigger | 5 | Bump counter, trigger at interval, counter reset, zero-interval disable, independent sessions |

Integration Tests — Discover Flow

| Class | Tests | Coverage |
|---|---|---|
| TestDiscover | 4 | No new history, Phase 1 failure (cursor not advanced), no patterns (cursor advanced), weak filtering |
| TestBuildSchedule | 6 | Default every-schedule, cron override, timezone, min_interval_s, null cron, empty cron |
| TestCronJobRegistration | 3 | CronJob field validation, no job when cron unset, every-schedule job |
| TestCronDiscoveryExecution | 4 | Discover & save pending, no candidates, auto-approve install, Phase 1 failure (cursor preserved) |

End-to-End Tests — Command & Trigger Modes

| Class | Tests | Coverage |
|---|---|---|
| TestManualTrigger | 5 | Disabled state, immediate ack, no candidates, pending save (autoApprove=false), direct install (autoApprove=true) |
| TestSkillApproveCommand | 6 | Disabled, no pending, approve by name, approve all, nonexistent name, no-args listing |
| TestPostTurnTrigger (loop) | 7 | Discoverer None, config disabled, trigger at interval, no trigger before interval, counter reset, independent sessions, discover failure tolerance |
| TestCronTriggerHandler | 4 | Discover & save flow, no discoverer skip, failure caught, empty result no-op |

Key Test Scenarios

  • Cursor safety: Phase 1 failure does not advance cursor (preserves retry); successful empty-result runs do advance cursor (avoids re-processing)
  • Tool isolation: Phase 2 write_file is scoped to .pending_skills/ only — cannot write outside
  • Name conflict prevention: Candidates matching existing skill names are silently skipped
  • Weak recommendation filtering: Patterns marked "weak" are dropped between Phase 1 and Phase 2
  • Per-session turn counters: Post-turn triggers are isolated per session key; one session hitting the threshold does not affect another
  • Async non-blocking: Manual trigger returns "Discovering skills..." immediately; actual discovery runs as a background asyncio.Task

Edge Cases Handled

  1. No unprocessed history → returns empty list, no LLM call
  2. Phase 1 LLM failure → returns empty, cursor not advanced (retry on next trigger)
  3. Phase 1 returns non-JSON → _parse_phase1_analysis falls back gracefully, returns []
  4. All patterns are "weak" → filtered out, cursor advanced, no Phase 2
  5. Phase 2 AgentRunner failure → candidates collected from pending dir (partial results), cursor still advanced
  6. Duplicate candidate names → first occurrence kept, rest deduplicated
  7. Candidate name conflicts with existing skill → silently skipped
  8. Pending directory doesn't exist → list_pending() returns []
  9. /skill-approve with nonexistent name → shows available pending skill names
  10. /skill-approve with no args → lists pending skills with usage hint
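The tolerant Phase 1 parsing described above (JSON in a code block, bare arrays, invalid JSON, rejection of JSON objects) can be sketched as follows; this is illustrative and the real _parse_phase1_analysis may differ.

```python
import json
import re

def parse_phase1_analysis(text: str) -> list:
    """Tolerant parse of Phase 1 output (illustrative sketch)."""
    # Prefer a JSON array inside a fenced code block, else the raw text.
    m = re.search(r"```(?:json)?\s*(\[.*?\])\s*```", text, re.DOTALL)
    raw = m.group(1) if m else text.strip()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []                        # non-JSON: fall back gracefully
    return data if isinstance(data, list) else []  # reject JSON objects
```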

@Re-bin (Collaborator) commented Apr 11, 2026

Nice feature, gonna review it soon.

@wanghesong2019 (Contributor, Author) replied:

Thanks, looking forward to your review! Happy to address any feedback.

@chengyongru (Collaborator) commented:

Skill discovery is a good idea, but this PR looks a little bit over-engineered.

chengyongru added a commit that referenced this pull request Apr 11, 2026
Instead of a separate skill discovery system, extend Dream's two-phase
pipeline to also detect reusable behavioral patterns from conversation
history and generate SKILL.md files.

Phase 1 gains a [SKILL] output type for pattern detection.
Phase 2 gains write_file (scoped to skills/) and read access to builtin
skills, enabling it to check for duplicates and follow skill-creator's
format conventions before creating new skills.

Inspired by PR #3039 by whs.

Co-authored-by: whs <whs@xdd.ai>
chengyongru added a commit that referenced this pull request Apr 11, 2026
Instead of a separate skill discovery system, extend Dream's two-phase
pipeline to also detect reusable behavioral patterns from conversation
history and generate SKILL.md files.

Phase 1 gains a [SKILL] output type for pattern detection.
Phase 2 gains write_file (scoped to skills/) and read access to builtin
skills, enabling it to check for duplicates and follow skill-creator's
format conventions before creating new skills.

Inspired by PR #3039 by @wanghesong2019.

Co-authored-by: wanghesong2019 <wanghesong2019@users.noreply.github.com>
@chengyongru (Collaborator) commented:

@wanghesong2019 How about this? see: #3048

f"- **{c.name}** → `{p}`" for c, p in zip(pending, paths)
)
elif args:
matched = [c for c in pending if c.name == args]
@jeremyjh commented Apr 11, 2026

I think this needs to include a fuzzy match on the description. The generator is not deterministic and there are always multiple candidate names. This isn't theoretical: I've seen Hermes generate 6 different skills for the same thing with slightly different names. The problem there was mostly it was re-reading the whole session every 10 turns so it was literally discovering the same thing over and over, but the names were all different despite being descriptive.

@wanghesong2019 (Contributor, Author) replied:

@jeremyjh That's a really valid concern, and the "6 different names for the same skill" scenario is exactly the kind of real-world failure mode that's easy to miss in design.

Worth noting that #3048 already addresses the "re-reading the whole session every N turns" part — it uses a cursor-based incremental approach (read_unprocessed_history(since_cursor=last_cursor)), so Dream only processes new entries since the last run. That eliminates the repeated-discovery loop you described.

The fuzzy description matching is a separate gap though, and you're right that it's not covered in either PR. #3048's
Phase 2 does inject the existing skills list and has read_file access to check for semantic overlap, but that relies
entirely on the LLM's judgment — there's no hard dedup guard. A lightweight similarity check on descriptions (e.g. edit distance or embedding cosine similarity) before writing would make this much more robust.
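For reference, a minimal version of such a guard using the stdlib's edit-distance ratio might look like this. It is illustrative only and not part of either PR; difflib's `SequenceMatcher` is a cheap stand-in, and embedding similarity would catch paraphrases that pure edit distance misses.

```python
from difflib import SequenceMatcher

def is_near_duplicate(desc: str, existing_descs, threshold: float = 0.8) -> bool:
    """Flag a candidate whose description nearly matches an installed one.
    Illustrative dedup guard (hypothetical helper, not in either PR)."""
    return any(
        SequenceMatcher(None, desc.lower(), d.lower()).ratio() >= threshold
        for d in existing_descs
    )
```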

I'd suggest raising this directly on #3048 — it's a concrete improvement that fits naturally into that approach, and the maintainer @chengyongru there would be the right person to decide how to handle it.

@wanghesong2019 (Contributor, Author) commented:

@chengyongru Hi, thanks for the thoughtful feedback and for building #3048 — I think you're right that #3039 is over-engineered for what this feature needs to be at this stage.

After reviewing #3048's implementation, I think the "extend Dream" approach is genuinely better as a starting point:

• The insight that "a Skill is just another kind of memory" is elegant. Dream already does the hard work of analyzing
history and extracting structured knowledge — adding [SKILL] as a third output type is a natural extension, not a
bolt-on.
• Scoping write_file to skills/ in Phase 2, and having the agent read_file existing skills for dedup, is smarter than
my code-level name matching — it catches semantic duplicates, not just name collisions.
• Zero new config fields and zero new commands means zero new cognitive load for users. That matters.

The main thing #3039 has that #3048 doesn't is a user-facing approval step — skills are written directly to skills/
without a pending/review buffer. I'm curious whether you see that as a concern, or whether the git commit trail is
considered sufficient for auditability?

If #3048 gets merged, I'd be happy to follow up with a small incremental PR adding a /discover-skills command that
manually triggers the Dream skill-discovery phase — for users who want on-demand control without waiting for the
automatic Dream cycle. That could be a clean addition on top of #3048's foundation.

Happy to close #3039 in favor of #3048 if that's the direction. Thanks again for the cleaner solution.

Re-bin pushed a commit that referenced this pull request Apr 12, 2026
(same commit message as the Apr 11 commit above)


Development

Successfully merging this pull request may close these issues.

Feature Request: Automatic Skill Discovery and Generation
