feat: add skill discovery - auto-extract reusable skills from conversation history #3039
Conversation
Nice feature, gonna review it soon.
Thanks, looking forward to your review! Happy to address any feedback.
1c9936f to
2041ddf
Skill discovery is a good idea, but this PR looks a little bit over-engineered.
Instead of a separate skill discovery system, extend Dream's two-phase pipeline to also detect reusable behavioral patterns from conversation history and generate SKILL.md files. Phase 1 gains a [SKILL] output type for pattern detection. Phase 2 gains write_file (scoped to skills/) and read access to builtin skills, enabling it to check for duplicates and follow skill-creator's format conventions before creating new skills. Inspired by PR #3039 by whs. Co-authored-by: whs <whs@xdd.ai>
Instead of a separate skill discovery system, extend Dream's two-phase pipeline to also detect reusable behavioral patterns from conversation history and generate SKILL.md files. Phase 1 gains a [SKILL] output type for pattern detection. Phase 2 gains write_file (scoped to skills/) and read access to builtin skills, enabling it to check for duplicates and follow skill-creator's format conventions before creating new skills. Inspired by PR #3039 by @wanghesong2019. Co-authored-by: wanghesong2019 <wanghesong2019@users.noreply.github.com>
@wanghesong2019 How about this? See #3048.
        f"- **{c.name}** → `{p}`" for c, p in zip(pending, paths)
    )
elif args:
    matched = [c for c in pending if c.name == args]
I think this needs to include a fuzzy match on the description. The generator is not deterministic and there are always multiple candidate names. This isn't theoretical: I've seen Hermes generate 6 different skills for the same thing with slightly different names. The problem there was mostly it was re-reading the whole session every 10 turns so it was literally discovering the same thing over and over, but the names were all different despite being descriptive.
@jeremyjh That's a really valid concern, and the "6 different names for the same skill" scenario is exactly the kind of real-world failure mode that's easy to miss in design.
Worth noting that #3048 already addresses the "re-reading the whole session every N turns" part — it uses a cursor-based incremental approach (read_unprocessed_history(since_cursor=last_cursor)), so Dream only processes new entries since the last run. That eliminates the repeated-discovery loop you described.
The fuzzy description matching is a separate gap though, and you're right that it's not covered in either PR. #3048's
Phase 2 does inject the existing skills list and has read_file access to check for semantic overlap, but that relies
entirely on the LLM's judgment — there's no hard dedup guard. A lightweight similarity check on descriptions (e.g. edit distance or embedding cosine similarity) before writing would make this much more robust.
I'd suggest raising this directly on #3048 — it's a concrete improvement that fits naturally into that approach, and the maintainer @chengyongru there would be the right person to decide how to handle it.
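As a rough illustration of the kind of hard dedup guard meant here, the sketch below compares a candidate's description against existing skill descriptions with the standard library's `difflib.SequenceMatcher`. The function names, the `existing` name-to-description mapping, and the 0.8 threshold are all assumptions for illustration; neither PR contains this code.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def description_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two whitespace/case-normalized descriptions."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_duplicate(
    candidate_desc: str,
    existing: dict[str, str],  # hypothetical: skill name -> description
    threshold: float = 0.8,    # assumed cutoff; would need tuning on real data
) -> str | None:
    """Return the name of the most similar existing skill, or None if none reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, desc in existing.items():
        score = description_similarity(candidate_desc, desc)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

A check like this could run just before a new SKILL.md is written, so that a differently named candidate with a near-identical description is skipped or merged instead of installed again. Character-level similarity will miss paraphrases, which is where an embedding-based comparison would do better.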
@chengyongru Hi, thanks for the thoughtful feedback and for building #3048. I think you're right that #3039 is over-engineered for what this feature needs to be at this stage.
After reviewing #3048's implementation, I think the "extend Dream" approach is genuinely better as a starting point:
• The insight that "a Skill is just another kind of memory" is elegant. Dream already does the hard work of analyzing …
The main thing #3039 has that #3048 doesn't is a user-facing approval step: skills are written directly to skills/ …
If #3048 gets merged, I'd be happy to follow up with a small incremental PR adding a /discover-skills command that …
Happy to close #3039 in favor of #3048 if that's the direction. Thanks again for the cleaner solution.
PR Description
Summary
Closes #2927
This PR introduces Skill Discovery, a feature that automatically analyzes conversation history to identify reusable behavioral patterns and extract them into standalone SKILL.md files. The system uses a two-phase LLM pipeline, supports three trigger modes, and includes a human-in-the-loop approval workflow.
Motivation
nanobot already supports manually created skills (via skills/<name>/SKILL.md), but users must recognize and codify their own repetitive workflows. Skill Discovery closes this loop by automatically detecting when the agent performs a task repeatedly and proactively suggesting it as a reusable skill, reducing manual configuration and improving the agent's self-improvement capability.
Architecture & Design Philosophy
Two-Phase LLM Pipeline
Rather than asking a single LLM call to both analyze patterns and generate skill files, the pipeline is split into two focused phases:
read_file + write_file tools; SKILL.md files written to .pending_skills/ directory
Why two phases?
write_file is scoped to .pending_skills/ only: it cannot overwrite existing skills or arbitrary workspace files.
Incremental Cursor-Based Processing
Skill Discovery reads only unprocessed history entries using a persistent cursor (.skill_discovery_cursor).
Three Trigger Modes
Manual: /discover-skills command
Post-turn: interval_turns (default: 20)
Cron: cron expression or min_interval_s
Post-turn and cron triggers run discovery in the background; they never block the user's conversation.
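To make the non-blocking post-turn trigger concrete, here is a minimal sketch that counts turns and schedules discovery as a background `asyncio.Task` once `interval_turns` is reached. `PostTurnTrigger`, `on_turn_finished`, and the injected `discover` coroutine are hypothetical names, not the PR's actual `_maybe_trigger_skill_discovery()` implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class PostTurnTrigger:
    """Hypothetical helper: start a background discovery run every `interval_turns` turns."""

    def __init__(self, discover: Callable[[], Awaitable[None]], interval_turns: int = 20):
        self._discover = discover        # async callable that performs one discovery run
        self._interval = interval_turns
        self._turns = 0
        self._task: Optional[asyncio.Task] = None

    def on_turn_finished(self) -> bool:
        """Count one turn; schedule discovery without awaiting it. Returns True if scheduled."""
        self._turns += 1
        if self._turns < self._interval:
            return False
        if self._task is not None and not self._task.done():
            return False  # previous run still in progress; try again on a later turn
        self._turns = 0
        # create_task returns immediately, so the caller's turn is never blocked
        self._task = asyncio.create_task(self._discover())
        return True
```

Keeping a reference to the task both prevents it from being garbage-collected mid-run and lets the trigger avoid starting overlapping discovery runs.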
Human-in-the-Loop Approval
By default (auto_approve: false), discovered skills are saved to .pending_skills/ and await explicit user approval via the /skill-approve command.
With auto_approve: true, skills are installed immediately, which is suitable for trusted environments.
Processing Flow
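The approval leg of this flow might look roughly like the sketch below. It assumes each pending skill lives at `.pending_skills/<name>/SKILL.md`, mirroring the `skills/<name>/SKILL.md` layout mentioned earlier; that layout and the standalone `list_pending` / `approve` functions are illustrative assumptions, not the PR's `SkillDiscoverer` methods.

```python
import shutil
from pathlib import Path


def list_pending(workspace: Path) -> list[str]:
    """Names of skills awaiting approval; empty if the pending directory does not exist."""
    pending_dir = workspace / ".pending_skills"
    if not pending_dir.is_dir():
        return []
    return sorted(p.name for p in pending_dir.iterdir() if (p / "SKILL.md").is_file())


def approve(workspace: Path, name: str) -> Path:
    """Move one pending skill into skills/<name>/; refuse unknown names and overwrites."""
    if name not in list_pending(workspace):
        raise KeyError(f"no pending skill named {name!r}")
    target = workspace / "skills" / name
    if target.exists():
        raise FileExistsError(f"skill {name!r} is already installed")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(workspace / ".pending_skills" / name), str(target))
    return target
```

With `auto_approve: true`, the same move could simply be performed right after generation instead of waiting for a command.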
Files Changed
nanobot/agent/skill_discovery.py: SkillDiscoverer class with two-phase pipeline, candidate filtering, pending management, approval, and post-turn trigger logic
nanobot/agent/loop.py: SkillDiscoverer, add skill_discoverer attribute, call _maybe_trigger_skill_discovery() after each turn
nanobot/agent/memory.py: get_skill_discovery_cursor() / set_skill_discovery_cursor() for persistent cursor tracking
nanobot/command/builtin.py: cmd_discover_skills and cmd_skill_approve command handlers, register /discover-skills and /skill-approve routes
nanobot/config/schema.py: SkillDiscoveryConfig with enabled, model_override, max_history_entries, max_candidates, auto_approve, interval_turns, min_interval_s, cron fields
nanobot/cli/commands.py: SkillDiscoverer in both daemon and CLI modes; register cron job for scheduled discovery; handle skill-discovery system_event in cron handler
nanobot/templates/agent/skill_discovery_phase1.md
nanobot/templates/agent/skill_discovery_phase2.md
tests/agent/test_skill_discovery.py
Configuration Example
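As a hedged sketch of the configuration surface, the dataclass below lists the fields named for SkillDiscoveryConfig. Only the `auto_approve: false` and `interval_turns` (20) defaults come from this description; every other default is a placeholder, and a plain stdlib dataclass is used here only to keep the example self-contained, whatever the real schema class is built on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillDiscoveryConfig:
    enabled: bool = False                  # placeholder default (assumption)
    model_override: Optional[str] = None   # assumption: None means "use the main model"
    max_history_entries: int = 100         # placeholder value (assumption)
    max_candidates: int = 5                # placeholder value (assumption)
    auto_approve: bool = False             # default stated in the description
    interval_turns: int = 20               # default stated in the description
    min_interval_s: Optional[int] = None   # placeholder (assumption)
    cron: Optional[str] = None             # placeholder (assumption)


# Example: enable discovery with a nightly cron run, keeping manual approval on.
config = SkillDiscoveryConfig(enabled=True, cron="0 3 * * *")
```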
Test Coverage
64 test cases across 15 test classes, covering all core components and trigger modes:
Unit Tests: Core Components
TestSkillCandidate: format_preview() output format
TestParsePhase1Analysis
TestExtractDescription
TestFilterCandidates
TestPendingManagement
TestApproval
TestPostTurnTrigger
Integration Tests: Discover Flow
TestDiscover
TestBuildSchedule
TestCronJobRegistration
TestCronDiscoveryExecution
End-to-End Tests: Command & Trigger Modes
TestManualTrigger
TestSkillApproveCommand
TestPostTurnTrigger (loop)
TestCronTriggerHandler
Key Test Scenarios
write_file is scoped to .pending_skills/ only: cannot write outside
… "weak" are dropped between Phase 1 and Phase 2
… asyncio.Task
Edge Cases Handled
_parse_phase1_analysis falls back gracefully, returns []
list_pending() returns []
/skill-approve with nonexistent name → shows available pending skill names
/skill-approve with no args → lists pending skills with usage hint
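The write_file scoping scenario listed above can be illustrated with a small path-containment check. This is a sketch under assumptions: `resolve_scoped` and `scoped_write` are hypothetical helpers, not the PR's tool implementation, and they rely only on `pathlib` behavior.

```python
from pathlib import Path


def resolve_scoped(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    # Resolving collapses ".." segments and symlinks; an absolute `relative`
    # replaces `root` entirely and is therefore rejected by the check below.
    target = (root / relative).resolve()
    if root not in target.parents:
        raise PermissionError(f"{relative!r} is outside {root}")
    return target


def scoped_write(root: Path, relative: str, content: str) -> Path:
    """Write `content` to a file that is guaranteed to live under `root`."""
    target = resolve_scoped(root, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Checking the resolved path rather than the raw string is what stops inputs such as `../skills/x/SKILL.md` from reaching installed skills.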