
feat: GEPA integration for automated skill optimization #7

Merged
spboyer merged 2 commits into main from feat/gepa-integration on Mar 25, 2026

spboyer (Owner) commented on Mar 25, 2026

Summary

Adds GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop.

What GEPA does for sensei

When invoked with --gepa, sensei replaces its template-based improve step with LLM-driven evolutionary optimization. GEPA:

  1. Auto-discovers the skill's test harness (triggers.test.ts, unit.test.ts) at runtime
  2. Builds an evaluator that scores candidates on content quality and trigger accuracy
  3. Proposes improvements via LLM, keeping only versions that score higher
  4. Feeds test failures as ASI (Actionable Side Information) so the LLM knows why a candidate failed
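The accept-if-better loop in steps 3 and 4 can be sketched as follows. This is a minimal, greedy illustration, not GEPA's actual Pareto-based candidate selection; `Evaluation`, `optimize`, `toy_evaluate`, and `toy_propose` are hypothetical names, and the toy proposer stands in for the LLM call. The evaluator returns both a score and a list of failure messages, and those messages (the ASI) are handed to the proposer so the next candidate can target them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evaluation:
    score: float  # fitness in [0, 1]; higher is better
    failures: list[str] = field(default_factory=list)  # ASI: why the candidate fell short


Evaluator = Callable[[str], Evaluation]
Proposer = Callable[[str, list[str]], str]  # (current text, failures) -> new candidate


def optimize(seed: str, evaluate: Evaluator, propose: Proposer, budget: int = 10) -> tuple[str, float]:
    """Greedy sketch: propose, evaluate, and keep a candidate only if it scores higher."""
    best, best_eval = seed, evaluate(seed)
    for _ in range(budget):
        if best_eval.score >= 1.0:
            break  # already perfect; stop spending proposer calls
        candidate = propose(best, best_eval.failures)  # failures guide the proposal
        candidate_eval = evaluate(candidate)
        if candidate_eval.score > best_eval.score:  # keep only strict improvements
            best, best_eval = candidate, candidate_eval
    return best, best_eval.score


# Hypothetical toy stand-ins for the real evaluator (test harness) and proposer (LLM).
REQUIRED_SECTIONS = ["## When to use", "## Examples"]


def toy_evaluate(text: str) -> Evaluation:
    missing = [s for s in REQUIRED_SECTIONS if s not in text]
    score = 1 - len(missing) / len(REQUIRED_SECTIONS)
    return Evaluation(score, [f"missing section: {s}" for s in missing])


def toy_propose(text: str, failures: list[str]) -> str:
    if not failures:
        return text
    # Address the first reported failure by appending the missing section.
    return text + "\n" + failures[0].removeprefix("missing section: ")


best, score = optimize("# azure-storage", toy_evaluate, toy_propose)
```

Replacing `toy_propose` with an LLM prompt that includes the failure list is what turns this loop into feedback-driven optimization; the toy version just shows the control flow.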

Results on GitHub Copilot for Azure skills

Tested on 4 skills with 0% invocation rates:

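| Skill                  | Quality Before | Quality After |
|------------------------|----------------|---------------|
| azure-storage          | 0.16           | 1.00          |
| entra-app-registration | 0.38           | 1.00          |
| microsoft-foundry      | 0.50           | 1.00          |
| azure-deploy           | 0.62           | 1.00          |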

A full baseline of all 23 skills found that 0 of 23 pass the quality threshold, so every skill has room for improvement.

Key design decisions

  • Existing tests are NOT replaced: GEPA wraps them as its fitness function
  • Zero user config: evaluators are auto-generated from test files at runtime
  • Opt-in via the --gepa flag; default sensei behavior is unchanged
  • Uses GitHub Models: free LLM access via the gh auth token, so no API keys are needed
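Retrieving the token that GitHub Models authenticates with, and validating it before use, might look like the sketch below. It assumes only that the GitHub CLI is installed and signed in; `get_github_token` and `validate_token` are hypothetical helpers, the whitespace check is an illustrative validation rule rather than the script's actual one, and how the token is then sent to GitHub Models is not shown.

```python
import shutil
import subprocess


def validate_token(raw: str) -> str:
    """Reject empty or multi-part output instead of sending it as an API key (illustrative rule)."""
    token = raw.strip()
    if not token or any(ch.isspace() for ch in token):
        raise ValueError("`gh auth token` did not return a usable token; try `gh auth login`")
    return token


def get_github_token() -> str:
    """Read the signed-in user's token from the GitHub CLI."""
    if shutil.which("gh") is None:
        raise RuntimeError("GitHub CLI (gh) is not installed or not on PATH")
    result = subprocess.run(
        ["gh", "auth", "token"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        raise RuntimeError(f"`gh auth token` failed: {result.stderr.strip()}")
    return validate_token(result.stdout)
```

Failing early with a clear message keeps a missing login from surfacing later as an opaque authentication error from the model endpoint.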

Files changed

  • SKILL.md: added GEPA mode invocation docs and Step 5-GEPA to the Ralph loop
  • scripts/src/gepa/auto_evaluator.py: auto-evaluator CLI (score, score-all, optimize)

Usage

# Score all skills (instant, no LLM)
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir plugin/skills --tests-dir tests

# Optimize a skill
python scripts/src/gepa/auto_evaluator.py optimize --skill azure-storage --skills-dir plugin/skills --tests-dir tests

References

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop. When invoked with the --gepa flag,
it replaces template-based improvement with LLM-driven optimization
that uses the existing test harness as its fitness function.

Key additions:
- scripts/src/gepa/auto_evaluator.py: Auto-discovers test files
  (triggers.test.ts, unit.test.ts) at runtime and builds GEPA
  evaluators dynamically. Zero manual configuration required.
- SKILL.md: Added GEPA mode invocation docs and Step 5-GEPA
  instructions for the Ralph loop.
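Runtime discovery could be as simple as probing a per-skill test directory. The sketch assumes a `<tests-dir>/<skill>/triggers.test.ts` and `unit.test.ts` layout, which the PR does not spell out; `discover_tests` and `extract_string_literals` are hypothetical. The literal-matching regex accepts single quotes, double quotes, and backticks, so prompts containing apostrophes ("what's") or template literals are captured whole; it pulls every literal from the source and is not a TypeScript parser.

```python
from __future__ import annotations

import re
from pathlib import Path

# Test files to look for (file names from the PR; directory layout assumed).
TEST_FILES = {"triggers": "triggers.test.ts", "unit": "unit.test.ts"}

# Matches '...', "..." or `...` literals, allowing backslash-escaped characters.
STRING_LITERAL = re.compile(r"""(["'`])((?:\\.|(?!\1).)*)\1""", re.DOTALL)


def discover_tests(tests_dir: str | Path, skill: str) -> dict[str, Path]:
    """Return the test files that exist for a skill, keyed by kind."""
    skill_dir = Path(tests_dir) / skill
    return {
        kind: skill_dir / filename
        for kind, filename in TEST_FILES.items()
        if (skill_dir / filename).is_file()
    }


def extract_string_literals(source: str) -> list[str]:
    """Return the raw contents of every quoted or template literal in source."""
    return [match.group(2) for match in STRING_LITERAL.finditer(source)]
```

Because discovery only checks which files exist, a skill with just a triggers file still gets an evaluator; the missing unit tests simply contribute nothing.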

The auto-evaluator provides three commands:
- score: Evaluate SKILL.md quality (no LLM, instant)
- score-all: Baseline all skills in a project
- optimize: Run GEPA optimization using GitHub Models
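A command-line skeleton for these three subcommands, using only `argparse`, is sketched below. The `--skill`, `--skills-dir`, and `--tests-dir` flags come from the usage examples and the `skills` default follows the later path change; the `tests` default, the `<skills-dir>/<skill>/SKILL.md` layout, requiring `--skill` on `score`, and the placeholder handlers are assumptions. `main` returns a non-zero exit code on errors so CI can detect the failure.

```python
from __future__ import annotations

import argparse
import sys
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="auto_evaluator.py",
        description="Score or optimize SKILL.md files (illustrative sketch).",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    commands = {
        "score": "Evaluate one SKILL.md (no LLM)",
        "score-all": "Baseline every skill in the project",
        "optimize": "Run GEPA optimization on one skill",
    }
    for name, help_text in commands.items():
        cmd = sub.add_parser(name, help=help_text)
        cmd.add_argument("--skills-dir", default="skills")  # default per the path fix
        cmd.add_argument("--tests-dir", default="tests")  # assumed default
        if name != "score-all":
            cmd.add_argument("--skill", required=True)  # assumed for score as well
    return parser


def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    skills_dir = Path(args.skills_dir)
    if not skills_dir.is_dir():
        print(f"error: skills directory not found: {skills_dir}", file=sys.stderr)
        return 1  # non-zero exit so CI notices the failure
    if args.command == "score-all":
        targets = sorted(p.parent.name for p in skills_dir.glob("*/SKILL.md"))
    else:
        targets = [args.skill]
    for name in targets:
        skill_md = skills_dir / name / "SKILL.md"
        if not skill_md.is_file():
            print(f"error: {skill_md} not found", file=sys.stderr)
            return 1
        # Placeholder: real scoring or optimization would run here.
        print(f"{args.command}: {name}")
    return 0

# Entry point when run as a script: sys.exit(main())
```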

Existing tests are NOT replaced — GEPA wraps them as its fitness
function and feeds test failures as Actionable Side Information
to guide the LLM proposer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes from consolidated review (Opus 4.6, GPT-5.4, Haiku):

Code fixes (auto_evaluator.py):
- Remove "DO NOT USE FOR" from scorer/optimizer (contradicts sensei scoring)
- Parse frontmatter description for scoring (was only scoring body)
- Fix regex to handle apostrophes and backtick template literals
- Handle malformed YAML frontmatter gracefully (no more ValueError crash)
- Change default paths from plugin/skills to skills/
- Return non-zero exit codes on errors (CI reliability)
- Validate gh auth token output before using as API key
- Align optimizer objective with scored sections
- Fix docstring to accurately reflect fitness function scope
- Seed GEPA with full SKILL.md content (frontmatter + body)
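Graceful frontmatter handling, so that the description is scored and malformed frontmatter no longer raises, could follow this stdlib-only sketch. `split_frontmatter` and `scoring_text` are hypothetical names, and the parser only understands flat, single-line `key: value` pairs; a real implementation with folded or multi-line YAML values would more likely use a YAML library such as PyYAML's `yaml.safe_load` and catch `yaml.YAMLError`.

```python
from __future__ import annotations


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (frontmatter fields, body); never raises on malformed input."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is body
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated frontmatter: fall back instead of crashing
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip().strip("\"'")
    return fields, "\n".join(lines[end + 1:])


def scoring_text(text: str) -> str:
    """Text to score: the frontmatter description followed by the body."""
    fields, body = split_frontmatter(text)
    return (fields.get("description", "") + "\n" + body).strip()
```

Returning the full original text on malformed input also fits the seeding fix: the optimizer can still start from the complete SKILL.md even when its frontmatter cannot be parsed.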

Docs:
- Add GEPA to README.md (Quick Start, Flags, Prerequisites, Commands)
- Add GEPA to AGENTS.md (repo structure, testing, dependencies)
- Fix SKILL.md command examples to use correct default paths
- Add requirements.txt for Python dependencies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer merged commit b814e3b into main on Mar 25, 2026
2 checks passed
spboyer deleted the feat/gepa-integration branch on March 25, 2026 at 17:30