feat: GEPA integration for automated skill optimization by spboyer · Pull Request #7 · spboyer/sensei

spboyer · 2026-03-25T16:55:04Z

Summary

Adds GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop.

What GEPA does for sensei

When invoked with --gepa, sensei replaces its template-based improve step with LLM-driven evolutionary optimization. GEPA:

- Auto-discovers the skill's test harness (triggers.test.ts, unit.test.ts) at runtime
- Builds an evaluator that scores candidates on content quality + trigger accuracy
- Proposes improvements via LLM, keeping only versions that score higher
- Feeds test failures as ASI (Actionable Side Information) so the LLM knows why a candidate failed
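
The steps above amount to an accept-if-better loop. The following is a minimal sketch of that idea, not the code in auto_evaluator.py: `optimize`, `toy_evaluate`, `toy_propose`, and the `REQUIRED` markers are hypothetical names chosen for illustration, and the sketch keeps a single best candidate for brevity, where the "Genetic-Pareto" name suggests the real optimizer tracks more than one.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: an evaluator returns a score in [0, 1] plus
# human-readable failure messages (the "ASI" handed back to the proposer).
Evaluation = Tuple[float, List[str]]


def optimize(
    seed: str,
    evaluate: Callable[[str], Evaluation],
    propose: Callable[[str, List[str]], str],
    max_iterations: int = 10,
) -> Tuple[str, float]:
    """Greedy accept-if-better loop; returns the best candidate and its score."""
    best = seed
    best_score, feedback = evaluate(seed)
    for _ in range(max_iterations):
        if best_score >= 1.0:
            break  # nothing left to improve
        candidate = propose(best, feedback)
        score, candidate_feedback = evaluate(candidate)
        if score > best_score:  # keep only strict improvements
            best, best_score, feedback = candidate, score, candidate_feedback
    return best, best_score


# Toy stand-ins so the sketch runs without an LLM or a test harness.
REQUIRED = ["USE FOR:", "WHEN:"]  # illustrative markers only (assumption)


def toy_evaluate(text: str) -> Evaluation:
    missing = [m for m in REQUIRED if m not in text]
    score = 1 - len(missing) / len(REQUIRED)
    return score, [f"missing section {m}" for m in missing]


def toy_propose(text: str, feedback: List[str]) -> str:
    # A real proposer would prompt an LLM with the feedback; this one
    # simply appends the first missing marker.
    if not feedback:
        return text
    return text + "\n" + feedback[0].replace("missing section ", "")
```

Passing the feedback list into `propose` is the part that corresponds to ASI: the proposer sees why the last accepted candidate fell short instead of only its numeric score.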

Results on GitHub Copilot for Azure skills

Tested on 4 skills with 0% invocation rates:

Skill	Quality Before	Quality After
azure-storage	0.16	1.00
entra-app-registration	0.38	1.00
microsoft-foundry	0.50	1.00
azure-deploy	0.62	1.00

Full baseline of all 23 skills: 0/23 pass quality threshold — every skill has room for improvement.

Key design decisions

- Existing tests are NOT replaced: GEPA wraps them as its fitness function
- Zero user config: evaluators auto-generated from test files at runtime
- Opt-in: --gepa flag; default sensei behavior unchanged
- Uses GitHub Models: free LLM via gh auth token, no API keys needed
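
For the last point, one way to obtain and sanity-check the token is sketched below. This is an assumption about how such a helper could look, not the PR's implementation: `validate_token` and `github_token` are hypothetical names, and the only external command used is the GitHub CLI's `gh auth token`.

```python
import subprocess
from typing import Optional


def validate_token(returncode: int, stdout: str) -> Optional[str]:
    """Return the trimmed token, or None if the output is not usable."""
    token = stdout.strip()
    # Reject failures, empty output, and anything containing whitespace
    # (for example an error message printed instead of a token).
    if returncode != 0 or not token or any(ch.isspace() for ch in token):
        return None
    return token


def github_token() -> Optional[str]:
    """Ask the GitHub CLI for the current user's token; None on any failure."""
    try:
        result = subprocess.run(
            ["gh", "auth", "token"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None  # gh is not installed, or it did not respond in time
    return validate_token(result.returncode, result.stdout)
```

Keeping the validation separate from the subprocess call makes it testable without a logged-in `gh` session.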

Files changed

- SKILL.md: Added GEPA mode invocation docs + Step 5-GEPA in Ralph loop
- scripts/src/gepa/auto_evaluator.py: Auto-evaluator CLI (score, score-all, optimize)
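
As a rough picture of the CLI surface, the sketch below wires up the three subcommands with argparse. It is illustrative only: the flag names come from the usage examples in this description, but the defaults, the choice of which subcommands take --skill, and the placeholder handler are assumptions.

```python
import argparse
import os
import sys
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="auto_evaluator.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("score", "score-all", "optimize"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--skills-dir", default="skills")  # assumed default
        cmd.add_argument("--tests-dir", default="tests")    # assumed default
        if name != "score-all":
            # Assumption: single-skill commands require a skill name.
            cmd.add_argument("--skill", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Return a process exit code: 0 on success, non-zero on error."""
    args = build_parser().parse_args(argv)
    if not os.path.isdir(args.skills_dir):
        print(f"error: skills directory not found: {args.skills_dir}",
              file=sys.stderr)
        return 1
    # Placeholder: the real handlers are not shown in this description.
    print(f"would run {args.command!r} against {args.skills_dir}")
    return 0

# A script entry point would typically end with: sys.exit(main())
```

Returning the exit code from `main` rather than calling `sys.exit` deep inside handlers keeps failures visible to CI while leaving the function easy to call from tests.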

Usage

# Score all skills (instant, no LLM)
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir plugin/skills --tests-dir tests

# Optimize a skill
python scripts/src/gepa/auto_evaluator.py optimize --skill azure-storage --skills-dir plugin/skills --tests-dir tests

References

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop. When invoked with --gepa flag, replaces template-based improvement with LLM-driven optimization that uses existing test harness as its fitness function.

Key additions:
- scripts/src/gepa/auto_evaluator.py: Auto-discovers test files (triggers.test.ts, unit.test.ts) at runtime and builds GEPA evaluators dynamically. Zero manual configuration required.
- SKILL.md: Added GEPA mode invocation docs and Step 5-GEPA instructions for the Ralph loop.

The auto-evaluator provides three commands:
- score: Evaluate SKILL.md quality (no LLM, instant)
- score-all: Baseline all skills in a project
- optimize: Run GEPA optimization using GitHub Models

Existing tests are NOT replaced: GEPA wraps them as its fitness function and feeds test failures as Actionable Side Information to guide the LLM proposer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fixes from consolidated review (Opus 4.6, GPT-5.4, Haiku):

Code fixes (auto_evaluator.py):
- Remove DO NOT USE FOR from scorer/optimizer (contradicts sensei scoring)
- Parse frontmatter description for scoring (was only scoring body)
- Fix regex to handle apostrophes and backtick template literals
- Handle malformed YAML frontmatter gracefully (no more ValueError crash)
- Change default paths from plugin/skills to skills/
- Return non-zero exit codes on errors (CI reliability)
- Validate gh auth token output before using as API key
- Align optimizer objective with scored sections
- Fix docstring to accurately reflect fitness function scope
- Seed GEPA with full SKILL.md content (frontmatter + body)

Docs:
- Add GEPA to README.md (Quick Start, Flags, Prerequisites, Commands)
- Add GEPA to AGENTS.md (repo structure, testing, dependencies)
- Fix SKILL.md command examples to use correct default paths
- Add requirements.txt for Python dependencies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This was referenced Mar 25, 2026

feat: GEPA integration for sensei skill + quality score CI workflow spboyer/GitHub-Copilot-for-Azure#1

Closed

feat: GEPA integration for sensei skill + quality score CI workflow microsoft/GitHub-Copilot-for-Azure#1498

Merged

spboyer merged commit b814e3b into main Mar 25, 2026
2 checks passed

spboyer deleted the feat/gepa-integration branch March 25, 2026 17:30

spboyer merged 2 commits into main from feat/gepa-integration
