feat: GEPA integration for automated skill optimization #7
Merged
Add GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop. When invoked with --gepa flag, replaces template-based improvement with LLM-driven optimization that uses existing test harness as its fitness function.
Key additions:
- scripts/src/gepa/auto_evaluator.py: Auto-discovers test files (triggers.test.ts, unit.test.ts) at runtime and builds GEPA evaluators dynamically. Zero manual configuration required.
- SKILL.md: Added GEPA mode invocation docs and Step 5-GEPA instructions for the Ralph loop.
The auto-evaluator provides three commands:
- score: Evaluate SKILL.md quality (no LLM, instant)
- score-all: Baseline all skills in a project
- optimize: Run GEPA optimization using GitHub Models
Existing tests are NOT replaced; GEPA wraps them as its fitness function and feeds test failures as Actionable Side Information to guide the LLM proposer.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
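As a rough illustration of the runtime test discovery described above, the following sketch looks for the two named test files under a per-skill directory. The function name `discover_test_files` and the `<tests_root>/<skill_name>/` layout are assumptions made for this example; the actual logic in auto_evaluator.py may differ.

```python
from pathlib import Path

# File names taken from the commit message; everything else here is hypothetical.
TEST_FILE_NAMES = ("triggers.test.ts", "unit.test.ts")


def discover_test_files(tests_root: Path, skill_name: str) -> dict[str, Path]:
    """Return the known test files that exist for one skill, keyed by file name.

    Assumes tests live at <tests_root>/<skill_name>/<file name>.
    """
    skill_dir = tests_root / skill_name
    found: dict[str, Path] = {}
    for name in TEST_FILE_NAMES:
        candidate = skill_dir / name
        # Only report files that are actually present, so a skill with no
        # tests yields an empty mapping instead of raising.
        if candidate.is_file():
            found[name] = candidate
    return found
```

Returning an empty mapping for a skill without tests lets a caller decide whether to skip the skill or score it on SKILL.md content alone.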
Fixes from consolidated review (Opus 4.6, GPT-5.4, Haiku):
Code fixes (auto_evaluator.py):
- Remove DO NOT USE FOR from scorer/optimizer (contradicts sensei scoring)
- Parse frontmatter description for scoring (was only scoring body)
- Fix regex to handle apostrophes and backtick template literals
- Handle malformed YAML frontmatter gracefully (no more ValueError crash)
- Change default paths from plugin/skills to skills/
- Return non-zero exit codes on errors (CI reliability)
- Validate gh auth token output before using as API key
- Align optimizer objective with scored sections
- Fix docstring to accurately reflect fitness function scope
- Seed GEPA with full SKILL.md content (frontmatter + body)
Docs:
- Add GEPA to README.md (Quick Start, Flags, Prerequisites, Commands)
- Add GEPA to AGENTS.md (repo structure, testing, dependencies)
- Fix SKILL.md command examples to use correct default paths
- Add requirements.txt for Python dependencies
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
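The frontmatter fixes listed above (scoring the description, and not crashing on malformed frontmatter) could look roughly like the sketch below. It is a minimal illustration, not the code from this PR: the helper names `split_frontmatter` and `extract_description` are hypothetical, and it uses plain string handling rather than a YAML parser, so it only reads a single-line `description:` value.

```python
import re


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body).

    Returns ("", text) when there is no complete '---' delimited block,
    so malformed input never raises.
    """
    if not text.startswith("---"):
        return "", text
    parts = text.split("---", 2)
    if len(parts) < 3:
        # Opening delimiter without a closing one: treat everything as body.
        return "", text
    return parts[1].strip(), parts[2].lstrip("\n")


def extract_description(frontmatter: str) -> str:
    """Return the single-line description value, or "" if it is absent."""
    match = re.search(r"^description:\s*(.+)$", frontmatter, flags=re.MULTILINE)
    if match is None:
        return ""
    # Drop surrounding quotes if the value was quoted.
    return match.group(1).strip().strip("\"'")
```

With helpers like these, a scorer can evaluate the description and the body together, and fall back to body-only scoring when the frontmatter is missing or unterminated.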
Summary
Adds GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop.
What GEPA does for sensei
When invoked with --gepa, sensei replaces its template-based improve step with LLM-driven evolutionary optimization. GEPA:
- Auto-discovers test files (triggers.test.ts, unit.test.ts) at runtime
Results on GitHub Copilot for Azure skills
Tested on 4 skills with 0% invocation rates:
Full baseline of all 23 skills: 0/23 pass quality threshold — every skill has room for improvement.
Key design decisions
- Opt-in via the --gepa flag; default sensei behavior unchanged
- Uses gh auth token, no API keys needed
Files changed
- SKILL.md: Added GEPA mode invocation docs + Step 5-GEPA in Ralph loop
- scripts/src/gepa/auto_evaluator.py: Auto-evaluator CLI (score, score-all, optimize)
Usage
References