feat: GEPA integration for sensei skill + quality score CI workflow by spboyer · Pull Request #1498 · microsoft/GitHub-Copilot-for-Azure

spboyer · 2026-03-25T17:05:18Z

Summary

Adds GEPA (Genetic-Pareto) evolutionary optimization to the sensei skill, plus a CI workflow that scores SKILL.md quality on every PR.

What this PR adds

File	Purpose
`.github/skills/sensei/SKILL.md`	Added `--gepa` flag, GEPA mode docs, Step 5-GEPA in Ralph loop
`.github/skills/sensei/scripts/gepa/auto_evaluator.py`	Auto-discovers test harness, builds GEPA evaluators, scores/optimizes skills
`pipelines/gepa-quality-score.yml`	PR quality gate — scores all skills, uploads results as artifact
`pipelines/gepa-quality-score-comment.yml`	`workflow_run`-triggered commenter — posts score results as PR comment

How it works

Auto-discovers each skill's test harness (triggers.test.ts, unit.test.ts) at runtime
Builds an evaluator that scores on content quality + trigger accuracy
Proposes improvements via LLM (GitHub Models), keeping only better versions
Surfaces scoring feedback as ASI (Actionable Side Information) so the LLM knows why a candidate scored low — based on heuristic content checks and trigger accuracy (Jest test integration is planned for a future iteration)

Existing tests are NOT replaced or modified. GEPA wraps them as its fitness function.

Current baseline: 0/23 skills pass quality threshold

Skill                           Quality  Triggers  Tests
────────────────────────────────────────────────────────
✗ azure-compliance                 0.14      0.67    TIU
✗ azure-messaging                  0.14      1.00    TIU
✗ azure-ai                         0.16       N/A    TIU
✗ azure-storage                    0.16       N/A    -I-
✗ appinsights-instrumentation      0.24      1.00    TIU
✗ azure-aigateway                  0.25      0.92    TIU
✗ azure-rbac                       0.25      0.91    TIU
✗ azure-quotas                     0.34      0.96    TIU
✗ azure-kusto                      0.37       N/A    -I-
✗ azure-compute                    0.38       N/A    TIU
✗ azure-cost-optimization          0.38      0.89    TIU
✗ azure-resource-lookup            0.38      0.54    TIU
✗ azure-resource-visualizer        0.38      0.92    TIU
✗ entra-app-registration           0.38      1.00    TIU
✗ azure-hosted-copilot-sdk         0.39      0.94    TIU
✗ azure-cloud-migrate              0.45      1.00    TIU
✗ azure-diagnostics                0.50      0.90    TIU
✗ azure-prepare                    0.50      1.00    TIU
✗ microsoft-foundry                0.50      0.91    TIU
✗ azure-deploy                     0.62      1.00    TIU
✗ azure-upgrade                    0.62      1.00    TIU
✗ azure-validate                   0.62      1.00    TIU

  0/23 skills at quality >= 0.80

GEPA optimization results (sample run on 4 skills)

Skill	Quality Before	Quality After	What GEPA added
azure-storage	0.16	1.00	Triggers, Rules, Steps, USE FOR, WHEN, DO NOT USE FOR — all missing
entra-app-registration	0.38	1.00	Triggers, Rules, USE FOR, WHEN, DO NOT USE FOR
microsoft-foundry	0.50	1.00	Triggers, Rules, WHEN, DO NOT USE FOR
azure-deploy	0.62	1.00	USE FOR, WHEN, DO NOT USE FOR

Before / After example: azure-storage

BEFORE (quality: 0.16) — flat reference doc, no agent routing signals

# Azure Storage Services

## Services

| Service | Use When | MCP Tools | CLI |
|---------|----------|-----------|-----|
| Blob Storage | Objects, files, backups, static content | azure__storage | az storage blob |
| File Shares | SMB file shares, lift-and-shift | - | az storage file |

## MCP Server (Preferred)
## CLI Fallback
## Storage Account Tiers

Missing: ## Triggers, ## Rules, ## Steps, USE FOR, WHEN, DO NOT USE FOR

AFTER (quality: 1.00) — structured agent instructions with routing

# Azure Storage Services Skill

Azure Storage Services skill facilitates efficient data management,
storage, and access operations in Azure environments.

## Triggers
- WHEN a user inquires about storing, managing, or retrieving data in Azure Storage.
- WHEN a user mentions blob storage, file shares, queue storage, table storage, data lake.

## Rules
- USE THIS SKILL WHEN:
  - Query involves Azure Storage services (blobs, queues, tables, files, data lakes)
- DO NOT USE FOR:
  - General Azure resource provisioning (use azure-prepare)
  - Databases (SQL, Cosmos DB, MySQL)

## Steps
1. Analyze the user query to identify Azure Storage service and intent
2. Verify the required functionality (create, manage, retrieve)
3. Use Azure MCP commands if enabled
4. Fallback to Azure CLI if MCP unavailable
5. Provide best practices (tiers, redundancy, SDK usage)
6. Link relevant documentation

Usage

# Score all skills (instant, no LLM calls)
python .github/skills/sensei/scripts/gepa/auto_evaluator.py score-all \
  --skills-dir plugin/skills --tests-dir tests

# Optimize a specific skill
python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize \
  --skill azure-storage --skills-dir plugin/skills --tests-dir tests

# Via sensei
Run sensei on azure-storage --gepa

References

GEPA optimize_anything
gskill: learning skills for coding agents
Related PR: spboyer/sensei#7

Copilot

Pull request overview

Adds GEPA-based evaluation/optimization tooling for skills and introduces a PR-time workflow that scores SKILL.md quality and reports results.

Changes:

Added a GitHub Actions workflow to score skills on PRs / manual runs and comment results.
Introduced a Python auto-evaluator that discovers existing TS test harnesses and computes quality/trigger scores (and can run GEPA optimization).
Updated sensei skill documentation to describe --gepa mode and a GEPA step in the Ralph loop.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File	Description
pipelines/gepa-quality-score.yml	New CI workflow to compute GEPA quality scores, upload JSON results, and comment on PRs.
.github/skills/sensei/scripts/gepa/auto_evaluator.py	New GEPA auto-evaluator CLI: harness discovery, scoring, and optimization entrypoint.
.github/skills/sensei/SKILL.md	Docs: adds `--gepa` usage and describes the GEPA optimization step.

pipelines/gepa-quality-score.yml

.github/skills/sensei/scripts/gepa/auto_evaluator.py

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

.github/skills/sensei/scripts/gepa/auto_evaluator.py:347

trigger_prompt_count currently counts only should_trigger prompts and ignores should_not_trigger. That makes the field name misleading and can confuse downstream consumers of the JSON output. Either include both arrays in the count or rename the field to reflect what it measures.

        "has_integration_test": harness["has_integration"],
        "has_unit_test": harness["has_unit"],
        "trigger_prompt_count": len(harness["trigger_prompts"]["should_trigger"]),
    }

.github/skills/sensei/scripts/gepa/auto_evaluator.py:247

There are unused parameters that add noise and make the CLI harder to maintain: score_skill(..., as_json=...) never uses as_json, and build_evaluator(..., fast=...) never uses fast. Remove these parameters or wire them up (e.g., use fast to disable slower checks) to avoid misleading callers.

def build_evaluator(skill_name: str, tests_dir: Path, fast: bool = True):
    """Auto-build a GEPA evaluator for a skill from its test harness.

pipelines/gepa-quality-score.yml:7

The workflow trigger paths: doesn’t include the GEPA scoring implementation under .github/skills/sensei/scripts/gepa/**. Changes to the evaluator logic won’t re-run this quality-score workflow, which can lead to PRs merging with an unvalidated scoring change. Consider adding that path (and any other inputs like requirements files) to the trigger list.

  pull_request:
    paths:
      - 'plugin/skills/**/SKILL.md'
      - 'tests/**'

pipelines/gepa-quality-score.yml:30

Most workflows in this repo use actions/checkout@v6 (e.g., .github/workflows/pr.yml). This new workflow uses actions/checkout@v4, which is inconsistent and may behave differently than expected in this repo’s CI environment. Align the checkout action version with the rest of the workflows.

      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:

pipelines/gepa-quality-score.yml:35

pip install gepa installs the latest GEPA release, which makes CI runs non-reproducible and can cause unexpected breakages when GEPA publishes changes. Pin the dependency to a known-good version (or install from a locked requirements file) so the scoring behavior is stable across PRs.

      - name: Install GEPA evaluator deps
        run: pip install gepa

.github/skills/sensei/scripts/gepa/auto_evaluator.py

.github/skills/sensei/SKILL.md

JasonYeMSFT · 2026-03-25T18:26:10Z

In the before/after example of azure-storage, why do we need to describe when to use the skill after the skill description? They agent only has presents the skill's description to the LLM when deciding whether to load the skill.

spboyer · 2026-03-25T20:00:36Z

Good point @JasonYeMSFT — the before/after example in the PR description shows what GEPA generates for the SKILL.md body, not the frontmatter description. You're right that routing decisions only use the frontmatter description field. The body ## Triggers / WHEN: sections are read after the skill is loaded to guide the LLM during execution, but they're redundant for routing.

I'll tune the GEPA optimization objective to focus body content on execution instructions (Steps, Rules, MCP Tools) rather than duplicating routing signals that belong in the frontmatter description. The AFTER example will be updated once that's done.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

pipelines/gepa-quality-score.yml:103

This workflow posts PR comments from the PR’s workflow run (pull-requests: write). Elsewhere in this repo, commenting is intentionally done via a separate workflow triggered by workflow_run so the commenting code always executes from main (see note in .github/workflows/pr.yml). To match that security model, consider emitting an artifact here and moving the PR-commenting step into the existing pr-comment.yml (or a new workflow_run-based commenter) instead of running github-script directly in the PR context.

      - name: Add PR comment with scores
        if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('score-results.json', 'utf8'));

pipelines/gepa-quality-score.yml:32

This workflow uses floating action tags (e.g., actions/checkout@v4, actions/setup-python@v5). In this repo’s existing GitHub Actions workflows, actions are pinned to full commit SHAs for supply-chain integrity (see .github/workflows/pr.yml). When moving this workflow under .github/workflows/, please pin each uses: to a commit SHA (and keep the version comment).

      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

.github/skills/sensei/scripts/gepa/auto_evaluator.py:49

The header says this keyword matching “mirrors trigger-matcher.ts”, but the implementation diverges (stop-word filtering, stemming, per-word matching, extra keywords). Since trigger accuracy in this evaluator is compared against prompts coming from triggers.test.ts (which uses tests/utils/trigger-matcher.ts), this mismatch can produce misleading trigger_accuracy/fitness. Either reimplement the exact TriggerMatcher logic here (substring includes + same keyword set) or rename/reword to clarify it’s only an approximation and not comparable to the Jest trigger tests.

# ── Keyword matching (mirrors trigger-matcher.ts) ──────────────────────────

AZURE_KEYWORDS = [
    "azure", "storage", "cosmos", "sql", "redis", "keyvault", "key vault",
    "function", "app service", "container", "aks", "kubernetes", "bicep",
    "terraform", "deploy", "monitor", "diagnostic", "security", "rbac",
    "identity", "entra", "authentication", "cli", "mcp", "validation",
    "networking", "observability", "foundry", "agent", "model",
]

STOP_WORDS = {
    "the", "and", "for", "with", "this", "that", "from", "have", "has",
    "are", "was", "were", "been", "being", "will", "would", "could",
    "should", "may", "might", "can", "shall", "not", "use", "when",
    "what", "how", "why", "who", "which", "where", "does", "don",
    "your", "its", "our", "their", "these", "those", "some", "any",
    "all", "each", "every", "both", "such", "than", "also", "only",
}

.github/skills/sensei/scripts/gepa/auto_evaluator.py:255

build_evaluator(..., fast: bool = True) takes a fast flag but it’s never used. This makes it unclear whether “fast vs full” evaluation is supported. Either remove the parameter or implement the intended behavior (e.g., toggling whether to run Jest tests vs. heuristic-only scoring).

def build_evaluator(skill_name: str, tests_dir: Path, fast: bool = True):
    """Auto-build a GEPA evaluator for a skill from its test harness.

    Returns a callable(candidate, example) -> (score, asi_dict).
    """
    harness = discover_test_harness(tests_dir, skill_name)

.github/skills/sensei/scripts/gepa/auto_evaluator.py

.github/skills/sensei/SKILL.md

Feedback from microsoft/GitHub-Copilot-for-Azure#1498: - Focus body on execution (Rules, Steps, MCP Tools) not routing signals — routing belongs in frontmatter description (JasonYeMSFT) - Strip comments before extracting trigger arrays to avoid commented-out prompts polluting test data (Copilot reviewer) - Clarify --gepa replaces Step 5 only, not Steps 5-6 (Copilot reviewer) - Remove unused imports: dataclass, field (CodeQL) - Remove unused params: as_json in score_skill, fast in build_evaluator - Fix trigger_prompt_count to include both should/should_not arrays - Update optimizer objective to distinguish routing vs execution content Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

.github/skills/sensei/scripts/gepa/auto_evaluator.py

pipelines/gepa-quality-score.yml

.github/skills/sensei/scripts/gepa/auto_evaluator.py

.github/skills/sensei/SKILL.md

pipelines/gepa-quality-score.yml

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop for automated SKILL.md improvement. Changes: - .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs, Step 5-GEPA in the Ralph loop - .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers test harness at runtime, builds GEPA evaluators, scores/optimizes skills - pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md quality and posts results as PR comment The auto-evaluator requires zero manual configuration. It reads triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays and builds a composite evaluator (content quality + trigger accuracy). Existing tests are NOT replaced or modified. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI) - Remove unused imports: sys, dataclass, field (fixes CodeQL warnings) - Extract strip_frontmatter() helper to replace fragile content.index() parsing that could raise ValueError on malformed frontmatter - Deduplicate frontmatter stripping logic between score_skill/optimize_skill - Add explicit permissions block (contents: read, pull-requests: write) - Use sticky comment pattern (<- Consolidate FileSystemWatcher usage: gepa-quality-score --> marker) to avoid PR comment spam on re-runs - Fix display results to match workflow_dispatch single-skill input - Rename quality gate step to '(advisory)' to clarify non-blocking behavior Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause the comment step to fail. Only post comments when the PR originates from the same repository. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Strip single-line (//) and multi-line (/* */) comments from trigger test arrays before extracting strings, preventing commented-out example prompts from polluting trigger accuracy scoring - Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE FRONTMATTER), not step 6 (IMPROVE TESTS) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The evaluator parses trigger prompt arrays and uses content heuristics for scoring — it does not execute Jest tests or incorporate test pass/fail results. Updated docs to accurately describe this. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove unused params: as_json from score_skill, fast from build_evaluator - Pin all actions to commit SHAs matching repo convention (checkout v6, setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0) - Pin gepa dependency to v0.7.0 for reproducible CI - Remove DO NOT USE FOR from scoring criteria (conflicts with repo guidance that discourages it due to keyword contamination risk) - Add quality_score_raw field for full-precision threshold comparisons - Enhance parse_trigger_arrays to resolve ...varName spread patterns by extracting strings from referenced arrays in the same file - Clarify SKILL.md step 5b: GEPA uses trigger definitions as config, does not execute Jest tests - Add NOTE about future workflow_run commenting pattern migration Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

pipelines/gepa-quality-score.yml:79

The quality-gate script compares against r.get('quality_score', 0), but auto_evaluator.py rounds quality_score to 2 decimals and also emits an unrounded quality_score_raw. Using the rounded value can incorrectly pass/fail near the threshold (e.g., 0.799 rounds to 0.80). Prefer comparing quality_score_raw (or avoid rounding in the JSON written to score-results.json).

          MIN_SCORE="${{ github.event.inputs.min_score || '0.5' }}"
          python -c "
          import json, sys
          with open('score-results.json') as f:
              results = json.load(f)
          if not isinstance(results, list):
              results = [results]

          failed = []
          for r in results:
              if 'error' in r:
                  continue
              if r.get('quality_score', 0) < float('${MIN_SCORE}'):
                  failed.append(f\"{r['skill']}: {r['quality_score']:.2f} (need >= ${MIN_SCORE})\")

pipelines/gepa-quality-score.yml:89

MIN_SCORE is interpolated directly into an inline python -c string. For workflow_dispatch, min_score is user-provided input, so this pattern can lead to accidental quoting issues or code injection if the input contains unexpected characters. Safer approach: pass MIN_SCORE via environment (or argv) and parse it inside Python without string interpolation.

          MIN_SCORE="${{ github.event.inputs.min_score || '0.5' }}"
          python -c "
          import json, sys
          with open('score-results.json') as f:
              results = json.load(f)
          if not isinstance(results, list):
              results = [results]

          failed = []
          for r in results:
              if 'error' in r:
                  continue
              if r.get('quality_score', 0) < float('${MIN_SCORE}'):
                  failed.append(f\"{r['skill']}: {r['quality_score']:.2f} (need >= ${MIN_SCORE})\")

          if failed:
              print('⚠️ Skills below quality threshold (advisory — not blocking):')
              for f in failed:
                  print(f'  {f}')
              print()
              print('💡 Run: python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize --skill <name> --skills-dir plugin/skills --tests-dir tests')
          else:
              print('✅ All skills meet quality threshold')
          "

pipelines/gepa-quality-score.yml:103

This workflow posts PR comments from the pull_request workflow itself. The repo already uses a workflow_run-based commenter (.github/workflows/pr-comment.yml) specifically so commenting code always runs from main and can’t be modified by a PR. To match that security posture, consider uploading score-results.json (and/or a rendered markdown report) as an artifact here and adding a separate workflow_run commenter on main to download and post/update the sticky comment.

      # NOTE: Ideally PR commenting should use a workflow_run-based pattern
      # (score workflow uploads artifact, separate commenter on main downloads
      # and posts) for better security. See .github/workflows/pr-comment.yml.
      - name: Add PR comment with scores
        if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
        with:

pipelines/gepa-quality-score.yml:34

This workflow only runs score/score-all, and those code paths don’t import gepa (the gepa.optimize_anything import is only reached in optimize). Installing gepa==0.7.0 here adds time and an external dependency for a job that, as written, can score without it. Either drop this install for the scoring workflow, or make the scorer depend on GEPA explicitly so the dependency is justified.

      - name: Install GEPA evaluator deps
        run: pip install "gepa==0.7.0"

.github/skills/sensei/scripts/gepa/auto_evaluator.py

- Split gepa-quality-score.yml into read-only scoring workflow + workflow_run-triggered commenter (gepa-quality-score-comment.yml), matching the repo's existing pr.yml / pr-comment.yml pattern - Fix API key regex to also match 'api key:' with whitespace separator - Update PR description to clarify ASI uses heuristic scoring (Jest integration is planned for future iteration) - Remove pull-requests:write from scoring workflow permissions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-04-01T21:01:05Z

@-

…icrosoft#1498) * feat: add GEPA integration to sensei skill + quality score workflow Add GEPA (Genetic-Pareto) evolutionary optimization as an optional enhancement to sensei's Ralph loop for automated SKILL.md improvement. Changes: - .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs, Step 5-GEPA in the Ralph loop - .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers test harness at runtime, builds GEPA evaluators, scores/optimizes skills - pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md quality and posts results as PR comment The auto-evaluator requires zero manual configuration. It reads triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays and builds a composite evaluator (content quality + trigger accuracy). Existing tests are NOT replaced or modified. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address PR review feedback for GEPA integration - Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI) - Remove unused imports: sys, dataclass, field (fixes CodeQL warnings) - Extract strip_frontmatter() helper to replace fragile content.index() parsing that could raise ValueError on malformed frontmatter - Deduplicate frontmatter stripping logic between score_skill/optimize_skill - Add explicit permissions block (contents: read, pull-requests: write) - Use sticky comment pattern (<- Consolidate FileSystemWatcher usage: gepa-quality-score --> marker) to avoid PR comment spam on re-runs - Fix display results to match workflow_dispatch single-skill input - Rename quality gate step to '(advisory)' to clarify non-blocking behavior Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: skip PR comment step for forked PRs Forked PRs have reduced GITHUB_TOKEN permissions, which would cause the comment step to fail. Only post comments when the PR originates from the same repository. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: strip comments in trigger parsing + clarify GEPA step scope - Strip single-line (//) and multi-line (/* */) comments from trigger test arrays before extracting strings, preventing commented-out example prompts from polluting trigger accuracy scoring - Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE FRONTMATTER), not step 6 (IMPROVE TESTS) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: correct docstring and SKILL.md to reflect actual evaluator behavior The evaluator parses trigger prompt arrays and uses content heuristics for scoring — it does not execute Jest tests or incorporate test pass/fail results. Updated docs to accurately describe this. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address round 3 review feedback - Remove unused params: as_json from score_skill, fast from build_evaluator - Pin all actions to commit SHAs matching repo convention (checkout v6, setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0) - Pin gepa dependency to v0.7.0 for reproducible CI - Remove DO NOT USE FOR from scoring criteria (conflicts with repo guidance that discourages it due to keyword contamination risk) - Add quality_score_raw field for full-precision threshold comparisons - Enhance parse_trigger_arrays to resolve ...varName spread patterns by extracting strings from referenced arrays in the same file - Clarify SKILL.md step 5b: GEPA uses trigger definitions as config, does not execute Jest tests - Add NOTE about future workflow_run commenting pattern migration Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address PR review feedback — split workflow, fix regex, update docs - Split gepa-quality-score.yml into read-only scoring workflow + workflow_run-triggered commenter (gepa-quality-score-comment.yml), matching the repo's existing pr.yml / pr-comment.yml pattern - Fix API key regex to also match 'api key:' with whitespace separator - Update PR description to clarify ASI uses heuristic scoring (Jest integration is planned for future iteration) - Remove pull-requests:write from scoring workflow permissions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 25, 2026 17:05

Copilot AI reviewed Mar 25, 2026

View reviewed changes

github-advanced-security bot found potential problems Mar 25, 2026

View reviewed changes

.github/skills/sensei/scripts/gepa/auto_evaluator.py Fixed Show fixed Hide fixed

.github/skills/sensei/scripts/gepa/auto_evaluator.py Fixed Show fixed Hide fixed

Copilot started reviewing on behalf of spboyer March 25, 2026 17:30 View session

Copilot AI review requested due to automatic review settings March 25, 2026 17:53

Copilot started reviewing on behalf of spboyer March 25, 2026 17:58 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

.github/skills/sensei/scripts/gepa/auto_evaluator.py Show resolved Hide resolved

.github/skills/sensei/SKILL.md Outdated Show resolved Hide resolved

Copilot AI review requested due to automatic review settings March 25, 2026 20:46

spboyer force-pushed the feat/gepa-sensei-integration branch from aa042e7 to 0cdeb3a Compare March 25, 2026 20:46

Copilot started reviewing on behalf of spboyer March 25, 2026 20:47 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

.github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated Show resolved Hide resolved

.github/skills/sensei/SKILL.md Show resolved Hide resolved

spboyer force-pushed the feat/gepa-sensei-integration branch from 133105d to 0b3945a Compare March 26, 2026 04:23

Copilot AI review requested due to automatic review settings March 26, 2026 04:23

Copilot started reviewing on behalf of spboyer March 26, 2026 04:24 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

spboyer and others added 6 commits March 26, 2026 10:48

fix: skip PR comment step for forked PRs

25865ee

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause the comment step to fail. Only post comments when the PR originates from the same repository. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 26, 2026 17:48

spboyer force-pushed the feat/gepa-sensei-integration branch from 25b60cd to 8061237 Compare March 26, 2026 17:48

Copilot AI reviewed Mar 26, 2026

View reviewed changes

.github/skills/sensei/scripts/gepa/auto_evaluator.py Show resolved Hide resolved

.github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated Show resolved Hide resolved

github-actions bot mentioned this pull request Mar 27, 2026

[repo-status] Weekly Repo Status — Mar 20 – Mar 27, 2026 #1547

Closed

6 tasks

kvenkatrajan approved these changes Apr 2, 2026

View reviewed changes

kvenkatrajan merged commit dc149f6 into microsoft:main Apr 2, 2026
9 checks passed

Conversation

spboyer commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What this PR adds

How it works

Current baseline: 0/23 skills pass quality threshold

GEPA optimization results (sample run on 4 skills)

Before / After example: azure-storage

Usage

References

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

JasonYeMSFT commented Mar 25, 2026

Uh oh!

spboyer commented Mar 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

spboyer commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

spboyer commented Mar 25, 2026 •

edited

Loading