Add non-blocking CI drift check for instruction file references #3616

@dgenio

Why

AI instruction files (AGENTS.md, copilot-instructions.md, scoped .instructions.md files) reference specific paths, commands, and directory structures. When the codebase evolves, these references can become stale — leading agents to use wrong commands or look in wrong directories. A lightweight, non-blocking CI check catches this drift early, surfacing warnings in PR reviews without blocking merges.

Scope / Proposed changes

  • New file: .github/workflows/instruction-drift.yml (~60 lines)

Proposed contents

name: Instruction Drift Check

on:
  pull_request:
    branches: ['main']
    paths:
      - 'AGENTS.md'
      - 'CLAUDE.md'
      - '.github/copilot-instructions.md'
      - '.github/instructions/**'
      - 'lm_eval/**'
      - 'tests/**'
      - 'docs/**'
      - 'pyproject.toml'
      - '.pre-commit-config.yaml'
  workflow_dispatch:

jobs:
  check-drift:
    name: Check instruction file references
    runs-on: ubuntu-latest
    timeout-minutes: 5
    continue-on-error: true  # Non-blocking: warns but doesn't fail the PR

    steps:
      - name: Checkout Code
        uses: actions/checkout@v6

      - name: Validate referenced paths exist
        run: |
          EXIT_CODE=0
          INSTRUCTION_FILES=(
            "AGENTS.md"
            "CLAUDE.md"
            ".github/copilot-instructions.md"
          )

          # Also check any .instructions.md files
          while IFS= read -r -d '' file; do
            INSTRUCTION_FILES+=("$file")
          done < <(find .github/instructions -name '*.instructions.md' -print0 2>/dev/null || true)

          for file in "${INSTRUCTION_FILES[@]}"; do
            if [ ! -f "$file" ]; then
              continue
            fi

            echo "::group::Checking $file"

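            # Extract paths that look like references to repo files/dirs
            # Matches patterns like: lm_eval/api/, docs/new_task_guide.md, tests/models/
            # The loop reads from process substitution rather than a pipe so that
            # EXIT_CODE is set in the current shell, not a subshell. The trailing
            # '|| true' tolerates files with no matches under 'set -eo pipefail'.
            while read -r path; do
              # Skip if empty or looks like a URL
              [ -z "$path" ] && continue
              echo "$path" | grep -q 'http' && continue
              # Check if path exists (as file or directory)
              if [ ! -e "$path" ] && [ ! -e "${path%/}" ]; then
                echo "::warning file=$file::Referenced path '$path' does not exist in the repo"
                EXIT_CODE=1
              fi
            # sed strips leading/trailing whitespace, quotes, backticks, parentheses,
            # and leading slashes before de-duplicating ([[:space:]] replaces \s,
            # which is not a whitespace class inside a bracket expression)
            done < <(
              grep -oP '(?:^|\s|`|"|\(|/)(?:lm_eval|tests|docs|scripts|\.github|\.pre-commit)[/\w.-]+' "$file" |
                sed -e 's|^[[:space:]`"(/]*||' -e 's|[[:space:]`")]*$||' |
                sort -u || true
            )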

            echo "::endgroup::"
          done

          if [ $EXIT_CODE -ne 0 ]; then
            echo ""
            echo "::notice::Some instruction files reference paths that don't exist. Please update the references."
          fi
          exit $EXIT_CODE
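
One bash detail matters for any step that sets a status flag inside a `while read` loop: when the loop is the last stage of a pipeline, bash (with default options) runs it in a subshell, so assignments made in the loop body are lost once the loop ends. The sketch below is illustrative only; the variable names are hypothetical and not part of the proposed workflow. It contrasts a piped loop with one fed by process substitution.

```shell
#!/usr/bin/env bash
# Piped loop: the loop body runs in a subshell, so the assignment is discarded.
flag_piped=0
printf 'a\nb\n' | while read -r _; do flag_piped=1; done

# Process substitution: the loop runs in the current shell, so the assignment sticks.
flag_procsub=0
while read -r _; do flag_procsub=1; done < <(printf 'a\nb\n')

echo "piped=$flag_piped procsub=$flag_procsub"
# prints: piped=0 procsub=1
```

For the drift check, this means a flag such as EXIT_CODE only reaches the final `exit` if the loop that sets it runs in the current shell.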

Labels to apply

  • Base: agent-readiness
  • Priority: priority:low
  • Area: tooling

Depends on

All instruction files should exist before adding a CI check that validates them.

Related existing issues

None — no existing issues cover instruction file validation or drift checking.

Acceptance criteria

  • .github/workflows/instruction-drift.yml exists
  • Workflow triggers on PRs to main when instruction files or key code paths change
  • continue-on-error: true is set (non-blocking)
  • Workflow correctly identifies paths referenced in instruction files
  • Stale references produce GitHub Actions ::warning annotations (visible in PR review)
  • Workflow does NOT block PR merging
  • Workflow passes when all references are valid
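
The warning-related criteria above can be exercised locally before opening a PR. The following is a minimal sketch, not the workflow itself: `check_refs` is a hypothetical helper, the fixture files are invented, and the pattern is a simplified stand-in (plain `grep -E`, three directories) for the proposed regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one ::warning line per missing path and
# returns 1 if any referenced path is missing, 0 otherwise.
check_refs() {
  local file=$1 status=0 path
  while read -r path; do
    [ -z "$path" ] && continue
    if [ ! -e "$path" ] && [ ! -e "${path%/}" ]; then
      echo "::warning file=$file::Referenced path '$path' does not exist in the repo"
      status=1
    fi
  done < <(grep -oE '(lm_eval|tests|docs)/[A-Za-z0-9_./-]*' "$file" | sort -u || true)
  return "$status"
}

# Throwaway fixture: one valid reference (lm_eval/api/) and one stale one.
tmp=$(mktemp -d)
mkdir -p "$tmp/lm_eval/api"
printf 'See `lm_eval/api/` and `docs/missing.md`.\n' > "$tmp/AGENTS.md"

# Run the helper from inside the fixture so relative paths resolve there.
output=$(cd "$tmp" && check_refs AGENTS.md) && result=pass || result=warn
echo "$output"
echo "result=$result"
rm -rf "$tmp"
```

With this fixture, only `docs/missing.md` should be reported and `result` should be `warn`. Creating `docs/missing.md` in the fixture before the run should produce no warnings and `result=pass`.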

Avoid drift/duplication notes

  • This workflow is intentionally non-blocking (continue-on-error: true). It produces warnings, not failures.
  • The path extraction regex is conservative — it only looks for repo-relative paths starting with known directories (lm_eval/, tests/, docs/, etc.).
  • If new instruction files are added in the future, add them to the INSTRUCTION_FILES array or let the find command pick them up automatically.
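
To illustrate the last note, here is a small sketch of how the `find`-based discovery extends the array without any edit to the workflow. The fixture directory and file names (`python.instructions.md`, `sub/tests.instructions.md`, `README.md`) are made up for the example.

```shell
#!/usr/bin/env bash
# Throwaway fixture with two scoped instruction files and one unrelated file.
tmp=$(mktemp -d)
mkdir -p "$tmp/.github/instructions/sub"
touch "$tmp/.github/instructions/python.instructions.md" \
      "$tmp/.github/instructions/sub/tests.instructions.md" \
      "$tmp/.github/instructions/README.md"

# Start from an explicit entry, then append whatever find discovers.
# -print0 with read -d '' keeps file names containing spaces intact.
INSTRUCTION_FILES=("AGENTS.md")
while IFS= read -r -d '' file; do
  INSTRUCTION_FILES+=("$file")
done < <(find "$tmp/.github/instructions" -name '*.instructions.md' -print0 2>/dev/null || true)

printf '%s\n' "${INSTRUCTION_FILES[@]}"
echo "count=${#INSTRUCTION_FILES[@]}"
rm -rf "$tmp"
```

Both `*.instructions.md` files are picked up, including the one in a subdirectory, while `README.md` is ignored, so the array ends with three entries. Files outside `.github/instructions` still need to be listed by hand.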
