---
name: agentic-engineering
description: >
  Operate as an agentic engineer using eval-first execution, decomposition,
  and cost-aware model routing. Use when AI agents perform most implementation
  work and humans enforce quality and risk controls.
metadata:
  origin: ECC
---

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define a capability eval and a regression eval.
2. Run a baseline and capture failure signatures.
3. Execute the implementation.
4. Re-run the evals and compare deltas.

**Example workflow:**
```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```
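
The loop above can be sketched in code. This is a minimal illustration, not a real harness: `run_evals`, `eval_first_loop`, and the eval/implement callables are all placeholder names standing in for your project's actual test tooling.

```python
def run_evals(evals):
    """Run each named eval and return a {name: passed} map."""
    return {name: fn() for name, fn in evals.items()}

def eval_first_loop(evals, implement):
    # 1-2. Run the baseline and capture failure signatures
    baseline = run_evals(evals)
    failures = [name for name, ok in baseline.items() if not ok]
    # 3. Execute the implementation
    implement()
    # 4. Re-run evals and compare deltas
    after = run_evals(evals)
    fixed = [name for name in failures if after[name]]
    regressed = [name for name, ok in after.items() if baseline[name] and not ok]
    return {"fixed": fixed, "regressed": regressed}
```

The delta report makes both directions visible: a capability eval flipping to pass is progress; a previously passing eval flipping to fail is a regression that blocks completion.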

## Task Decomposition

Apply the 15-minute unit rule:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition

**Good decomposition:**
```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**
```
Task: Add user authentication (2 hours, multiple risks)
```
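
One way to make the rule checkable is to represent units as data. The `WorkUnit` shape below is a hypothetical sketch, not part of any tooling; it just encodes the three properties from the list above.

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    minutes: int          # target size: ~15 minutes of agent work
    dominant_risk: str    # exactly one dominant risk per unit
    done_condition: str   # a clear, independently checkable done condition

    def is_agent_sized(self) -> bool:
        # A unit qualifies when it is small and both fields are filled in.
        return self.minutes <= 15 and bool(self.dominant_risk) and bool(self.done_condition)

units = [
    WorkUnit("Add password hashing", 15, "security", "hashes verified by unit test"),
    WorkUnit("Create login endpoint", 15, "API contract", "endpoint passes contract test"),
]
```

A unit like `WorkUnit("Add user authentication", 120, "multiple", "")` would fail this check, mirroring the bad decomposition above.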

## Model Routing

Choose a model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
  - Example: Rename a variable, add a type annotation, format code

- **Sonnet**: Implementation and refactors
  - Example: Implement a feature, refactor a module, write tests

- **Opus**: Architecture, root-cause analysis, multi-file invariants
  - Example: Design a system, debug a complex issue, review architecture

**Cost discipline:** Escalate to a higher tier only when the lower tier fails with a clear reasoning gap.
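
The routing rule can be sketched as a small function. The task-kind labels and the tier order are taken from the table above; the function itself is illustrative, not a real API.

```python
# Cheapest-adequate routing: start at the tier the task kind maps to,
# escalate one step per prior failure, never past the top tier.
TIERS = {
    "classification": "haiku",
    "boilerplate": "haiku",
    "implementation": "sonnet",
    "refactor": "sonnet",
    "architecture": "opus",
    "root_cause": "opus",
}

ORDER = ["haiku", "sonnet", "opus"]

def route(task_kind: str, prior_failures: int = 0) -> str:
    """Pick the cheapest adequate tier; escalate only after a lower
    tier has failed with a clear reasoning gap."""
    base = TIERS.get(task_kind, "sonnet")
    idx = min(ORDER.index(base) + prior_failures, len(ORDER) - 1)
    return ORDER[idx]
```

Note the escalation is tied to observed failures, not to upfront pessimism: a classification task starts on the cheapest tier even if it might eventually need more.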

## Session Strategy

- **Continue a session** for closely coupled units
  - Example: Implementing related functions in the same module

- **Start a fresh session** after major phase transitions
  - Example: Moving from implementation to testing

- **Compact after milestone completion**, not during active debugging
  - Example: After a feature is complete, before starting the next feature
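
The three rules above compose into a simple decision order, sketched here with hypothetical phase names; debugging wins over everything, then milestone compaction, then phase transitions.

```python
def session_action(prev_phase: str, next_phase: str,
                   debugging: bool = False,
                   milestone_done: bool = False) -> str:
    """Decide what to do with the current agent session."""
    if debugging:
        return "continue"       # never compact or reset mid-debug
    if milestone_done:
        return "compact"        # compact at milestone boundaries
    if prev_phase != next_phase:
        return "fresh_session"  # major phase transition
    return "continue"           # closely coupled work stays in-session
```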

## Review Focus for AI-Generated Code

Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not spend review cycles on style-only disagreements when automated formatting and linting already enforce style.

**Review checklist:**
- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

## Cost Discipline

Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**
```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```
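
A per-task record like the one above is easy to keep as structured data. The field names below are assumptions, not a standard schema; they just mirror the five tracked items.

```python
from dataclasses import dataclass

@dataclass
class TaskCost:
    task: str
    model: str          # model tier used
    input_tokens: int   # token estimate (input)
    output_tokens: int  # token estimate (output)
    retries: int        # retries needed
    minutes: float      # wall-clock time
    success: bool       # success/failure outcome

# Append one record per task; aggregate when reviewing spend.
log = [TaskCost("Implement user login", "sonnet", 5000, 2000, 1, 8.0, True)]
total_tokens = sum(t.input_tokens + t.output_tokens for t in log)
```

Aggregating over the log (tokens per tier, retries per task kind) is what turns cost discipline from a habit into a feedback signal for model routing.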

## When to Use This Skill

- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs

## Integration with Other Skills

- **tdd-workflow**: Combine with the eval-first loop for test-driven development
- **verification-loop**: Use for continuous validation during implementation
- **search-first**: Apply before implementation to find existing solutions
- **coding-standards**: Reference during the code review phase