Commit af7c55c

t1128: update model registry with current Anthropic model IDs (#1712)

* fix: update model registry to current Anthropic model IDs (t1128)
  - claude-opus-4 -> claude-opus-4-6 ($15/$75 -> $5/$25 per MTok)
  - claude-sonnet-4 -> claude-sonnet-4-6 (pricing unchanged)
  - claude-haiku-3.5 -> claude-haiku-4-5 ($0.80/$4 -> $1/$5 per MTok)
  - Update model-registry.db: models + subagent_models tables
  - Fix opus.md: model field was claude-opus-4-20250514, now claude-opus-4-6
  - Fix haiku.md: model field was claude-3-5-haiku-20241022, now claude-haiku-4-5-20251001
  - Regenerate MODELS.md from updated registry
  - Source: https://docs.anthropic.com/en/docs/about-claude/models/overview

* fix: correct stale relative cost ratios in routing tiers table (t1128)
  - haiku: ~0.25x -> ~0.33x (claude-haiku-4-5 $1/$5 vs sonnet $3/$15)
  - opus: ~3x -> ~1.7x (claude-opus-4-6 $5/$25 vs sonnet $3/$15)
  - Regenerate MODELS.md with corrected values
  - Addresses Gemini code review feedback on PR #1712

* chore: mark t1128 complete (pr:#1712)
1 parent 022609a commit af7c55c

6 files changed

Lines changed: 54 additions & 52 deletions
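
The commit replaces three retired Anthropic IDs across these six files. A regex search for the old IDs is one quick way to confirm nothing was missed. The sketch below is hypothetical (neither `find_stale_ids` nor `STALE_ID_RE` exists in aidevops). It assumes the retired set is exactly `claude-opus-4` and `claude-sonnet-4` (bare or dated, such as `claude-opus-4-20250514`), plus `claude-haiku-3.5` and `claude-3-5-haiku`, and it deliberately does not match the new `-4-6` IDs.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper, not part of the aidevops scripts.
# Prints (with line numbers) every stdin line that still mentions a
# retired Anthropic model ID.
set -euo pipefail

# claude-opus-4 / claude-sonnet-4, bare or dated (-2025...), but not -4-6;
# plus both spellings of Claude 3.5 Haiku.
readonly STALE_ID_RE='claude-(opus|sonnet)-4([^-]|-2025|$)|claude-haiku-3\.5|claude-3-5-haiku'

find_stale_ids() {
  # grep exits 1 when nothing matches; treat that as a clean result.
  grep -nE "$STALE_ID_RE" || true
}
```

To scan the working tree instead of stdin, run `git grep -nE "$STALE_ID_RE"`. Matches in the TODO.md t1128 description are expected, because that task text quotes the old IDs.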

.agents/scripts/compare-models-helper.sh

Lines changed: 27 additions & 27 deletions
@@ -143,12 +143,12 @@ get_all_tier_patterns() {
 # Model Database (embedded reference data)
 # =============================================================================
 # Format: model_id|provider|display_name|context_window|input_price_per_1m|output_price_per_1m|tier|capabilities|best_for
-# Prices in USD per 1M tokens. Last updated: 2025-02-08.
+# Prices in USD per 1M tokens. Last updated: 2026-02-18.
 # Sources: Anthropic, OpenAI, Google official pricing pages.
 
-readonly MODEL_DATA="claude-opus-4|Anthropic|Claude Opus 4|200000|15.00|75.00|high|code,reasoning,architecture,vision,tools|Architecture decisions, novel problems, complex multi-step reasoning
-claude-sonnet-4|Anthropic|Claude Sonnet 4|200000|3.00|15.00|medium|code,reasoning,vision,tools|Code implementation, review, most development tasks
-claude-haiku-3.5|Anthropic|Claude 3.5 Haiku|200000|0.80|4.00|low|code,reasoning,vision,tools|Triage, classification, simple transforms, formatting
+readonly MODEL_DATA="claude-opus-4-6|Anthropic|Claude Opus 4.6|200000|5.00|25.00|high|code,reasoning,architecture,vision,tools|Architecture decisions, novel problems, complex multi-step reasoning
+claude-sonnet-4-6|Anthropic|Claude Sonnet 4.6|200000|3.00|15.00|medium|code,reasoning,vision,tools|Code implementation, review, most development tasks
+claude-haiku-4-5|Anthropic|Claude Haiku 4.5|200000|1.00|5.00|low|code,reasoning,vision,tools|Triage, classification, simple transforms, formatting
 gpt-4.1|OpenAI|GPT-4.1|1048576|2.00|8.00|medium|code,reasoning,vision,tools,search|Coding, instruction following, long context
 gpt-4.1-mini|OpenAI|GPT-4.1 Mini|1048576|0.40|1.60|low|code,reasoning,vision,tools|Cost-efficient coding and general tasks
 gpt-4.1-nano|OpenAI|GPT-4.1 Nano|1048576|0.10|0.40|low|code,reasoning,tools|Fast classification, simple transforms
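
MODEL_DATA is a newline-separated table whose columns follow the `# Format:` comment, so a lookup only needs to split each row on `|`. The following is a minimal sketch. It assumes a hypothetical `get_model_prices` helper and uses an excerpt of the updated rows with `best_for` shortened. The script's real helper functions are outside this diff.

```shell
#!/usr/bin/env bash
# Sketch: look up one model's prices in a MODEL_DATA-style string.
# Field order follows the "# Format:" comment in compare-models-helper.sh;
# get_model_prices is a hypothetical helper name.
set -euo pipefail

# Excerpt of the updated rows (best_for column shortened).
MODEL_DATA="claude-opus-4-6|Anthropic|Claude Opus 4.6|200000|5.00|25.00|high|code,reasoning,architecture,vision,tools|Architecture decisions
claude-sonnet-4-6|Anthropic|Claude Sonnet 4.6|200000|3.00|15.00|medium|code,reasoning,vision,tools|Code implementation
claude-haiku-4-5|Anthropic|Claude Haiku 4.5|200000|1.00|5.00|low|code,reasoning,vision,tools|Triage"

# Prints "<input> <output>" (USD per 1M tokens); returns 1 if the ID is unknown.
get_model_prices() {
  local wanted="$1" id provider name ctx input output rest
  # "rest" absorbs tier|capabilities|best_for.
  while IFS='|' read -r id provider name ctx input output rest; do
    if [[ "$id" == "$wanted" ]]; then
      printf '%s %s\n' "$input" "$output"
      return 0
    fi
  done <<< "$MODEL_DATA"
  return 1
}
```

The match is exact. In this sketch, a caller that still passes the retired `claude-opus-4` ID gets no result after the rename.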
@@ -169,31 +169,31 @@ llama-4-scout|Meta|Llama 4 Scout|512000|0.15|0.40|low|code,reasoning,vision,tool
 # =============================================================================
 # Maps aidevops internal tiers to recommended models
 
-readonly TIER_MAP="haiku|claude-haiku-3.5|Triage, classification, simple transforms
+readonly TIER_MAP="haiku|claude-haiku-4-5|Triage, classification, simple transforms
 flash|gemini-2.5-flash|Large context reads, summarization, bulk processing
-sonnet|claude-sonnet-4|Code implementation, review, most development tasks
+sonnet|claude-sonnet-4-6|Code implementation, review, most development tasks
 pro|gemini-2.5-pro|Large codebase analysis, complex reasoning with big context
-opus|claude-opus-4|Architecture decisions, complex multi-step reasoning"
+opus|claude-opus-4-6|Architecture decisions, complex multi-step reasoning"
 
 # =============================================================================
 # Task-to-Model Recommendations
 # =============================================================================
 
-readonly TASK_RECOMMENDATIONS="code review|claude-sonnet-4|o4-mini|gemini-2.5-flash
-code implementation|claude-sonnet-4|gpt-4.1|gemini-2.5-pro
-architecture design|claude-opus-4|o3|gemini-2.5-pro
-bug fixing|claude-sonnet-4|gpt-4.1|o4-mini
-refactoring|claude-sonnet-4|gpt-4.1|gemini-2.5-pro
-documentation|claude-sonnet-4|gpt-4o|gemini-2.5-flash
-testing|claude-sonnet-4|gpt-4.1|o4-mini
-classification|claude-haiku-3.5|gpt-4.1-nano|gemini-2.5-flash
-summarization|gemini-2.5-flash|gpt-4o-mini|claude-haiku-3.5
-large codebase analysis|gemini-2.5-pro|gpt-4.1|claude-sonnet-4
+readonly TASK_RECOMMENDATIONS="code review|claude-sonnet-4-6|o4-mini|gemini-2.5-flash
+code implementation|claude-sonnet-4-6|gpt-4.1|gemini-2.5-pro
+architecture design|claude-opus-4-6|o3|gemini-2.5-pro
+bug fixing|claude-sonnet-4-6|gpt-4.1|o4-mini
+refactoring|claude-sonnet-4-6|gpt-4.1|gemini-2.5-pro
+documentation|claude-sonnet-4-6|gpt-4o|gemini-2.5-flash
+testing|claude-sonnet-4-6|gpt-4.1|o4-mini
+classification|claude-haiku-4-5|gpt-4.1-nano|gemini-2.5-flash
+summarization|gemini-2.5-flash|gpt-4o-mini|claude-haiku-4-5
+large codebase analysis|gemini-2.5-pro|gpt-4.1|claude-sonnet-4-6
 math reasoning|o3|deepseek-r1|gemini-2.5-pro
-security audit|claude-opus-4|o3|claude-sonnet-4
-data extraction|gemini-2.5-flash|gpt-4o-mini|claude-haiku-3.5
-commit messages|claude-haiku-3.5|gpt-4.1-nano|gemini-2.5-flash
-pr description|claude-sonnet-4|gpt-4o|gemini-2.5-flash"
+security audit|claude-opus-4-6|o3|claude-sonnet-4-6
+data extraction|gemini-2.5-flash|gpt-4o-mini|claude-haiku-4-5
+commit messages|claude-haiku-4-5|gpt-4.1-nano|gemini-2.5-flash
+pr description|claude-sonnet-4-6|gpt-4o|gemini-2.5-flash"
 
 # =============================================================================
 # Helper Functions
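
TIER_MAP uses the same pipe-delimited layout (`tier|model_id|description`). Pointing a tier at a new model is therefore a one-field edit, and a reader only needs column 2. Here is a sketch of a tier lookup; `resolve_tier` is a hypothetical name, not a function shown in this diff.

```shell
#!/usr/bin/env bash
# Sketch: resolve an aidevops tier name to its recommended model using a
# TIER_MAP-style "tier|model_id|description" string. resolve_tier is a
# hypothetical helper name.
set -euo pipefail

TIER_MAP="haiku|claude-haiku-4-5|Triage, classification, simple transforms
flash|gemini-2.5-flash|Large context reads, summarization, bulk processing
sonnet|claude-sonnet-4-6|Code implementation, review, most development tasks
pro|gemini-2.5-pro|Large codebase analysis, complex reasoning with big context
opus|claude-opus-4-6|Architecture decisions, complex multi-step reasoning"

resolve_tier() {
  # Print column 2 of the first row whose column 1 equals the tier;
  # exit status 1 if no row matched.
  awk -F'|' -v tier="$1" '
    $1 == tier { print $2; found = 1; exit }
    END { exit !found }
  ' <<< "$TIER_MAP"
}
```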
@@ -480,8 +480,8 @@ cmd_recommend() {
 if [[ "$found" != "true" ]]; then
 echo "No exact task match. Showing general recommendations:"
 echo ""
-echo " High capability: claude-opus-4 or o3"
-echo " Balanced: claude-sonnet-4 or gpt-4.1"
+echo " High capability: claude-opus-4-6 or o3"
+echo " Balanced: claude-sonnet-4-6 or gpt-4.1"
 echo " Budget: gemini-2.5-flash or gpt-4.1-nano"
 echo " Large context: gemini-2.5-pro or gpt-4.1 (1M tokens)"
 echo ""
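
The hunk above is the no-match branch of cmd_recommend. Below is a sketch of the surrounding lookup against TASK_RECOMMENDATIONS rows (`task|primary|alternative|alternative`). The `recommend` function, the trimmed row set, and case-insensitive exact matching are all assumptions, because the real matching code is not in this diff.

```shell
#!/usr/bin/env bash
# Sketch of a task lookup with a general fallback, loosely mirroring the
# cmd_recommend branch above. recommend is hypothetical; exact,
# case-insensitive matching is an assumption.
set -euo pipefail

TASK_RECOMMENDATIONS="code review|claude-sonnet-4-6|o4-mini|gemini-2.5-flash
architecture design|claude-opus-4-6|o3|gemini-2.5-pro
commit messages|claude-haiku-4-5|gpt-4.1-nano|gemini-2.5-flash"

recommend() {
  local task name primary alt1 alt2
  # Lowercase with tr so the sketch also runs on bash 3.2 (macOS).
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  while IFS='|' read -r name primary alt1 alt2; do
    if [[ "$name" == "$task" ]]; then
      printf 'Primary: %s\nAlternatives: %s, %s\n' "$primary" "$alt1" "$alt2"
      return 0
    fi
  done <<< "$TASK_RECOMMENDATIONS"
  # No exact match: fall back to general tiers, as cmd_recommend does.
  printf 'No exact task match.\nHigh capability: claude-opus-4-6 or o3\nBalanced: claude-sonnet-4-6 or gpt-4.1\n'
}
```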
@@ -1003,9 +1003,9 @@ cmd_help() {
 echo ""
 echo "Scoring examples:"
 echo " compare-models-helper.sh score --task 'fix React bug' --type code \\"
-echo " --model claude-sonnet-4 --correctness 9 --completeness 8 --quality 8 --clarity 9 --adherence 9 \\"
+echo " --model claude-sonnet-4-6 --correctness 9 --completeness 8 --quality 8 --clarity 9 --adherence 9 \\"
 echo " --model gpt-4.1 --correctness 8 --completeness 7 --quality 7 --clarity 8 --adherence 8 \\"
-echo " --winner claude-sonnet-4"
+echo " --winner claude-sonnet-4-6"
 echo " compare-models-helper.sh results"
 echo " compare-models-helper.sh results --model sonnet --limit 5"
 echo ""
@@ -1441,8 +1441,8 @@ SQL
 }
 
 # Record a comparison result
-# Usage: cmd_score --task "description" --type "code" --evaluator "claude-opus-4" \
-# --model "claude-sonnet-4" --correctness 9 --completeness 8 --quality 7 \
+# Usage: cmd_score --task "description" --type "code" --evaluator "claude-opus-4-6" \
+# --model "claude-sonnet-4-6" --correctness 9 --completeness 8 --quality 7 \
 # --clarity 8 --adherence 9 --latency 1200 --tokens 500 \
 # --strengths "Fast, accurate" --weaknesses "Verbose" \
 # [--model "gpt-4.1" --correctness 8 ...]
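
cmd_score records five 1-10 criteria per model. The diff does not show whether or how those scores are aggregated. Purely as an illustration, an unweighted mean ranks the help-text example the same way its `--winner` flag does. `mean_score` is hypothetical and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical: unweighted mean of the five 1-10 criteria accepted by
# cmd_score (correctness, completeness, quality, clarity, adherence).
set -euo pipefail

mean_score() {
  (( $# == 5 )) || { echo "usage: mean_score C1 C2 C3 C4 C5" >&2; return 2; }
  # Bash arithmetic is integer-only, so awk does the division.
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" -v e="$5" \
    'BEGIN { printf "%.1f\n", (a + b + c + d + e) / 5 }'
}
```

With the help-text numbers, `mean_score 9 8 8 9 9` gives 8.6 for claude-sonnet-4-6 and `mean_score 8 7 7 8 8` gives 7.6 for gpt-4.1.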

.agents/scripts/generate-models-md.sh

Lines changed: 2 additions & 2 deletions
@@ -157,11 +157,11 @@ generate_routing_tiers() {
 sm.tier,
 sm.model_id,
 CASE sm.tier
-WHEN 'haiku' THEN '~0.25x'
+WHEN 'haiku' THEN '~0.33x'
 WHEN 'flash' THEN '~0.20x'
 WHEN 'sonnet' THEN '1x (baseline)'
 WHEN 'pro' THEN '~1.5x'
-WHEN 'opus' THEN '~3x'
+WHEN 'opus' THEN '~1.7x'
 ELSE '?'
 END
 FROM subagent_models sm
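
The new ratios come from dividing each tier's prices by the sonnet baseline ($3/$15). For the Anthropic tiers the input and output ratios coincide, so one figure suffices: 1/3 ≈ 0.33 for haiku and 5/3 ≈ 1.67 (shown as ~1.7x) for opus. The query above hard-codes these values. The sketch below only illustrates the arithmetic, using a hypothetical `relative_cost` helper.

```shell
#!/usr/bin/env bash
# Sketch: a tier's cost relative to the sonnet baseline, from per-1M-token
# prices. relative_cost is hypothetical; the real query hard-codes ratios.
set -euo pipefail

# Usage: relative_cost IN OUT BASE_IN BASE_OUT
# Prints the input ratio and the output ratio, each to two decimals.
relative_cost() {
  awk -v i="$1" -v o="$2" -v bi="$3" -v bo="$4" \
    'BEGIN { printf "%.2fx %.2fx\n", i / bi, o / bo }'
}
```

For tiers whose input and output ratios differ, collapsing them into a single figure requires assuming an input:output token mix.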

.agents/tools/ai-assistants/models/haiku.md

Lines changed: 5 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
 description: Lightweight model for triage, classification, and simple transforms
 mode: subagent
-model: anthropic/claude-3-5-haiku-20241022
+model: anthropic/claude-haiku-4-5-20251001
 model-tier: haiku
 model-fallback: google/gemini-2.5-flash-preview-05-20
 tools:
@@ -39,8 +39,9 @@ You are a lightweight, fast AI assistant optimized for simple tasks.
 | Field | Value |
 |-------|-------|
 | Provider | Anthropic |
-| Model | claude-3-5-haiku |
+| Model | claude-haiku-4-5 |
 | Context | 200K tokens |
-| Input cost | $0.80/1M tokens |
-| Output cost | $4.00/1M tokens |
+| Max output | 64K tokens |
+| Input cost | $1.00/1M tokens |
+| Output cost | $5.00/1M tokens |
 | Tier | haiku (lowest cost) |
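
This commit corrected the `model:` frontmatter line in haiku.md, and that line can drift from the registry independently of the rest of the file. The following sketch reads a single frontmatter key. It assumes simple `key: value` lines (not full YAML) and uses a hypothetical `frontmatter_value` helper.

```shell
#!/usr/bin/env bash
# Sketch: read one key from a file's leading "---" frontmatter block.
# frontmatter_value is hypothetical and handles only "key: value" lines.
set -euo pipefail

frontmatter_value() {
  local file="$1" key="$2"
  awk -v key="$key" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"   { exit }              # closing delimiter
    in_fm && index($0, key ": ") == 1 {          # "model:" but not "model-tier:"
      print substr($0, length(key) + 3); exit
    }
  ' "$file"
}
```

After this commit, `frontmatter_value .agents/tools/ai-assistants/models/haiku.md model` should print `anthropic/claude-haiku-4-5-20251001`.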

.agents/tools/ai-assistants/models/opus.md

Lines changed: 7 additions & 6 deletions
@@ -1,7 +1,7 @@
 ---
 description: Highest-capability model for architecture decisions, novel problems, and complex multi-step reasoning
 mode: subagent
-model: anthropic/claude-opus-4-20250514
+model: anthropic/claude-opus-4-6
 model-tier: opus
 model-fallback: openai/o3
 fallback-chain:
@@ -37,16 +37,17 @@ You are the highest-capability AI assistant, reserved for the most complex and c
 
 - Only use this tier when the task genuinely requires it
 - Most coding tasks are better served by sonnet tier
-- Cost is approximately 3x sonnet -- justify the spend
+- Cost is approximately 1.7x sonnet -- justify the spend
 - If the task is primarily about large context, use pro tier instead
 
 ## Model Details
 
 | Field | Value |
 |-------|-------|
 | Provider | Anthropic |
-| Model | claude-opus-4 |
-| Context | 200K tokens |
-| Input cost | $15.00/1M tokens |
-| Output cost | $75.00/1M tokens |
+| Model | claude-opus-4-6 |
+| Context | 200K tokens (1M beta) |
+| Max output | 128K tokens |
+| Input cost | $5.00/1M tokens |
+| Output cost | $25.00/1M tokens |
 | Tier | opus (highest capability, highest cost) |
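
The Model Details tables restate registry prices in prose form (`$5.00/1M tokens`), which is another place the old $15/$75 figures survived. The sketch below extracts the dollar amount from one `| Field | Value |` row so it can be compared with the registry. `table_cost` is hypothetical, and it assumes the `$<amount>/1M tokens` layout used here.

```shell
#!/usr/bin/env bash
# Sketch: pull the dollar amount out of a Markdown row such as
# "| Input cost | $5.00/1M tokens |" read from stdin.
# table_cost is a hypothetical helper.
set -euo pipefail

table_cost() {
  local field="$1"
  awk -F'|' -v field="$field" '
    { name = $2; gsub(/^ +| +$/, "", name) }          # trimmed field name
    name == field {
      v = $3; gsub(/^ *[$]|\/.*$/, "", v)             # drop "$" and "/1M tokens"
      print v; exit
    }
  '
}
```

For example, `table_cost 'Input cost' < .agents/tools/ai-assistants/models/opus.md` should print `5.00` after this commit.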

MODELS.md

Lines changed: 12 additions & 12 deletions
@@ -3,24 +3,24 @@
 Live performance data from pattern-tracker and response-scoring databases.
 Auto-generated by `generate-models-md.sh` — do not edit manually.
 
-**Last updated**: 2026-02-18T16:40:00Z
+**Last updated**: 2026-02-18T17:35:03Z
 
-- **Pattern data points**: 884
+- **Pattern data points**: 902
 - **Scored responses**: 18
 - **Date range**: 2026-02-05 to 2026-02-18
 
 ## Available Models
 
 | Model | Provider | Tier | Context | Input/1M | Output/1M |
 |-------|----------|------|---------|----------|-----------|
-| claude-opus-4 | Anthropic | opus | 200K | $15.00 | $75.00 |
+| claude-opus-4-6 | Anthropic | opus | 200K | $5.00 | $25.00 |
 | o3 | OpenAI | opus | 200K | $10.00 | $40.00 |
-| claude-sonnet-4 | Anthropic | sonnet | 200K | $3.00 | $15.00 |
+| claude-sonnet-4-6 | Anthropic | sonnet | 200K | $3.00 | $15.00 |
 | gemini-2.5-pro | Google | sonnet | 1M | $1.25 | $10.00 |
 | gpt-4.1 | OpenAI | sonnet | 1M | $2.00 | $8.00 |
 | gpt-4o | OpenAI | sonnet | 128K | $2.50 | $10.00 |
 | o4-mini | OpenAI | sonnet | 200K | $1.10 | $4.40 |
-| claude-haiku-3.5 | Anthropic | haiku | 200K | $0.80 | $4.00 |
+| claude-haiku-4-5 | Anthropic | haiku | 200K | $1.00 | $5.00 |
 | deepseek-r1 | DeepSeek | haiku | 131K | $0.55 | $2.19 |
 | deepseek-v3 | DeepSeek | haiku | 131K | $0.27 | $1.10 |
 | gemini-2.0-flash | Google | haiku | 1M | $0.10 | $0.40 |
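
The Context column shows abbreviated sizes. Assume the registry stores raw token counts like those in MODEL_DATA (200000 for the Claude models, 1048576 for gpt-4.1). Under that assumption, one rule consistent with the displayed 200K and 1M values is shown below. The generator's actual formatting code is not in this diff, so `format_context` and its thresholds are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: abbreviate a raw context-window size (200000 -> 200K,
# 1048576 -> 1M). format_context and its thresholds are assumptions
# inferred from the table, not the generator's code.
set -euo pipefail

format_context() {
  local tokens="$1"
  if (( tokens >= 1000000 )); then
    printf '%dM\n' $(( tokens / 1000000 ))   # integer division truncates
  else
    printf '%dK\n' $(( tokens / 1000 ))
  fi
}
```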
@@ -37,29 +37,29 @@ Active model assignments for each dispatch tier:
 
 | Tier | Primary Model | Relative Cost |
 |------|---------------|---------------|
-| haiku | claude-3-5-haiku | ~0.25x |
+| haiku | claude-haiku-4-5 | ~0.33x |
 | flash | gemini-2.5-flash-preview-05-20 | ~0.20x |
-| sonnet | claude-sonnet-4 | 1x (baseline) |
+| sonnet | claude-sonnet-4-6 | 1x (baseline) |
 | pro | gemini-2.5-pro-preview-06-05 | ~1.5x |
-| opus | claude-opus-4 | ~3x |
+| opus | claude-opus-4-6 | ~1.7x |
 
 ## Performance Leaderboard
 
 Success rates from autonomous task execution (pattern-tracker data):
 
 | Model | Tasks | Successes | Failures | Success Rate | Last Used |
 |-------|-------|-----------|----------|--------------|-----------|
-| opus | 507 | 499 | 8 | 98% | 2026-02-18 |
-| sonnet | 163 | 163 | 0 | 100% | 2026-02-18 |
+| opus | 512 | 504 | 8 | 98% | 2026-02-18 |
+| sonnet | 175 | 175 | 0 | 100% | 2026-02-18 |
 | pro | 2 | 2 | 0 | 100% | 2026-02-18 |
 | haiku | 1 | 0 | 1 | 0% | 2026-02-05 |
 
 ### By Task Type
 
 | Task Type | Tasks | Successes | Failures | Success Rate |
 |-----------|-------|-----------|----------|--------------|
-| feature | 541 | 522 | 19 | 96% |
-| bugfix | 10 | 7 | 3 | 70% |
+| feature | 551 | 532 | 19 | 96% |
+| bugfix | 17 | 14 | 3 | 82% |
 | refactor | 1 | 0 | 1 | 0% |
 | testing | 1 | 1 | 0 | 100% |
 | security | 1 | 1 | 0 | 100% |
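
Every Success Rate value above is consistent with truncating integer division rather than rounding. For example, feature is 532/551 ≈ 96.55%, which is shown as 96% even though rounding would give 97%. The sketch below reproduces the column. The generator's own query is not shown, and `success_rate` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: recompute the Success Rate column from Successes and Tasks.
# Bash $(( )) arithmetic truncates toward zero, matching the table.
# success_rate is a hypothetical helper.
set -euo pipefail

success_rate() {
  local successes="$1" tasks="$2"
  (( tasks > 0 )) || { echo "n/a"; return 0; }   # avoid division by zero
  echo "$(( successes * 100 / tasks ))%"
}
```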

TODO.md

Lines changed: 1 addition & 1 deletion
@@ -1542,7 +1542,7 @@ t019.3.4,Update AGENTS.md with Beads integration docs,,beads,1h,45m,2025-12-21T1
 
 - [x] t1127 Fix create_improvement action type recognition in executor #bugfix #auto-dispatch #self-improvement ~15m model:haiku category:automation — The supervisor executor rejects 'create_improvement' as 'invalid type' — 3 skips in last 5 cycles. The AI prompt defines create_improvement as a valid action type, but the executor's action type whitelist doesn't include it. Add 'create_improvement' to the executor's recognized action types, mapping it to create_task with the additional #self-improvement tag and category metadata. ref:GH#1686 assignee:marcusquinn started:2026-02-18T16:49:43Z verified:2026-02-18 pr:#1650
 
-- [ ] t1128 Update model registry with current model IDs #bugfix #auto-dispatch ~30m model:sonnet category:data — The model-registry.db has stale entries: `claude-opus-4` should be `claude-opus-4-6`, `claude-sonnet-4` should include dated variants like `claude-sonnet-4-20250514`, and newer models may be missing entirely. The `generate-models-md.sh` script produces MODELS.md from this registry, so outdated source data propagates to all repos. Steps: (1) audit current model IDs against provider APIs/docs, (2) update model-registry.db entries, (3) re-run generate-models-md.sh, (4) verify MODELS.md output is accurate. ref:GH#1690 assignee:marcusquinn started:2026-02-18T17:01:09Z logged:2026-02-18
+- [x] t1128 Update model registry with current model IDs #bugfix #auto-dispatch ~30m model:sonnet category:data — The model-registry.db has stale entries: `claude-opus-4` should be `claude-opus-4-6`, `claude-sonnet-4` should include dated variants like `claude-sonnet-4-20250514`, and newer models may be missing entirely. The `generate-models-md.sh` script produces MODELS.md from this registry, so outdated source data propagates to all repos. Steps: (1) audit current model IDs against provider APIs/docs, (2) update model-registry.db entries, (3) re-run generate-models-md.sh, (4) verify MODELS.md output is accurate. ref:GH#1690 assignee:marcusquinn started:2026-02-18T17:01:09Z logged:2026-02-18 pr:#1712 completed:2026-02-18
 
 - [ ] t1129 Include MODELS.md in aidevops init workflow for per-repo performance tracking #feature #auto-dispatch ~2h model:sonnet category:observability — Currently `generate-models-md.sh` produces a global MODELS.md from the pattern-tracker and response-scoring databases. This is useful for per-repo tracking of which models perform best on that repo's task types. Steps: (1) update `aidevops init` to generate MODELS.md in the project root, (2) add MODELS.md to the list of files tracked by git (not gitignored), (3) add a periodic refresh mechanism (e.g., during supervisor pulse or on commit) so the data stays current, (4) consider filtering pattern data by repo path so each repo's MODELS.md reflects its own task history rather than global stats. ref:GH#1691 assignee:marcusquinn started:2026-02-18T17:02:03Z logged:2026-02-18
 
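
The TODO.md change is the `chore: mark t1128 complete` step: it ticks the checkbox and appends `pr:` and `completed:` fields. The sketch below performs that edit as a stdin-to-stdout filter. `mark_task_done` is a hypothetical helper, not an aidevops command.

```shell
#!/usr/bin/env bash
# Sketch: tick one task's checkbox in TODO.md content read from stdin and
# append pr:/completed: fields. mark_task_done is hypothetical.
set -euo pipefail

mark_task_done() {
  local id="$1" pr="$2" day="$3"
  awk -v id="$id" -v pr="$pr" -v day="$day" '
    # Trailing space in the prefix keeps t112 from matching t1128.
    index($0, "- [ ] " id " ") == 1 {
      sub(/^- \[ \]/, "- [x]")
      $0 = $0 " pr:#" pr " completed:" day
    }
    { print }
  '
}
```

For example, `mark_task_done t1128 1712 2026-02-18 < TODO.md > TODO.md.tmp && mv TODO.md.tmp TODO.md` writes the result to a temporary file and only replaces TODO.md if the filter succeeds.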