t1338.1: Extend model-routing.md with local tier#2326
Conversation
Add local as tier 0 (/bin/zsh cost) in the model routing hierarchy: local -> haiku -> flash -> sonnet -> pro -> opus Changes to model-routing.md: - Model Tiers table: local tier with llama.cpp GGUF - Routing Rules: new 'Use local when' section with privacy/offline/bulk criteria - Cost Estimation: local at 0x (electricity only) - Model-Specific Subagents: local -> tools/local-models/local-models.md - Fallback Routing: local falls back to haiku when server not running - Decision Flowchart: local-first branch before cloud routing - Examples: 3 local tier examples (bulk, offline, privacy) - Provider Discovery: local-model-helper.sh status/models commands - Subagent Frontmatter: local added to valid model values - Related: links to future local-models subagents and helper script Closes #2320
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates support for local AI models into the existing model routing system. It introduces a new "local" tier, enabling users to leverage on-device llama.cpp models for tasks requiring privacy, offline capability, or bulk processing, with a clear fallback to cloud models when local options are unavailable. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThis PR extends the model-routing.md documentation to include a new "local" tier for on-device AI model inference via llama.cpp. It adds routing rules, cost estimation, decision flowchart paths, examples, and helper command references for local model management alongside existing cloud-based tiers. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Wed Feb 25 23:11:16 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Code Review
This pull request provides a comprehensive documentation update to model-routing.md, introducing the new local model tier. The changes are well-structured and cover all necessary aspects, including routing rules, cost, and fallback mechanisms. I've identified one inconsistency regarding the model cost ordering and have a suggestion to improve the clarity of the limitations section. Overall, this is a great enhancement to the documentation.
| - **Purpose**: Route tasks to the cheapest model that can handle them well | ||
| - **Philosophy**: Use the smallest model that produces acceptable quality | ||
| - **Default**: sonnet (best balance of cost/capability for most tasks) | ||
| - **Cost spectrum**: local (free) -> haiku -> flash -> sonnet -> pro -> opus (highest) |
There was a problem hiding this comment.
There's an inconsistency in the model cost ordering. The 'Cost spectrum' line lists haiku as cheaper than flash. However, the 'Cost Estimation' table below indicates that flash (~0.20x) is more cost-effective than haiku (~0.25x). To ensure consistency and prevent confusion, the order in the cost spectrum should be updated to reflect the data in the table.
| - **Cost spectrum**: local (free) -> haiku -> flash -> sonnet -> pro -> opus (highest) | |
| - **Cost spectrum**: local (free) -> flash -> haiku -> sonnet -> pro -> opus (highest) |
| - Simple tasks where network latency exceeds local inference time | ||
| - The task fits within the local model's capability (typically <32K context, simpler reasoning) | ||
|
|
||
| **Limitations**: Local models are smaller and less capable than cloud models. Do not route complex reasoning, large-context analysis, or architecture decisions to local. If a local model is not running or not installed, skip to `haiku`. |
There was a problem hiding this comment.
The 'Limitations' section currently mixes model capability constraints with the fallback mechanism. To improve clarity, I suggest separating these. This makes a clear distinction between guidance (when not to use local) and automatic behavior (what happens if local is unavailable).
| **Limitations**: Local models are smaller and less capable than cloud models. Do not route complex reasoning, large-context analysis, or architecture decisions to local. If a local model is not running or not installed, skip to `haiku`. | |
| **Limitations**: Local models are smaller and less capable than cloud models. Do not route complex reasoning, large-context analysis, or architecture decisions to local. | |
| If a local model is not running or not installed, the request automatically falls back to `haiku`. |
🤖 Augment PR SummarySummary: Extends Changes:
Technical Notes: Keeps 🤖 Was this summary useful? React with 👍 or 👎 |
| ``` | ||
|
|
||
| Valid values: `haiku`, `flash`, `sonnet`, `pro`, `opus` | ||
| Valid values: `local`, `haiku`, `flash`, `sonnet`, `pro`, `opus` |
There was a problem hiding this comment.
This doc now states model: local is a valid tier, but several scripts appear to validate/resolve tiers against a fixed set that currently excludes local (e.g., tier checks in model-availability-helper.sh and model-label-helper.sh). That mismatch could cause routing/labeling failures if someone starts using local immediately based on this guidance.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
|
|
||
| | Tier | Subagent | Primary Model | Fallback | | ||
| |------|----------|---------------|----------| | ||
| | `local` | `tools/local-models/local-models.md` | llama.cpp (user GGUF) | haiku | |
There was a problem hiding this comment.
tools/local-models/local-models.md (and related local-model docs) are referenced here, but those files don’t exist in the repo yet, so this will be a broken reference for readers until the follow-up subtasks land. Consider marking these as “coming in t1338.x” or deferring the references until the files are added.
Severity: medium
Other Locations
.agents/tools/context/model-routing.md:300.agents/tools/context/model-routing.md:301
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
|
|
||
| Discovery checks three sources (in order): environment variables, gopass encrypted secrets, plaintext `credentials.sh`. Use discovery output to constrain routing to models the user can actually access. | ||
|
|
||
| For local models, use `local-model-helper.sh status` to check if a local model server is running: |
There was a problem hiding this comment.
This section instructs running local-model-helper.sh, but that helper script is not present in the repo yet, so the command is currently non-actionable. Consider explicitly noting it’s planned for a later subtask to avoid confusing users following the doc today.
Severity: medium
Other Locations
.agents/tools/context/model-routing.md:302
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.agents/tools/context/model-routing.md (1)
121-126: Clarify subagent location scope to avoid path confusion.The sentence on Line 121 says model subagents are defined under
tools/ai-assistants/models/, but the newlocalentry on Line 125 points totools/local-models/local-models.md. Update wording to “primarily” or “across these paths” so the docs stay operationally accurate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/tools/context/model-routing.md around lines 121 - 126, The sentence stating that concrete model subagents are defined under tools/ai-assistants/models/ is too narrow given the new local entry pointing to tools/local-models/local-models.md; update the wording (the paragraph introducing the table) to indicate subagents are "primarily defined under tools/ai-assistants/models/" or "defined across these paths (e.g., tools/ai-assistants/models/ and tools/local-models/)" so the table entries like the local subagent and the haiku entry are accurately covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/tools/context/model-routing.md:
- Around line 40-50: The policy currently allows silent fallback from the
"local" route to "haiku", which violates on-device/privacy constraints; change
the wording and logic so that when a task requires on-device handling ("local")
and the local model is not running or installed, the system must fail closed and
not auto-route to "haiku" or any cloud option without an explicit override;
update the text replacing "skip to `haiku`" with a clear fail-closed statement
and add a requirement that any escalation to `haiku` (or cloud) must be
performed only after an explicit, auditable override, and mirror this change in
the routing flowchart references to "local" -> "haiku".
---
Nitpick comments:
In @.agents/tools/context/model-routing.md:
- Around line 121-126: The sentence stating that concrete model subagents are
defined under tools/ai-assistants/models/ is too narrow given the new local
entry pointing to tools/local-models/local-models.md; update the wording (the
paragraph introducing the table) to indicate subagents are "primarily defined
under tools/ai-assistants/models/" or "defined across these paths (e.g.,
tools/ai-assistants/models/ and tools/local-models/)" so the table entries like
the local subagent and the haiku entry are accurately covered.
…review) - Local route no longer silently falls back to cloud haiku when the task requires on-device processing (privacy, compliance, air-gapped). It now fails with an error unless --allow-cloud is explicitly passed. - Cost-optimisation and experimentation use cases still fall back to haiku. - Updated decision flowchart, fallback table, and subagent table to reflect the fail-closed/fail-open distinction. - Updated subagent table intro to cover both tools/ai-assistants/models/ (cloud) and tools/local-models/ (local) paths.
|
@coderabbitai review Addressed your feedback:
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Wed Feb 25 23:18:22 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
🧩 Analysis chain🏁 Script executed: Length of output: 13109
The documentation is now internally consistent across all sections (routing rules, flowchart, fallback table, and examples). The fail-closed approach for privacy-sensitive tasks is a best practice that prevents accidental data leakage to cloud providers. Triggering a full review now. 🧠 Learnings used✅ Actions performedReview triggered.
|
- Fix cost spectrum ordering: flash (0.20x) < haiku (0.25x) per cost table (Gemini) - Reorder all tables/sections to match: flash before haiku consistently - Update local fallback from haiku to flash (cheapest cloud by blended cost) - Add (planned: t1338.x) annotations to not-yet-existing files/scripts (Augment) - Add note that local tier runtime support is coming in t1338.2-t1338.6 (Augment) - Separate limitations from fallback was already done in prior commit (Gemini)
|
@coderabbitai review Addressed all remaining reviewer feedback:
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Wed Feb 25 23:30:42 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
Triggering a full review now to verify all changes are consistent across the documentation. 🧠 Learnings used✅ Actions performedReview triggered.
|
|
|
@coderabbitai review Both CodeRabbit review items are already addressed in commits a19eadb and 71fddb9:
All CI checks pass. Requesting re-review to clear the CHANGES_REQUESTED state. |
🧩 Analysis chain🏁 Script executed: Length of output: 13457
All documentation is internally consistent. Triggering review to clear the CHANGES_REQUESTED state. 🧠 Learnings used✅ Actions performedReview triggered.
|
|
@coderabbitai review All review feedback has been addressed across commits a19eadb and 71fddb9:
markdownlint: 0 errors. All CI checks pass. Requesting fresh review to clear stale CHANGES_REQUESTED state. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.agents/tools/context/model-routing.md (1)
53-53:--allow-cloudis referenced but has no definition anchor or planned-subtask annotation.Both Line 53 and Line 250 cite
--allow-cloudas the escape hatch for the fail-closed privacy path, but there is no in-document definition of what accepts this flag (CLI? dispatch.sh? subagent invocation?), nor a(planned: t1338.x)annotation to indicate where it will be implemented — unlike the other planned artefacts (local-model-helper.sh,local-models.md, etc.) that are consistently annotated throughout. A brief parenthetical pointing to the implementing subtask would keep the doc internally consistent.✏️ Proposed annotation
-- **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override. +- **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override (planned: t1338.x).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/tools/context/model-routing.md at line 53, Add a short in-document definition/anchor for the CLI flag `--allow-cloud` and annotate it with the planned subtask (e.g., `(planned: t1338.x)`) so readers know where it will be implemented; update the two occurrences that reference `--allow-cloud` (the privacy fail message and the line at 250) to either link to that anchor or include the parenthetical planned-subtask note, ensuring the flag name `--allow-cloud` is consistently defined and annotated like other artifacts (`local-model-helper.sh`, `local-models.md`).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/tools/context/model-routing.md:
- Around line 107-110: The runtime-support note for the "local" tier incorrectly
states it will fall back to "haiku"; update that sentence to say it will fall
back to "flash" so it matches the rest of the document (the "Fall back to
`flash`" line, the Fallback Routing table, and the cost spectrum). Locate the
note text that mentions "`local` in frontmatter will fall back to `haiku`" and
replace "haiku" with "flash" to align the policy and tables.
---
Nitpick comments:
In @.agents/tools/context/model-routing.md:
- Line 53: Add a short in-document definition/anchor for the CLI flag
`--allow-cloud` and annotate it with the planned subtask (e.g., `(planned:
t1338.x)`) so readers know where it will be implemented; update the two
occurrences that reference `--allow-cloud` (the privacy fail message and the
line at 250) to either link to that anchor or include the parenthetical
planned-subtask note, ensuring the flag name `--allow-cloud` is consistently
defined and annotated like other artifacts (`local-model-helper.sh`,
`local-models.md`).
| Valid values: `local`, `haiku`, `flash`, `sonnet`, `pro`, `opus` | ||
|
|
||
| > **Note**: The `local` tier is documented here but runtime support (helper scripts, availability checks) is being added in subtasks t1338.2-t1338.6. Until those land, `local` in frontmatter will fall back to `haiku`. | ||
|
|
There was a problem hiding this comment.
haiku fallback in the runtime-support note contradicts the flash fallback defined everywhere else.
Line 109 says local frontmatter will fall back to haiku until the runtime subtasks land. However, the rest of the document consistently targets flash as the cost-optimisation fallback — Line 54 ("Fall back to flash"), the Fallback Routing table (Line 183: flash (cost-only)), and the cost spectrum itself (Line 25) which places flash below haiku in cost (~0.20x vs ~0.25x). The note should align with the established policy.
✏️ Proposed fix
-> **Note**: The `local` tier is documented here but runtime support (helper scripts, availability checks) is being added in subtasks t1338.2-t1338.6. Until those land, `local` in frontmatter will fall back to `haiku`.
+> **Note**: The `local` tier is documented here but runtime support (helper scripts, availability checks) is being added in subtasks t1338.2-t1338.6. Until those land, `local` in frontmatter will fall back to `flash`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/tools/context/model-routing.md around lines 107 - 110, The
runtime-support note for the "local" tier incorrectly states it will fall back
to "haiku"; update that sentence to say it will fall back to "flash" so it
matches the rest of the document (the "Fall back to `flash`" line, the Fallback
Routing table, and the cost spectrum). Locate the note text that mentions
"`local` in frontmatter will fall back to `haiku`" and replace "haiku" with
"flash" to align the policy and tables.
🧩 Analysis chain🏁 Script executed: Length of output: 13457
The documentation is internally consistent across all sections. Triggering review to clear the CHANGES_REQUESTED state. 🧠 Learnings used✅ Actions performedReview triggered.
|
- Fix cost spectrum order: flash (~0.20x) is cheaper than haiku (~0.25x), so the spectrum now reads local -> flash -> haiku -> sonnet -> pro -> opus - Add `local` tier to VALID_MODELS in model-label-helper.sh so local-tier tasks can be labelled and queried without falling through to the unknown model warning - Add `local` to is_known_tier() and get_tier_models() in model-availability-helper.sh so routing/availability checks accept the local tier without returning exit 1 - Broken references to tools/local-models/ and local-model-helper.sh are now resolved: those files were created in t1338.2-6 after PR #2326 merged Closes #3299
- Fix cost spectrum order: flash (~0.20x) is cheaper than haiku (~0.25x), so the spectrum now reads local -> flash -> haiku -> sonnet -> pro -> opus - Add `local` tier to VALID_MODELS in model-label-helper.sh so local-tier tasks can be labelled and queried without falling through to the unknown model warning - Add `local` to is_known_tier() and get_tier_models() in model-availability-helper.sh so routing/availability checks accept the local tier without returning exit 1 - Broken references to tools/local-models/ and local-model-helper.sh are now resolved: those files were created in t1338.2-6 after PR #2326 merged Closes #3299



Summary
localas tier 0 ($0 cost) in the model routing hierarchy:local -> haiku -> flash -> sonnet -> pro -> opushaikuwhen local server is not running or no model is installedChanges to
model-routing.mdlocalrow: llama.cpp, Free ($0)Use local whensection (privacy, offline, bulk, experimentation)localat 0x input/output ($0 electricity only)local->tools/local-models/local-models.md, fallback: haikulocal-model-helper.sh statusandmodelscommandslocal-> haiku when server not runninglocaladded to valid model valuesVerification
Context
Part of plan p032 (Local AI Model Support). This is subtask 1 of 6 — documentation-only change to model-routing.md. Future subtasks (t1338.2-t1338.6) will create the local-models subagents, helper script, and usage logging.
Closes #2320
Summary by CodeRabbit