fix(llm): map HTTP 413 to ContextLengthExceeded for auto-compaction#2339
serrrfirat merged 6 commits into staging
Conversation
HTTP 413 (Payload Too Large) was falling through to generic RequestFailed, causing the retry provider to retry the same oversized payload 3x, count toward the circuit breaker threshold, and fail over to other providers with the same too-large context. The existing compaction recovery in dispatcher.rs (which handles ContextLengthExceeded) was never reached.

Fix:
- nearai_chat.rs: explicit 413 check → ContextLengthExceeded
- nearai_chat.rs: detect context length errors in 400 response bodies
- rig_adapter.rs: map_rig_error() detects context length patterns in error messages from OpenAI/Anthropic/Ollama providers

Now when context exceeds provider limits, the dispatcher automatically triggers compaction and retries with a smaller context window.

Closes #2276

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request improves error handling for LLM providers by mapping HTTP 413 status codes and specific error message patterns to the ContextLengthExceeded error variant. This logic is implemented in nearai_chat.rs and centralized in a new map_rig_error helper function within rig_adapter.rs, accompanied by unit tests. The review feedback recommends using to_ascii_lowercase() and the any() iterator for more idiomatic and readable pattern matching.
```rust
let lower = response_text.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```
Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.
Suggested change:

```rust
let lower = response_text.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
]
.iter()
.any(|p| lower.contains(p))
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```
References
- For case-insensitive comparisons, use `to_ascii_lowercase()` as it is the project convention, especially when the text is known to be ASCII.
- When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.
```rust
let lower = msg.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```
Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.
Suggested change:

```rust
let lower = msg.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
    "413",
]
.iter()
.any(|p| lower.contains(p))
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```
References
- For case-insensitive comparisons, use `to_ascii_lowercase()` as it is the project convention, especially when the text is known to be ASCII.
- When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26aa554f98
```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```
Restrict 413 matching to actual status-code errors
map_rig_error currently classifies any error text containing "413" as ContextLengthExceeded, but this can match unrelated failures (for example request IDs, token counts, or other numeric fields in transient network/provider errors). In that case the error is downgraded to a non-retryable class, so retry/failover logic is skipped and recoverable requests fail immediately. Please match a structured/explicit 413 signal (e.g., parsed HTTP status or "HTTP 413") instead of a bare numeric substring.
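A minimal sketch of what a tightened check could look like, assuming the string-matching approach is kept (the function name and pattern list here are illustrative, not the PR's actual code):

```rust
// Hypothetical helper: accept 413 only when it appears as an explicit
// HTTP-status signal, never as bare digits inside unrelated text.
// The pattern list is an illustrative assumption.
fn looks_like_http_413(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    ["http 413", "status: 413", "status code 413"]
        .iter()
        .any(|p| lower.contains(p))
}
```

With this shape, `"HTTP 413 Payload Too Large"` still classifies correctly, while incidental digits such as `"used 1413 tokens"` do not.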
Run cargo fmt on rig_adapter.rs (two chain-expression reflows) and add RUSTSEC-2026-0097 (rand 0.8.x unsoundness) to deny.toml ignore list. https://claude.ai/code/session_01VidPyvxYesocfhH1bYJP5Y
The 4 wasmtime advisories (RUSTSEC-2025-0046, RUSTSEC-2025-0118, RUSTSEC-2026-0020, RUSTSEC-2026-0021) no longer match any crate in the lockfile after the v43 upgrade and were generating advisory-not-detected warnings. https://claude.ai/code/session_01MMhMuxXvAXTcFZ3EAga12k
```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```
Medium — Bare "413" substring match causes false positives
map_rig_error() matches the bare substring "413" in the lowercased error message. This will false-positive on any error containing "413" in a non-HTTP-status context, e.g.:
- Rate limit messages: "Rate limit: 413 requests per minute exceeded"
- Timestamps: "2026-04-13"
- Token counts: "used 1413 tokens"
- Model names or request IDs containing "413"
Unlike nearai_chat.rs which checks the actual HTTP status code == 413, this is string-matching against serialized errors from rig-core which can contain arbitrary upstream text. The "payload too large" keyword already covers legitimate 413 responses.
Suggested fix: Remove the bare "413" check (since "payload too large" covers it), or tighten to "status: 413" / "http 413" / "error 413".
Addressed: removed the bare "413" match entirely. The "payload too large" pattern already covers legitimate 413 responses. Added regression tests confirming that bare "413" in rate-limit messages and timestamps does not false-positive. Also switched to to_ascii_lowercase() and idiomatic any() pattern.
```rust
// request size limit. Map to ContextLengthExceeded so the dispatcher
// can trigger automatic compaction instead of crashing.
if status_code == 413 {
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```
Medium — used: 0, limit: 0 loses actual token counts
Both ContextLengthExceeded emissions use used: 0, limit: 0, discarding the actual token counts. The dispatcher logs these values at WARN level (used, limit, iteration) and at ERROR when retry fails. Operators diagnosing context overflow issues in production will see used=0, limit=0 — useless for capacity planning and debugging. The error display also renders as "Context length exceeded: 0 tokens used, 0 allowed."
Many providers include token counts in their error JSON (e.g., OpenAI: "This model's maximum context length is 128000 tokens. However, your messages resulted in 150000 tokens.").
Suggested fix: Parse used and limit from the response body when possible. If parsing fails, keep 0 but add a comment explaining why the values are zeroed.
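A sketch of what such parsing could look like, assuming OpenAI-style wording; `parse_token_counts` is the helper name the follow-up commit mentions, but this body is illustrative, not the PR's actual implementation:

```rust
// Illustrative sketch of a parse_token_counts()-style helper. Pulls
// (used, limit) out of an OpenAI-style error message, falling back to
// (0, 0) when the text carries no parseable numbers.
fn parse_token_counts(msg: &str) -> (u64, u64) {
    // Return the first integer that follows `marker` in `msg`, if any.
    let number_after = |marker: &str| -> Option<u64> {
        let rest = &msg[msg.find(marker)? + marker.len()..];
        let digits: String = rest
            .chars()
            .skip_while(|c| !c.is_ascii_digit())
            .take_while(|c| c.is_ascii_digit())
            .collect();
        digits.parse().ok()
    };
    let limit = number_after("maximum context length is").unwrap_or(0);
    let used = number_after("resulted in").unwrap_or(0);
    (used, limit)
}
```

Against the OpenAI example quoted above, this yields `(150000, 128000)`; any other message falls back to `(0, 0)`.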
Addressed: both nearai_chat.rs paths (HTTP 413 and HTTP 400) now attempt to parse used/limit token counts from the error response body via the shared parse_token_counts() helper. Falls back to (0, 0) when the response does not contain parseable numbers. Also applied to_ascii_lowercase() and any() pattern.
serrrfirat left a comment
Paranoid Architect Review — REQUEST CHANGES
2 Medium findings.
The core fix is correct and well-designed: mapping HTTP 413 and context-length errors to ContextLengthExceeded breaks the retry/circuit-breaker/failover chain for a non-retryable condition and enables the compaction recovery path. The rig_adapter tests are thorough.
However:
- Medium: The bare `"413"` substring match in `map_rig_error` will false-positive on timestamps, token counts, rate limit messages, and any error text containing "413" in a non-status context. Since `"payload too large"` already covers legitimate 413 responses, the `"413"` check adds risk without clear value.
- Medium: `used: 0, limit: 0` discards actual token counts, degrading production observability. Many providers include these in their error JSON.
Neither is a hard blocker, but the "413" false positive risk is worth fixing before merge — it could incorrectly trigger compaction instead of propagating the real error.
…from errors

Address review feedback on PR #2339:
- Remove bare "413" substring match from map_rig_error() to prevent false positives on timestamps, token counts, and request IDs. The "payload too large" pattern already covers legitimate 413 errors.
- Parse used/limit token counts from error messages when providers include them (e.g. OpenAI's "maximum context length is X tokens... resulted in Y tokens" format), instead of always returning 0/0.
- Use to_ascii_lowercase() and idiomatic slice-based any() pattern per project convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse multi-line let bindings for parse_token_counts() calls onto single lines, matching rustfmt expectations. https://claude.ai/code/session_01NSKpFprVJLtoyAaVs4FXio
gimli 0.33.1 was yanked from crates.io, causing cargo-deny to fail. https://claude.ai/code/session_01CfQr8EtFrqnrjseuVeGJ13
serrrfirat left a comment
Approved — Paranoid Architect Review
The core fix is correct and well-tested. Previous CHANGES_REQUESTED concern (bare "413" false positives) has been addressed — map_rig_error only matches semantic patterns ("context_length_exceeded", "maximum context length", "too many tokens", "payload too large"), and test_map_rig_error_bare_413_no_false_positive confirms no false positives on timestamps or rate limit messages.
Findings summary:
- Medium (follow-up): 4 other direct-HTTP providers (`github_copilot`, `anthropic_oauth`, `gemini_oauth`, `openai_codex_provider`) have the same bug. Tracked in #2489.
- Nit: `CONTEXT_PATTERNS` duplicated between `nearai_chat.rs` and `rig_adapter.rs`; consolidation tracked in the same issue.
CI green, no merge conflicts. Ship it.
Summary
- HTTP 413 now maps to `LlmError::ContextLengthExceeded` instead of generic `RequestFailed`
- New `map_rig_error()` helper in rig_adapter to detect context overflow patterns from all providers

Closes #2276
The Bug
When accumulated context exceeded the provider's payload limit:
- HTTP 413 fell through to generic `RequestFailed` (a transient error)
- The dispatcher already has a recovery path for `ContextLengthExceeded` that triggers compaction, but it was never reached

The Fix
Map 413 to `ContextLengthExceeded` so the existing recovery path kicks in:
- `ContextLengthExceeded` is non-retryable (won't waste retries)

Files changed
- `src/llm/nearai_chat.rs`
- `src/llm/rig_adapter.rs`: new `map_rig_error()` helper + 5 regression tests

Test plan
- Unit tests for `map_rig_error()` (context_length_exceeded, maximum context length, too many tokens, 413, generic error)
- `cargo clippy`: zero warnings
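The final pattern check described in the thread (bare "413" removed, semantic phrases only) can be sketched as follows; the pattern list comes from the approval comment, while the function name is illustrative:

```rust
// Classify an error message as a context-length overflow using only
// semantic phrases, per the review outcome: a bare "413" in timestamps
// or rate-limit text must not match.
fn is_context_length_error(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    [
        "context_length_exceeded",
        "maximum context length",
        "too many tokens",
        "payload too large",
    ]
    .iter()
    .any(|p| lower.contains(p))
}
```

Legitimate 413 responses still classify via their "Payload Too Large" text, while the false-positive cases flagged in review ("Rate limit: 413 requests per minute exceeded", "2026-04-13") do not.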