
fix(llm): map HTTP 413 to ContextLengthExceeded for auto-compaction (#2339)

Merged
serrrfirat merged 6 commits into staging from fix/2276-http-413-handling on Apr 15, 2026

Conversation

@zmanian (Collaborator) commented Apr 11, 2026

Summary

  • Maps HTTP 413 (Payload Too Large) to LlmError::ContextLengthExceeded instead of generic RequestFailed
  • Detects context-length errors in HTTP 400 response bodies (OpenAI-compatible endpoints)
  • Adds map_rig_error() helper in rig_adapter to detect context overflow patterns from all providers

Closes #2276

The Bug

When accumulated context exceeded the provider's payload limit:

  1. nearai_chat.rs returned RequestFailed (a transient error)
  2. RetryProvider retried the same oversized payload 3x with exponential backoff
  3. CircuitBreaker counted it toward the trip threshold (5 failures = 30s blackout)
  4. FailoverProvider tried the next provider with the same too-large payload
  5. dispatcher.rs had recovery code for ContextLengthExceeded that triggers compaction, but it was never reached

The Fix

Map 413 to ContextLengthExceeded so the existing recovery path kicks in:

  • ContextLengthExceeded is non-retryable (won't waste retries)
  • It doesn't trip the circuit breaker (it's a client problem, not backend degradation)
  • The dispatcher auto-compacts context and retries with a smaller window
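The mapping and its retry classification can be sketched as follows. This is illustrative only: `LlmError` and its two variants appear in the PR, but `is_retryable` and `classify_http_status` are hypothetical names used here to show the intended control flow, not the merged code.

```rust
// Sketch only: `LlmError` and its variants come from the PR description;
// `is_retryable` and `classify_http_status` are hypothetical helpers.
#[derive(Debug, PartialEq)]
enum LlmError {
    ContextLengthExceeded { used: u32, limit: u32 },
    RequestFailed(String),
}

impl LlmError {
    /// Non-retryable errors bypass RetryProvider, don't count toward the
    /// CircuitBreaker threshold, and are not handed to FailoverProvider.
    fn is_retryable(&self) -> bool {
        !matches!(self, LlmError::ContextLengthExceeded { .. })
    }
}

fn classify_http_status(status: u16, body: &str) -> LlmError {
    if status == 413 {
        // Payload Too Large: resending the same payload can never succeed,
        // so surface it to the dispatcher's compaction path instead.
        return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
    }
    LlmError::RequestFailed(format!("HTTP {status}: {body}"))
}
```

With this split, the retry stack only loops on `is_retryable()` errors, so a 413 reaches the dispatcher's compaction recovery on the first failure.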

Files changed

File                     Change
src/llm/nearai_chat.rs   Explicit 413 check + context-length detection in 400 bodies
src/llm/rig_adapter.rs   map_rig_error() helper + 5 regression tests

Test plan

  • 5 unit tests for map_rig_error() (context_length_exceeded, maximum context length, too many tokens, 413, generic error)
  • cargo clippy zero warnings
  • Manual: accumulate large context via repeated tool calls, verify compaction triggers instead of crash

…2276)

HTTP 413 (Payload Too Large) was falling through to generic RequestFailed,
causing the retry provider to retry the same oversized payload 3x, count
toward the circuit breaker threshold, and fail over to other providers
with the same too-large context. The existing compaction recovery in
dispatcher.rs (which handles ContextLengthExceeded) was never reached.

Fix:
- nearai_chat.rs: explicit 413 check → ContextLengthExceeded
- nearai_chat.rs: detect context length errors in 400 response bodies
- rig_adapter.rs: map_rig_error() detects context length patterns in
  error messages from OpenAI/Anthropic/Ollama providers

Now when context exceeds provider limits, the dispatcher automatically
triggers compaction and retries with a smaller context window.

Closes #2276

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added labels size: M (50-199 changed lines), scope: llm (LLM integration), risk: low (Changes to docs, tests, or low-risk modules), contributor: core (20+ merged PRs) on Apr 11, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request improves error handling for LLM providers by mapping HTTP 413 status codes and specific error message patterns to the ContextLengthExceeded error variant. This logic is implemented in nearai_chat.rs and centralized in a new map_rig_error helper function within rig_adapter.rs, accompanied by unit tests. The review feedback recommends using to_ascii_lowercase() and the any() iterator for more idiomatic and readable pattern matching.

Comment thread: src/llm/nearai_chat.rs (Outdated)
Comment on lines +312 to +319

```rust
let lower = response_text.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```

Severity: medium

Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.

Suggested change:

```rust
// Before:
let lower = response_text.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}

// After:
let lower = response_text.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
]
.iter()
.any(|p| lower.contains(p))
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```
References
  1. For case-insensitive comparisons, use to_ascii_lowercase() as it is the project convention, especially when the text is known to be ASCII.
  2. When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.

Comment thread: src/llm/rig_adapter.rs (Outdated)
Comment on lines +1173 to +1181

```rust
let lower = msg.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```

Severity: medium

Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.

Suggested change:

```rust
// Before:
let lower = msg.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}

// After:
let lower = msg.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
    "413",
]
.iter()
.any(|p| lower.contains(p))
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```
References
  1. For case-insensitive comparisons, use to_ascii_lowercase() as it is the project convention, especially when the text is known to be ASCII.
  2. When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26aa554f98


Comment thread: src/llm/rig_adapter.rs (Outdated)

```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```

P1 — Restrict 413 matching to actual status-code errors

map_rig_error currently classifies any error text containing "413" as ContextLengthExceeded, but this can match unrelated failures (for example request IDs, token counts, or other numeric fields in transient network/provider errors). In that case the error is downgraded to a non-retryable class, so retry/failover logic is skipped and recoverable requests fail immediately. Please match a structured/explicit 413 signal (e.g., parsed HTTP status or "HTTP 413") instead of a bare numeric substring.


claude added 2 commits April 11, 2026 17:56
Run cargo fmt on rig_adapter.rs (two chain-expression reflows) and add
RUSTSEC-2026-0097 (rand 0.8.x unsoundness) to deny.toml ignore list.

https://claude.ai/code/session_01VidPyvxYesocfhH1bYJP5Y
The 4 wasmtime advisories (RUSTSEC-2025-0046, RUSTSEC-2025-0118,
RUSTSEC-2026-0020, RUSTSEC-2026-0021) no longer match any crate in the
lockfile after the v43 upgrade and were generating advisory-not-detected
warnings.

https://claude.ai/code/session_01MMhMuxXvAXTcFZ3EAga12k
Comment thread: src/llm/rig_adapter.rs (Outdated)

```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```
Collaborator


Medium — Bare "413" substring match causes false positives

map_rig_error() matches the bare substring "413" in the lowercased error message. This will false-positive on any error containing "413" in a non-HTTP-status context, e.g.:

  • Rate limit messages: "Rate limit: 413 requests per minute exceeded"
  • Timestamps: "2026-04-13"
  • Token counts: "used 1413 tokens"
  • Model names or request IDs containing "413"

Unlike nearai_chat.rs which checks the actual HTTP status code == 413, this is string-matching against serialized errors from rig-core which can contain arbitrary upstream text. The "payload too large" keyword already covers legitimate 413 responses.

Suggested fix: Remove the bare "413" check (since "payload too large" covers it), or tighten to "status: 413" / "http 413" / "error 413".
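If the check were kept rather than removed, a tightened version along these lines would avoid the false positives (a sketch only; the merged fix instead dropped the bare "413" match entirely, and the pattern list here is illustrative):

```rust
// Sketch of the tightened alternative suggested above; the merged code
// removed the "413" check instead of anchoring it.
fn looks_like_http_413(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    ["http 413", "status: 413", "error 413", "payload too large"]
        .iter()
        .any(|p| lower.contains(p))
}
```

Because "413" is anchored to a status-like prefix, strings such as "used 1413 tokens" or the date "2026-04-13" no longer match.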

Collaborator Author


Addressed: removed the bare "413" match entirely. The "payload too large" pattern already covers legitimate 413 responses. Added regression tests confirming that bare "413" in rate-limit messages and timestamps does not false-positive. Also switched to to_ascii_lowercase() and idiomatic any() pattern.

Comment thread: src/llm/nearai_chat.rs (Outdated)

```rust
// request size limit. Map to ContextLengthExceeded so the dispatcher
// can trigger automatic compaction instead of crashing.
if status_code == 413 {
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
```
Collaborator


Medium — used: 0, limit: 0 loses actual token counts

Both ContextLengthExceeded emissions use used: 0, limit: 0, discarding the actual token counts. The dispatcher logs these values at WARN level (used, limit, iteration) and at ERROR when retry fails. Operators diagnosing context overflow issues in production will see used=0, limit=0 — useless for capacity planning and debugging. The error display also renders as "Context length exceeded: 0 tokens used, 0 allowed."

Many providers include token counts in their error JSON (e.g., OpenAI: "This model's maximum context length is 128000 tokens. However, your messages resulted in 150000 tokens.").

Suggested fix: Parse used and limit from the response body when possible. If parsing fails, keep 0 but add a comment explaining why the values are zeroed.

Collaborator Author


Addressed: both nearai_chat.rs paths (HTTP 413 and HTTP 400) now attempt to parse used/limit token counts from the error response body via the shared parse_token_counts() helper. Falls back to (0, 0) when the response does not contain parseable numbers. Also applied to_ascii_lowercase() and any() pattern.
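The parse_token_counts() implementation itself isn't shown in this thread; a best-effort version, assuming OpenAI's error wording, could look like the following sketch (the merged helper may parse differently):

```rust
/// Best-effort extraction of (used, limit) from an OpenAI-style error body,
/// e.g. "This model's maximum context length is 128000 tokens. However,
/// your messages resulted in 150000 tokens." Falls back to (0, 0).
/// Sketch only; not the PR's actual implementation.
fn parse_token_counts(body: &str) -> (u32, u32) {
    // Return the first run of digits following `marker`, if any.
    fn number_after(text: &str, marker: &str) -> Option<u32> {
        let rest = &text[text.find(marker)? + marker.len()..];
        rest.trim_start()
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .collect::<String>()
            .parse()
            .ok()
    }
    let limit = number_after(body, "maximum context length is").unwrap_or(0);
    let used = number_after(body, "resulted in").unwrap_or(0);
    (used, limit)
}
```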

Collaborator

@serrrfirat left a comment


Paranoid Architect Review — REQUEST CHANGES

2 Medium findings.

The core fix is correct and well-designed: mapping HTTP 413 and context-length errors to ContextLengthExceeded breaks the retry/circuit-breaker/failover chain for a non-retryable condition and enables the compaction recovery path. The rig_adapter tests are thorough.

However:

  • Medium: The bare "413" substring match in map_rig_error will false-positive on timestamps, token counts, rate limit messages, and any error text containing "413" in a non-status context. Since "payload too large" already covers legitimate 413 responses, the "413" check adds risk without clear value.
  • Medium: used: 0, limit: 0 discards actual token counts, degrading production observability. Many providers include these in their error JSON.

Neither is a hard blocker, but the "413" false positive risk is worth fixing before merge — it could incorrectly trigger compaction instead of propagating the real error.

…from errors

Address review feedback on PR #2339:
- Remove bare "413" substring match from map_rig_error() to prevent
  false positives on timestamps, token counts, and request IDs.
  The "payload too large" pattern already covers legitimate 413 errors.
- Parse used/limit token counts from error messages when providers
  include them (e.g. OpenAI's "maximum context length is X tokens...
  resulted in Y tokens" format), instead of always returning 0/0.
- Use to_ascii_lowercase() and idiomatic slice-based any() pattern
  per project convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added the size: L (200-499 changed lines) label and removed size: M (50-199 changed lines) on Apr 13, 2026
claude added 2 commits April 13, 2026 11:30
Collapse multi-line let bindings for parse_token_counts() calls onto
single lines, matching rustfmt expectations.

https://claude.ai/code/session_01NSKpFprVJLtoyAaVs4FXio
Collaborator

@serrrfirat left a comment


Approved — Paranoid Architect Review

The core fix is correct and well-tested. Previous CHANGES_REQUESTED concern (bare "413" false positives) has been addressed — map_rig_error only matches semantic patterns ("context_length_exceeded", "maximum context length", "too many tokens", "payload too large"), and test_map_rig_error_bare_413_no_false_positive confirms no false positives on timestamps or rate limit messages.

Findings summary:

  • Medium (follow-up): 4 other direct-HTTP providers (github_copilot, anthropic_oauth, gemini_oauth, openai_codex_provider) have the same bug. Tracked in #2489.
  • Nit: CONTEXT_PATTERNS duplicated between nearai_chat.rs and rig_adapter.rs — consolidation tracked in the same issue.

CI green, no merge conflicts. Ship it.
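The consolidation the nit describes could be as simple as a shared constant. CONTEXT_PATTERNS is the name mentioned in the review, but this exact shape and placement are assumptions:

```rust
// Sketch of the suggested consolidation; CONTEXT_PATTERNS is named in the
// review, while the module layout here is hypothetical.
pub const CONTEXT_PATTERNS: [&str; 4] = [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
];

/// Shared check that both nearai_chat.rs and rig_adapter.rs could call
/// instead of each keeping its own copy of the pattern list.
pub fn is_context_length_error(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    CONTEXT_PATTERNS.iter().any(|p| lower.contains(p))
}
```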


Labels

  • contributor: core (20+ merged PRs)
  • risk: low (Changes to docs, tests, or low-risk modules)
  • scope: dependencies (Dependency updates)
  • scope: llm (LLM integration)
  • size: L (200-499 changed lines)


Development

Successfully merging this pull request may close these issues.

[QA] Orchestrator crashes with HTTP 413 Payload Too Large from nearai_chat provider

3 participants