
fix(llm): map HTTP 413 to ContextLengthExceeded for auto-compaction (#2339)

Merged
serrrfirat merged 6 commits into staging from fix/2276-http-413-handling on Apr 15, 2026

Conversation

@zmanian (Collaborator) commented Apr 11, 2026

Summary

  • Maps HTTP 413 (Payload Too Large) to LlmError::ContextLengthExceeded instead of generic RequestFailed
  • Detects context-length errors in HTTP 400 response bodies (OpenAI-compatible endpoints)
  • Adds map_rig_error() helper in rig_adapter to detect context overflow patterns from all providers

Closes #2276

The Bug

When accumulated context exceeded the provider's payload limit:

  1. nearai_chat.rs returned RequestFailed (a transient error)
  2. RetryProvider retried the same oversized payload 3x with exponential backoff
  3. CircuitBreaker counted it toward the trip threshold (5 failures = 30s blackout)
  4. FailoverProvider tried the next provider with the same too-large payload
  5. dispatcher.rs had recovery code for ContextLengthExceeded that triggers compaction, but it was never reached

The Fix

Map 413 to ContextLengthExceeded so the existing recovery path kicks in:

  • ContextLengthExceeded is non-retryable (won't waste retries)
  • It doesn't trip the circuit breaker (it's a client problem, not backend degradation)
  • The dispatcher auto-compacts context and retries with a smaller window
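The mapping and its retry classification can be sketched as follows. This is illustrative only: `LlmError` and its two variants appear in the PR, but `is_retryable` and `classify_http_status` are hypothetical names used here to show the intended control flow, not the merged code.

```rust
// Sketch only: `LlmError` and its variants come from the PR description;
// `is_retryable` and `classify_http_status` are hypothetical helpers.
#[derive(Debug, PartialEq)]
enum LlmError {
    ContextLengthExceeded { used: u32, limit: u32 },
    RequestFailed(String),
}

impl LlmError {
    /// Non-retryable errors bypass RetryProvider, don't count toward the
    /// CircuitBreaker threshold, and are not handed to FailoverProvider.
    fn is_retryable(&self) -> bool {
        !matches!(self, LlmError::ContextLengthExceeded { .. })
    }
}

fn classify_http_status(status: u16, body: &str) -> LlmError {
    if status == 413 {
        // Payload Too Large: resending the same payload can never succeed,
        // so surface it to the dispatcher's compaction path instead.
        return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
    }
    LlmError::RequestFailed(format!("HTTP {status}: {body}"))
}
```

With this split, the retry stack only loops on `is_retryable()` errors, so a 413 reaches the dispatcher's compaction recovery on the first failure.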

Files changed

File                     Change
src/llm/nearai_chat.rs   Explicit 413 check + context-length detection in 400 bodies
src/llm/rig_adapter.rs   map_rig_error() helper + 5 regression tests

Test plan

  • 5 unit tests for map_rig_error() (context_length_exceeded, maximum context length, too many tokens, 413, generic error)
  • cargo clippy zero warnings
  • Manual: accumulate large context via repeated tool calls, verify compaction triggers instead of crash

…2276)

HTTP 413 (Payload Too Large) was falling through to generic RequestFailed,
causing the retry provider to retry the same oversized payload 3x, count
toward the circuit breaker threshold, and fail over to other providers
with the same too-large context. The existing compaction recovery in
dispatcher.rs (which handles ContextLengthExceeded) was never reached.

Fix:
- nearai_chat.rs: explicit 413 check → ContextLengthExceeded
- nearai_chat.rs: detect context length errors in 400 response bodies
- rig_adapter.rs: map_rig_error() detects context length patterns in
  error messages from OpenAI/Anthropic/Ollama providers

Now when context exceeds provider limits, the dispatcher automatically
triggers compaction and retries with a smaller context window.

Closes #2276

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added labels size: M (50-199 changed lines), scope: llm (LLM integration), risk: low (Changes to docs, tests, or low-risk modules), contributor: core (20+ merged PRs) on Apr 11, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request improves error handling for LLM providers by mapping HTTP 413 status codes and specific error message patterns to the ContextLengthExceeded error variant. This logic is implemented in nearai_chat.rs and centralized in a new map_rig_error helper function within rig_adapter.rs, accompanied by unit tests. The review feedback recommends using to_ascii_lowercase() and the any() iterator for more idiomatic and readable pattern matching.

Comment thread: src/llm/nearai_chat.rs (Outdated)
Comment on lines +312 to +319

```rust
let lower = response_text.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```

Severity: medium

Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.

Suggested change:

```rust
// Before:
let lower = response_text.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}

// After:
let lower = response_text.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
]
.iter()
.any(|p| lower.contains(p))
{
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
}
```
References
  1. For case-insensitive comparisons, use to_ascii_lowercase() as it is the project convention, especially when the text is known to be ASCII.
  2. When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.

Comment thread: src/llm/rig_adapter.rs (Outdated)
Comment on lines +1173 to +1181

```rust
let lower = msg.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```

Severity: medium

Use to_ascii_lowercase() for case-insensitive comparisons as per project convention. Additionally, using any() with a slice of patterns is more idiomatic and readable than chained || operators, and ensures multi-word phrases are correctly matched via substring containment.

Suggested change:

```rust
// Before:
let lower = msg.to_lowercase();
if lower.contains("context_length_exceeded")
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}

// After:
let lower = msg.to_ascii_lowercase();
if [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
    "413",
]
.iter()
.any(|p| lower.contains(p))
{
    return LlmError::ContextLengthExceeded { used: 0, limit: 0 };
}
```
References
  1. For case-insensitive comparisons, use to_ascii_lowercase() as it is the project convention, especially when the text is known to be ASCII.
  2. When checking for hints in user messages, prefer substring containment over token-based matching if hints can be multi-word phrases.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26aa554f98


Comment thread: src/llm/rig_adapter.rs (Outdated)

```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```

P1 — Restrict 413 matching to actual status-code errors

map_rig_error currently classifies any error text containing "413" as ContextLengthExceeded, but this can match unrelated failures (for example request IDs, token counts, or other numeric fields in transient network/provider errors). In that case the error is downgraded to a non-retryable class, so retry/failover logic is skipped and recoverable requests fail immediately. Please match a structured/explicit 413 signal (e.g., parsed HTTP status or "HTTP 413") instead of a bare numeric substring.


claude added 2 commits April 11, 2026 17:56
Run cargo fmt on rig_adapter.rs (two chain-expression reflows) and add
RUSTSEC-2026-0097 (rand 0.8.x unsoundness) to deny.toml ignore list.

https://claude.ai/code/session_01VidPyvxYesocfhH1bYJP5Y
The 4 wasmtime advisories (RUSTSEC-2025-0046, RUSTSEC-2025-0118,
RUSTSEC-2026-0020, RUSTSEC-2026-0021) no longer match any crate in the
lockfile after the v43 upgrade and were generating advisory-not-detected
warnings.

https://claude.ai/code/session_01MMhMuxXvAXTcFZ3EAga12k
Comment thread: src/llm/rig_adapter.rs (Outdated)

```rust
    || lower.contains("maximum context length")
    || lower.contains("too many tokens")
    || lower.contains("payload too large")
    || lower.contains("413")
```
Collaborator


Medium — Bare "413" substring match causes false positives

map_rig_error() matches the bare substring "413" in the lowercased error message. This will false-positive on any error containing "413" in a non-HTTP-status context, e.g.:

  • Rate limit messages: "Rate limit: 413 requests per minute exceeded"
  • Timestamps: "2026-04-13"
  • Token counts: "used 1413 tokens"
  • Model names or request IDs containing "413"

Unlike nearai_chat.rs which checks the actual HTTP status code == 413, this is string-matching against serialized errors from rig-core which can contain arbitrary upstream text. The "payload too large" keyword already covers legitimate 413 responses.

Suggested fix: Remove the bare "413" check (since "payload too large" covers it), or tighten to "status: 413" / "http 413" / "error 413".
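If the check were kept rather than removed, a tightened version along these lines would avoid the false positives (a sketch only; the merged fix instead dropped the bare "413" match entirely, and the pattern list here is illustrative):

```rust
// Sketch of the tightened alternative suggested above; the merged code
// removed the "413" check instead of anchoring it.
fn looks_like_http_413(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    ["http 413", "status: 413", "error 413", "payload too large"]
        .iter()
        .any(|p| lower.contains(p))
}
```

Because "413" is anchored to a status-like prefix, strings such as "used 1413 tokens" or the date "2026-04-13" no longer match.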

Collaborator Author


Addressed: removed the bare "413" match entirely. The "payload too large" pattern already covers legitimate 413 responses. Added regression tests confirming that bare "413" in rate-limit messages and timestamps does not false-positive. Also switched to to_ascii_lowercase() and idiomatic any() pattern.

Comment thread: src/llm/nearai_chat.rs (Outdated)

```rust
// request size limit. Map to ContextLengthExceeded so the dispatcher
// can trigger automatic compaction instead of crashing.
if status_code == 413 {
    return Err(LlmError::ContextLengthExceeded { used: 0, limit: 0 });
```
Collaborator


Medium — used: 0, limit: 0 loses actual token counts

Both ContextLengthExceeded emissions use used: 0, limit: 0, discarding the actual token counts. The dispatcher logs these values at WARN level (used, limit, iteration) and at ERROR when retry fails. Operators diagnosing context overflow issues in production will see used=0, limit=0 — useless for capacity planning and debugging. The error display also renders as "Context length exceeded: 0 tokens used, 0 allowed."

Many providers include token counts in their error JSON (e.g., OpenAI: "This model's maximum context length is 128000 tokens. However, your messages resulted in 150000 tokens.").

Suggested fix: Parse used and limit from the response body when possible. If parsing fails, keep 0 but add a comment explaining why the values are zeroed.

Collaborator Author


Addressed: both nearai_chat.rs paths (HTTP 413 and HTTP 400) now attempt to parse used/limit token counts from the error response body via the shared parse_token_counts() helper. Falls back to (0, 0) when the response does not contain parseable numbers. Also applied to_ascii_lowercase() and any() pattern.
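The parse_token_counts() implementation itself isn't shown in this thread; a best-effort version, assuming OpenAI's error wording, could look like the following sketch (the merged helper may parse differently):

```rust
/// Best-effort extraction of (used, limit) from an OpenAI-style error body,
/// e.g. "This model's maximum context length is 128000 tokens. However,
/// your messages resulted in 150000 tokens." Falls back to (0, 0).
/// Sketch only; not the PR's actual implementation.
fn parse_token_counts(body: &str) -> (u32, u32) {
    // Return the first run of digits following `marker`, if any.
    fn number_after(text: &str, marker: &str) -> Option<u32> {
        let rest = &text[text.find(marker)? + marker.len()..];
        rest.trim_start()
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .collect::<String>()
            .parse()
            .ok()
    }
    let limit = number_after(body, "maximum context length is").unwrap_or(0);
    let used = number_after(body, "resulted in").unwrap_or(0);
    (used, limit)
}
```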

Collaborator

@serrrfirat left a comment


Paranoid Architect Review — REQUEST CHANGES

2 Medium findings.

The core fix is correct and well-designed: mapping HTTP 413 and context-length errors to ContextLengthExceeded breaks the retry/circuit-breaker/failover chain for a non-retryable condition and enables the compaction recovery path. The rig_adapter tests are thorough.

However:

  • Medium: The bare "413" substring match in map_rig_error will false-positive on timestamps, token counts, rate limit messages, and any error text containing "413" in a non-status context. Since "payload too large" already covers legitimate 413 responses, the "413" check adds risk without clear value.
  • Medium: used: 0, limit: 0 discards actual token counts, degrading production observability. Many providers include these in their error JSON.

Neither is a hard blocker, but the "413" false positive risk is worth fixing before merge — it could incorrectly trigger compaction instead of propagating the real error.

…from errors

Address review feedback on PR #2339:
- Remove bare "413" substring match from map_rig_error() to prevent
  false positives on timestamps, token counts, and request IDs.
  The "payload too large" pattern already covers legitimate 413 errors.
- Parse used/limit token counts from error messages when providers
  include them (e.g. OpenAI's "maximum context length is X tokens...
  resulted in Y tokens" format), instead of always returning 0/0.
- Use to_ascii_lowercase() and idiomatic slice-based any() pattern
  per project convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added the size: L (200-499 changed lines) label and removed size: M (50-199 changed lines) on Apr 13, 2026
claude added 2 commits April 13, 2026 11:30
Collapse multi-line let bindings for parse_token_counts() calls onto
single lines, matching rustfmt expectations.

https://claude.ai/code/session_01NSKpFprVJLtoyAaVs4FXio
Collaborator

@serrrfirat left a comment


Approved — Paranoid Architect Review

The core fix is correct and well-tested. Previous CHANGES_REQUESTED concern (bare "413" false positives) has been addressed — map_rig_error only matches semantic patterns ("context_length_exceeded", "maximum context length", "too many tokens", "payload too large"), and test_map_rig_error_bare_413_no_false_positive confirms no false positives on timestamps or rate limit messages.

Findings summary:

  • Medium (follow-up): 4 other direct-HTTP providers (github_copilot, anthropic_oauth, gemini_oauth, openai_codex_provider) have the same bug. Tracked in #2489.
  • Nit: CONTEXT_PATTERNS duplicated between nearai_chat.rs and rig_adapter.rs — consolidation tracked in the same issue.

CI green, no merge conflicts. Ship it.
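The consolidation the nit describes could be as simple as a shared constant. CONTEXT_PATTERNS is the name mentioned in the review, but this exact shape and placement are assumptions:

```rust
// Sketch of the suggested consolidation; CONTEXT_PATTERNS is named in the
// review, while the module layout here is hypothetical.
pub const CONTEXT_PATTERNS: [&str; 4] = [
    "context_length_exceeded",
    "maximum context length",
    "too many tokens",
    "payload too large",
];

/// Shared check that both nearai_chat.rs and rig_adapter.rs could call
/// instead of each keeping its own copy of the pattern list.
pub fn is_context_length_error(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    CONTEXT_PATTERNS.iter().any(|p| lower.contains(p))
}
```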


Labels

  • contributor: core (20+ merged PRs)
  • risk: low (Changes to docs, tests, or low-risk modules)
  • scope: dependencies (Dependency updates)
  • scope: llm (LLM integration)
  • size: L (200-499 changed lines)


Development

Successfully merging this pull request may close these issues.

[QA] Orchestrator crashes with HTTP 413 Payload Too Large from nearai_chat provider

3 participants