Agent SDK should handle 429 rate limits gracefully instead of crashing
Repository: https://github.com/anthropics/claude-agent-sdk-python/issues
Title
Agent SDK crashes on 429 rate limit instead of backing off and retrying
Description
The Agent SDK (claude-agent-sdk Python package) crashes fatally when the API returns a 429 rate limit error, even though this is a transient and recoverable condition. The SDK knows the rate limits, has access to the response headers showing current usage, and controls the request cadence — yet it treats the 429 as a fatal exception rather than backing off and retrying.
This means multi-turn autonomous agent sessions that accumulate 10+ minutes of work are destroyed by a single rate limit hit, losing all progress.
Environment
- claude-agent-sdk (Python, latest via pip)
- macOS (Apple Silicon)
- Python 3.14
- Model: claude-sonnet-4-6
- Console API key (pay-as-you-go tier, 30,000 input tokens/minute limit)
Reproduction
- Create an agent that reads multiple large files (10-25KB each) via MCP tools over 15-20 turns
- The accumulated context grows past 30k input tokens per minute
- The SDK sends the next API request, which exceeds the rate limit
- Instead of waiting and retrying, the SDK raises a fatal exception:
Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details
The 429 response body is visible in the agent output:
API Error: 429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization's rate limit of 30,000 input tokens per minute..."}}
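Since the 429 body surfaces in the agent output as JSON, a caller can at least detect the condition today by parsing that payload. A minimal sketch — the payload shape is copied from the output above; the helper name is illustrative, not part of any SDK API:

```python
import json

def is_rate_limit_error(body: str) -> bool:
    """Return True if an API error body is a 429 rate_limit_error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return payload.get("error", {}).get("type") == "rate_limit_error"

body = ('{"type":"error","error":{"type":"rate_limit_error",'
        '"message":"This request would exceed your organization\'s '
        'rate limit of 30,000 input tokens per minute..."}}')
print(is_rate_limit_error(body))  # True
```

Detection alone doesn't help, though, because (as described under Workaround Attempted below) the exception fires inside the SDK before the caller sees it.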
The agent session terminates. All progress from previous turns (which successfully wrote to external systems via MCP tools) is orphaned.
Expected Behaviour
The SDK should:
- Catch 429 responses internally — these are transient, not fatal
- Read the Retry-After header (or use a sensible default like 60s)
- Wait and retry automatically — the agent's accumulated context is still valid
- Optionally expose a callback or event so the caller can log the backoff, but the default should be silent retry
This is standard practice for any API client library. The SDK already manages the full agent loop, tool execution, and context compaction — rate limit handling is a natural part of that responsibility.
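The backoff loop asked for above can be sketched in a few lines. RateLimitError here is a hypothetical stand-in for whatever exception type the SDK would raise internally (today it surfaces only a generic CLI failure), and run_turn stands for one API round-trip:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for an SDK-internal 429 (hypothetical name)."""
    def __init__(self, retry_after=None):
        super().__init__("429 rate_limit_error")
        self.retry_after = retry_after

async def with_rate_limit_retry(run_turn, max_retries=5, default_wait=60.0):
    """Retry run_turn on 429s, honouring Retry-After when present."""
    for _ in range(max_retries):
        try:
            return await run_turn()
        except RateLimitError as exc:
            wait = exc.retry_after if exc.retry_after is not None else default_wait
            # Jitter avoids synchronized retries from parallel agents.
            await asyncio.sleep(wait + random.uniform(0, wait * 0.1))
    return await run_turn()  # final attempt: let the error propagate

# Demo with a fake turn that rate-limits twice, then succeeds.
calls = {"n": 0}
async def flaky_turn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "turn complete"

print(asyncio.run(with_rate_limit_retry(flaky_turn, default_wait=0.01)))  # turn complete
```

The point is that the agent's accumulated context survives the wait, so the retry resumes exactly where the session left off instead of discarding it.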
Workaround Attempted
Added asyncio.sleep(2.0) between turns and try/except with retry logic around the query() generator. The inter-turn delay helps with turn frequency but does not prevent crashes when a single turn's accumulated context exceeds the per-minute token limit. The retry logic does not fire because the SDK raises the exception internally before the caller's try/except can intercept it.
Impact
On a 30k tokens/minute rate limit, any agent session reading files larger than ~10KB will reliably crash within 10-15 minutes. This makes the SDK unusable for autonomous multi-turn agents on the standard Console tier without manual intervention.
The SDK is Anthropic's own client library talking to Anthropic's own API with Anthropic's own rate limits. All three are under Anthropic's control. The client should not crash against limits it can query, predict, and respect.
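Because the limit is known up front, the SDK could even throttle proactively rather than reactively. A client-side sketch of that idea — the rolling-window bookkeeping is illustrative, not the SDK's actual design, and window is parameterised only so the example runs quickly:

```python
import time

class TokenBudget:
    """Track input-token spend over a rolling window and sleep before
    any request that would push usage past the per-minute limit."""
    def __init__(self, tokens_per_minute=30_000, window=60.0):
        self.limit = tokens_per_minute
        self.window = window
        self.events = []  # (monotonic timestamp, token count)

    def reserve(self, tokens):
        now = time.monotonic()
        # Drop spend that has aged out of the window.
        self.events = [(t, n) for t, n in self.events if now - t < self.window]
        used = sum(n for _, n in self.events)
        if self.events and used + tokens > self.limit:
            # Sleep until the oldest spend ages out, freeing budget.
            oldest = min(t for t, _ in self.events)
            time.sleep(max(0.0, self.window - (now - oldest)))
        self.events.append((time.monotonic(), tokens))
```

Usage would be a single call — budget.reserve(estimated_input_tokens) — before each request, using the token counts the API already reports per response.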
Additional Context
Three separate agent runs over 2 hours, all crashed at the same point — immediately after a large file read that pushed the accumulated context past the rate limit. The agent had completed 80-90% of its work in each case. Total cost of the three crashed runs: ~$2.50 for work that should have cost ~$0.50 in a single successful run.
Haiku (claude-haiku-4-5-20251001) on the same task completes successfully because its smaller context window stays under the rate limit. The irony: the cheaper model is more reliable than the expensive one because the SDK can't handle the rate limits that only the larger model triggers.