Agent SDK should handle 429 rate limits gracefully instead of crashing #812

@chris-skeels

Repository: https://github.com/anthropics/claude-agent-sdk-python/issues


Title

Agent SDK crashes on 429 rate limit instead of backing off and retrying

Description

The Agent SDK (claude-agent-sdk Python package) crashes fatally when the API returns a 429 rate limit error, even though this is a transient and recoverable condition. The SDK knows the rate limits, has access to the response headers showing current usage, and controls the request cadence — yet it treats the 429 as a fatal exception rather than backing off and retrying.

This means multi-turn autonomous agent sessions that accumulate 10+ minutes of work are destroyed by a single rate limit hit, losing all progress.

Environment

  • claude-agent-sdk (Python, latest via pip)
  • macOS (Apple Silicon)
  • Python 3.14
  • Model: claude-sonnet-4-6
  • Console API key (pay-as-you-go tier, 30,000 input tokens/minute limit)

Reproduction

  1. Create an agent that reads multiple large files (10-25KB each) via MCP tools over 15-20 turns
  2. The accumulated context grows until the input tokens sent within one minute exceed 30,000
  3. The SDK sends the next API request, which exceeds the rate limit
  4. Instead of waiting and retrying, the SDK raises a fatal exception:
Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details

The 429 response body is visible in the agent output:

API Error: 429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization's rate limit of 30,000 input tokens per minute..."}}

The agent session terminates. All progress from previous turns (which successfully wrote to external systems via MCP tools) is orphaned.
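
Until the SDK handles this itself, a caller can at least recognise the condition from the error text. The sketch below assumes the error appears as a single line of the form "API Error: <status> <json>", as in the output above; parse_api_error and is_rate_limit are hypothetical helper names, not part of the SDK.

```python
import json
from typing import Optional


def parse_api_error(line: str) -> Optional[dict]:
    """Parse an 'API Error: <status> <json>' line; return None if it does not match.

    Assumption: the line format is inferred from the output quoted in this issue.
    """
    prefix = "API Error: "
    if not line.startswith(prefix):
        return None
    status_text, _, body = line[len(prefix):].partition(" ")
    if not status_text.isdigit():
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "status": int(status_text),
        "type": error.get("type"),
        "message": error.get("message"),
    }


def is_rate_limit(line: str) -> bool:
    """True when the line reports a 429 with error type 'rate_limit_error'."""
    parsed = parse_api_error(line)
    return (
        parsed is not None
        and parsed["status"] == 429
        and parsed["type"] == "rate_limit_error"
    )
```

This only classifies the failure. It does not recover the session, which is the point of this issue.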

Expected Behaviour

The SDK should:

  1. Catch 429 responses internally — these are transient, not fatal
  2. Read the Retry-After header (or use a sensible default like 60s)
  3. Wait and retry automatically — the agent's accumulated context is still valid
  4. Optionally expose a callback or event so the caller can log the backoff, but the default should be silent retry

This is standard practice for any API client library. The SDK already manages the full agent loop, tool execution, and context compaction — rate limit handling is a natural part of that responsibility.
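
A minimal sketch of the four steps above, to make the request concrete. It is not the SDK's code: RateLimited is a hypothetical exception standing in for whatever the SDK sees on a 429, send is any coroutine that performs one API request, and retry_after is assumed to be the Retry-After header already parsed into seconds.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429; not an actual SDK class."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited (429)")
        # Seconds from the Retry-After header, or None if the header was absent.
        self.retry_after = retry_after


async def call_with_backoff(
    send: Callable[[], Awaitable[T]],
    *,
    max_retries: int = 5,
    default_delay: float = 60.0,
    on_backoff: Optional[Callable[[int, float], None]] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call send(), waiting and retrying whenever it raises RateLimited."""
    attempt = 0
    while True:
        try:
            return await send()
        except RateLimited as exc:
            attempt += 1
            if attempt > max_retries:
                raise  # give up only after the retry budget is spent
            # Step 2: prefer the server's hint, else fall back to the default.
            delay = exc.retry_after if exc.retry_after is not None else default_delay
            # Step 4: optional hook so the caller can log the backoff.
            if on_backoff is not None:
                on_backoff(attempt, delay)
            # Step 3: wait, then retry the same request with the same context.
            await sleep(delay)
```

The sleep parameter exists only so the wait can be replaced in tests. By default the function retries silently, as requested in step 4.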

Workaround Attempted

I added asyncio.sleep(2.0) between turns and wrapped the query() generator in try/except with retry logic. The inter-turn delay reduces turn frequency but does not prevent crashes when a single turn's accumulated context exceeds the per-minute token limit. The retry logic does not fire because the SDK raises the exception internally before the caller's try/except can intercept it.
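
For reference, a caller-side wrapper of this kind might look roughly like the sketch below. It is an illustration, not the exact code used: make_stream is a hypothetical factory standing in for a call to query(), and the delay values are assumptions.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def consume_with_retry(
    make_stream: Callable[[], AsyncIterator[str]],
    *,
    turn_delay: float = 2.0,
    max_attempts: int = 3,
    retry_delay: float = 60.0,
) -> List[str]:
    """Consume an async message stream, pausing between messages.

    On any exception the stream is recreated from scratch, so messages
    from the failed attempt are discarded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        messages: List[str] = []
        try:
            async for message in make_stream():
                messages.append(message)
                await asyncio.sleep(turn_delay)  # inter-turn delay
            return messages
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < max_attempts:
                await asyncio.sleep(retry_delay)
    raise RuntimeError("all attempts failed") from last_error
```

Even when the except branch does fire, a retry in this sketch starts a fresh stream, so the work of earlier turns is repeated rather than resumed. That is why the retry belongs inside the SDK, where the accumulated context lives.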

Impact

On a 30k tokens/minute rate limit, any agent session reading files larger than ~10KB will reliably crash within 10-15 minutes. This makes the SDK unusable for autonomous multi-turn agents on the standard Console tier without manual intervention.

The SDK is Anthropic's own client library talking to Anthropic's own API with Anthropic's own rate limits. All three are under Anthropic's control. The client should not crash against limits it can query, predict, and respect.
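
As an illustration of "predict and respect", the sketch below tracks input tokens sent in the last 60 seconds and reports how long to wait before the next request would fit. TokenBudget is a hypothetical class, not an SDK API, and the simple sliding window is an assumption: the API's own accounting may differ, so a real implementation would still need the 429 retry path as a fallback.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenBudget:
    """Client-side sliding-window estimate of input tokens per minute."""

    def __init__(
        self,
        limit_per_minute: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = limit_per_minute
        self.window = window
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()

    def used(self) -> int:
        """Tokens recorded within the current window."""
        self._prune(self.clock())
        return sum(tokens for _, tokens in self._events)

    def record(self, tokens: int) -> None:
        """Note that a request with this many input tokens was just sent."""
        self._events.append((self.clock(), tokens))

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait until a request of this size fits; 0.0 if it fits now."""
        if tokens > self.limit:
            raise ValueError("request alone exceeds the per-minute limit")
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self._events)
        if used + tokens <= self.limit:
            return 0.0
        freed = 0
        # Walk from the oldest request until enough budget has expired.
        for timestamp, sent in self._events:
            freed += sent
            if used - freed + tokens <= self.limit:
                return max(0.0, timestamp + self.window - now)
        return self.window  # not reached when tokens <= limit
```

A request larger than the whole per-minute limit can never fit, which is why wait_time raises instead of returning a delay. In that case waiting does not help, and the context has to shrink (for example through compaction) before the request is sent.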

Additional Context

Three separate agent runs over 2 hours all crashed at the same point: immediately after a large file read that pushed the accumulated context past the rate limit. The agent had completed 80-90% of its work in each case. Total cost of the three crashed runs: ~$2.50 for work that should have cost ~$0.50 in a single successful run.

Haiku (claude-haiku-4-5-20251001) on the same task completes successfully because its smaller context window stays under the rate limit. The irony: the cheaper model is more reliable than the expensive one because the SDK can't handle the rate limits that only the larger model triggers.
