Agent SDK should handle 429 rate limits gracefully instead of crashing
Repository: https://github.com/anthropics/claude-agent-sdk-python/issues
Title
Agent SDK crashes on 429 rate limit instead of backing off and retrying
Description
The Agent SDK (claude-agent-sdk Python package) crashes fatally when the API returns a 429 rate limit error, even though this is a transient and recoverable condition. The SDK knows the rate limits, has access to the response headers showing current usage, and controls the request cadence — yet it treats the 429 as a fatal exception rather than backing off and retrying.
This means multi-turn autonomous agent sessions that accumulate 10+ minutes of work are destroyed by a single rate limit hit, losing all progress.
Environment
- claude-agent-sdk (Python, latest via pip)
- macOS (Apple Silicon)
- Python 3.14
- Model: claude-sonnet-4-6
- Console API key (pay-as-you-go tier, 30,000 input tokens/minute limit)
Reproduction
- Create an agent that reads multiple large files (10-25KB each) via MCP tools over 15-20 turns
- The accumulated context grows past 30k input tokens per minute
- The SDK sends the next API request, which exceeds the rate limit
- Instead of waiting and retrying, the SDK raises a fatal exception:
Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details
The 429 response body is visible in the agent output:
API Error: 429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization's rate limit of 30,000 input tokens per minute..."}}
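Since the 429 body surfaces in the agent output as JSON, a caller can at least detect the condition today by parsing that payload. A minimal sketch — the payload shape is copied from the output above; the helper name is illustrative, not part of any SDK API:

```python
import json

def is_rate_limit_error(body: str) -> bool:
    """Return True if an API error body is a 429 rate_limit_error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return payload.get("error", {}).get("type") == "rate_limit_error"

body = ('{"type":"error","error":{"type":"rate_limit_error",'
        '"message":"This request would exceed your organization\'s '
        'rate limit of 30,000 input tokens per minute..."}}')
print(is_rate_limit_error(body))  # True
```

Detection alone doesn't help, though, because (as described under Workaround Attempted below) the exception fires inside the SDK before the caller sees it.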
The agent session terminates. All progress from previous turns (which successfully wrote to external systems via MCP tools) is orphaned.
Expected Behaviour
The SDK should:
- Catch 429 responses internally — these are transient, not fatal
- Read the Retry-After header (or use a sensible default like 60s)
- Wait and retry automatically — the agent's accumulated context is still valid
- Optionally expose a callback or event so the caller can log the backoff, but the default should be silent retry
This is standard practice for any API client library. The SDK already manages the full agent loop, tool execution, and context compaction — rate limit handling is a natural part of that responsibility.
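The backoff loop asked for above can be sketched in a few lines. RateLimitError here is a hypothetical stand-in for whatever exception type the SDK would raise internally (today it surfaces only a generic CLI failure), and run_turn stands for one API round-trip:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for an SDK-internal 429 (hypothetical name)."""
    def __init__(self, retry_after=None):
        super().__init__("429 rate_limit_error")
        self.retry_after = retry_after

async def with_rate_limit_retry(run_turn, max_retries=5, default_wait=60.0):
    """Retry run_turn on 429s, honouring Retry-After when present."""
    for _ in range(max_retries):
        try:
            return await run_turn()
        except RateLimitError as exc:
            wait = exc.retry_after if exc.retry_after is not None else default_wait
            # Jitter avoids synchronized retries from parallel agents.
            await asyncio.sleep(wait + random.uniform(0, wait * 0.1))
    return await run_turn()  # final attempt: let the error propagate

# Demo with a fake turn that rate-limits twice, then succeeds.
calls = {"n": 0}
async def flaky_turn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "turn complete"

print(asyncio.run(with_rate_limit_retry(flaky_turn, default_wait=0.01)))  # turn complete
```

The point is that the agent's accumulated context survives the wait, so the retry resumes exactly where the session left off instead of discarding it.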
Workaround Attempted
Added asyncio.sleep(2.0) between turns and try/except with retry logic around the query() generator. The inter-turn delay helps with turn frequency but does not prevent crashes when a single turn's accumulated context exceeds the per-minute token limit. The retry logic does not fire because the SDK raises the exception internally before the caller's try/except can intercept it.
Impact
On a 30k tokens/minute rate limit, any agent session reading files larger than ~10KB will reliably crash within 10-15 minutes. This makes the SDK unusable for autonomous multi-turn agents on the standard Console tier without manual intervention.
The SDK is Anthropic's own client library talking to Anthropic's own API with Anthropic's own rate limits. All three are under Anthropic's control. The client should not crash against limits it can query, predict, and respect.
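Because the limit is known up front, the SDK could even throttle proactively rather than reactively. A client-side sketch of that idea — the rolling-window bookkeeping is illustrative, not the SDK's actual design, and window is parameterised only so the example runs quickly:

```python
import time

class TokenBudget:
    """Track input-token spend over a rolling window and sleep before
    any request that would push usage past the per-minute limit."""
    def __init__(self, tokens_per_minute=30_000, window=60.0):
        self.limit = tokens_per_minute
        self.window = window
        self.events = []  # (monotonic timestamp, token count)

    def reserve(self, tokens):
        now = time.monotonic()
        # Drop spend that has aged out of the window.
        self.events = [(t, n) for t, n in self.events if now - t < self.window]
        used = sum(n for _, n in self.events)
        if self.events and used + tokens > self.limit:
            # Sleep until the oldest spend ages out, freeing budget.
            oldest = min(t for t, _ in self.events)
            time.sleep(max(0.0, self.window - (now - oldest)))
        self.events.append((time.monotonic(), tokens))
```

Usage would be a single call — budget.reserve(estimated_input_tokens) — before each request, using the token counts the API already reports per response.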
Additional Context
Three separate agent runs over 2 hours, all crashed at the same point — immediately after a large file read that pushed the accumulated context past the rate limit. The agent had completed 80-90% of its work in each case. Total cost of the three crashed runs: ~$2.50 for work that should have cost ~$0.50 in a single successful run.
Haiku (claude-haiku-4-5-20251001) on the same task completes successfully because its smaller context window stays under the rate limit. The irony: the cheaper model is more reliable than the expensive one because the SDK can't handle the rate limits that only the larger model triggers.