- Status: Active
- Last validated: 2026-03-03
- Related docs: README.md, clients.md, docker.md, zg.md, ../index.md
The proxy runs on `http://localhost:8741` by default.
For cross-client operational guidance (Claude Code, OpenCode, OpenAI-compatible, Anthropic-compatible), see the Client Operations Guide.
Chat Completions:

```bash
curl http://localhost:8741/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": true
  }'
```

Responses API:
```bash
curl http://localhost:8741/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "opus-4.6",
    "input": "explain quantum computing"
  }'
```

Messages API:
```bash
curl http://localhost:8741/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "opus-4.6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "hello"}]
  }'
```

Gemini generateContent:

```bash
curl http://localhost:8741/v1beta/models/gemini-3-flash:generateContent \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "hello"}]}]
  }'
```

List models:

```bash
curl http://localhost:8741/v1/models
```

Returns the list of available built-in models.
Map custom model names to any built-in model. Requests using an alias are transparently rewritten to the target model.
Three ways to configure:
- CLI (recommended): `zg alias set gpt-4o gemini-3-flash`
- JSON file: `aliases.json` in the config directory (same location as `accounts.json`)
- Env var: `ZEROGRAVITY_MODEL_ALIASES=gpt-4o:gemini-3-flash,gpt-4:opus-4.6`
The JSON file takes precedence over the env var. Restart the daemon after changes.
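The env-var format can be read as comma-separated `alias:target` pairs. A minimal sketch of parsing and applying such a table (illustrative; these are not the proxy's actual function names):

```python
def parse_aliases(raw: str) -> dict[str, str]:
    """Parse 'alias:target,alias:target' into a lookup table."""
    table = {}
    for pair in raw.split(","):
        if ":" in pair:
            alias, target = pair.split(":", 1)
            table[alias.strip()] = target.strip()
    return table

def resolve_model(name: str, aliases: dict[str, str]) -> str:
    """Rewrite an aliased model name to its target; pass others through."""
    return aliases.get(name, name)

aliases = parse_aliases("gpt-4o:gemini-3-flash,gpt-4:opus-4.6")
print(resolve_model("gpt-4o", aliases))  # gemini-3-flash
print(resolve_model("gemini-3-flash", aliases))  # gemini-3-flash (no alias, unchanged)
```

Requests using an unknown name fall through unchanged, which is why built-in model names keep working alongside aliases.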
When a model generates an image, it is automatically saved and served:
```
GET http://localhost:8741/v1/images/<id>.png
```
Image URLs are included in model responses — no configuration needed.
```bash
curl "http://localhost:8741/v1/search?q=latest+news"
```

Web search powered by Google grounding. Still a work in progress.
Add an account:

```bash
curl -X POST http://localhost:8741/v1/accounts \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com", "refresh_token": "1//0fXXX"}'
```

List stored accounts:

```bash
curl http://localhost:8741/v1/accounts
```

Remove an account:

```bash
curl -X DELETE http://localhost:8741/v1/accounts \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com"}'
```

Set the active account. This switches immediately, without manually restarting the proxy process:

```bash
curl -X POST http://localhost:8741/v1/accounts/set_active \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com"}'
```

Account status:

```bash
curl http://localhost:8741/v1/accounts/status
```

Returns per-account details including email, active flag, and quota usage breakdown.
When running with 2+ accounts, the proxy automatically rotates to the next account when:
- Google returns `RESOURCE_EXHAUSTED` (429): rotation after 3 consecutive failures
- Google returns `PERMISSION_DENIED` (403): immediate rotation (no consecutive-failure threshold)
The rotation:
- Waits a short cooldown (5–10s with jitter)
- Refreshes the next account's access token via OAuth
- Restarts the backend to get a clean session
- Resets cooldown windows while preserving exhaustion counters
Use `--quota-cap 0.2` (the default) or set `ZEROGRAVITY_QUOTA_CAP=0.2` to rotate proactively when any model exceeds 20% usage (i.e., remaining quota drops below 80%). When all accounts are exhausted, the proxy parks and waits for quota to reset. Set the cap to 0 to disable proactive rotation.
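Taken together, the rotation triggers and the proactive quota cap can be sketched as a single decision function (illustrative; the parameter names are assumptions drawn from the text, not the proxy's actual code):

```python
def should_rotate(status: int, consecutive_429s: int,
                  remaining_fraction: float, quota_cap: float = 0.2) -> bool:
    """Decide whether to rotate to the next account.

    403 PERMISSION_DENIED: rotate immediately.
    429 RESOURCE_EXHAUSTED: rotate after 3 consecutive failures.
    Proactive: rotate once usage exceeds the cap, i.e. remaining
    quota falls below (1 - cap); a cap of 0 disables this check.
    """
    if status == 403:
        return True
    if status == 429 and consecutive_429s >= 3:
        return True
    if quota_cap > 0 and remaining_fraction < 1.0 - quota_cap:
        return True
    return False

print(should_rotate(403, 0, 1.0))          # True: immediate on 403
print(should_rotate(429, 2, 1.0))          # False: below the 3-failure threshold
print(should_rotate(200, 0, 0.75))         # True: usage past the 20% cap
print(should_rotate(200, 0, 0.75, 0))      # False: proactive rotation disabled
```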
```bash
curl -X POST http://localhost:8741/v1/token \
  -H "Content-Type: application/json" \
  -d '{"token": "ya29.xxx"}'
```

Note: Access tokens expire in ~60 minutes. Use refresh tokens via `accounts.json` or `POST /v1/accounts` instead.
```bash
curl http://localhost:8741/v1/usage
```

Returns token counts persisted by the proxy, including stats restored across restarts.

```bash
curl http://localhost:8741/v1/quota
```

Returns per-model quota limits and current usage from the backend.

```bash
curl http://localhost:8741/health
```

Returns 200 OK when the proxy is running.
```bash
curl -X POST http://localhost:8741/v1/replay/raw \
  -H "Content-Type: application/json" \
  --data-binary @modified_request.json
```

Sends a pre-built payload (from a trace's `modified_request.json`) directly through the MITM tunnel, bypassing all request translation. Used for latency diagnostics.
Protect the proxy from unauthorized access by setting `ZEROGRAVITY_API_KEY`:

```bash
# Single key
export ZEROGRAVITY_API_KEY="your-secret-key"

# Multiple keys (comma-separated)
export ZEROGRAVITY_API_KEY="key1,key2,key3"
```

Clients must include the key using any of these header formats:
```bash
# OpenAI-style (Authorization: Bearer)
curl http://localhost:8741/v1/chat/completions \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3-flash", "messages": [{"role": "user", "content": "hi"}]}'

# Anthropic-style (x-api-key)
curl http://localhost:8741/v1/messages \
  -H "x-api-key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "opus-4.6", "max_tokens": 1024, "messages": [{"role": "user", "content": "hi"}]}'

# Gemini-style (x-goog-api-key)
curl http://localhost:8741/v1beta/models/gemini-3-flash:generateContent \
  -H "x-goog-api-key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"role": "user", "parts": [{"text": "hi"}]}]}'
```

Note: If `ZEROGRAVITY_API_KEY` is not set, no authentication is enforced (backward-compatible). Public compatibility routes include `/health`, `/`, `/api/event_logging/batch`, `/.well-known/{*path}`, and `/v1/images/{*path}`.
Inject fake runtime state for an account to test rotation selection, quota behavior, and error handling without real Google traffic. Requires the `ZEROGRAVITY_DEBUG=1` environment variable; the endpoint returns 404 when debug mode is disabled.
```bash
# Mark an account as banned
curl -X POST http://localhost:8741/v1/debug/simulate \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com", "banned": true}'

# Simulate quota exhaustion (triggers rotation on next request)
curl -X POST http://localhost:8741/v1/debug/simulate \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com", "remaining_fraction": 0.0, "cooldown_secs": 3600}'

# Clear a ban and restore quota
curl -X POST http://localhost:8741/v1/debug/simulate \
  -H "Content-Type: application/json" \
  -d '{"email": "user@gmail.com", "banned": false, "remaining_fraction": 1.0}'
```

Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| `email` | string | Yes | Account email to modify |
| `banned` | boolean | No | Mark account as banned (skipped by rotation) |
| `restricted` | boolean | No | Mark account as restricted (skipped by rotation) |
| `cooldown_secs` | integer | No | Seconds until account is eligible for selection |
| `remaining_fraction` | float | No | Quota remaining (0.0 = exhausted, 1.0 = full) |
Only provided fields are applied; omitted fields keep their current value. The response returns the account's current state after the update.
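The partial-update semantics can be pictured as a dict merge over the fields in the table (illustrative sketch, not the proxy's actual code):

```python
def apply_simulation(state: dict, patch: dict) -> dict:
    """Apply only the provided fields; omitted fields keep current values."""
    allowed = {"banned", "restricted", "cooldown_secs", "remaining_fraction"}
    updated = dict(state)  # copy so the caller's state is untouched
    for key, value in patch.items():
        if key in allowed:
            updated[key] = value
    return updated

state = {"banned": False, "remaining_fraction": 1.0}
print(apply_simulation(state, {"remaining_fraction": 0.0}))
# banned stays False; only remaining_fraction changes
```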
Enable debug mode:

```yaml
# Docker Compose
environment:
  - ZEROGRAVITY_DEBUG=1
```

```bash
# Or export before starting
export ZEROGRAVITY_DEBUG=1
```

| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat Completions API (OpenAI compat) |
| POST | `/v1/responses` | Responses API (sync + streaming) |
| POST | `/v1/messages` | Messages API (Anthropic compat) |
| POST | `/v1/messages/count_tokens` | Anthropic token counting endpoint |
| POST | `/v1beta/models/{model}:{action}` | Official Gemini v1beta routes |
| GET | `/v1beta/models` | List models (Gemini v1beta format) |
| GET | `/v1beta/models/{model}` | Get model info (Gemini v1beta format) |
| GET | `/v1/models` | List available models |
| GET/POST | `/v1/search` | Web search via Google grounding (WIP) |
| POST | `/v1/token` | Set OAuth token at runtime |
| POST | `/v1/accounts` | Add account (email + refresh_token) |
| POST | `/v1/accounts/set_active` | Set active account at runtime |
| GET | `/v1/accounts` | List stored accounts |
| DELETE | `/v1/accounts` | Remove account by email |
| GET | `/v1/accounts/status` | Per-account status with quota usage |
| GET | `/v1/usage` | Proxy token usage |
| GET | `/v1/quota` | Quota and rate limits |
| GET | `/v1/images/*` | Serve generated images |
| POST | `/v1/replay/raw` | Send pre-built trace through MITM |
| POST | `/v1/debug/simulate` | Inject account state (debug only) |
| GET | `/health` | Health check |
| GET/POST | `/` | Compatibility root (returns status) |
| POST | `/api/event_logging/batch` | Compatibility event logging endpoint |
| GET/POST | `/.well-known/{*path}` | Compatibility well-known endpoint |
When a client omits `max_tokens` (Anthropic), `max_completion_tokens` (OpenAI), or `max_output_tokens` (OpenAI Responses), the proxy defaults to 64,000 tokens, just below Gemini's 65,536 ceiling. The MITM layer enforces a minimum of 4,096 regardless. Clients that previously got errors for a missing `max_tokens` now receive a sensible default.
Claude aliases (`opus-4.6`, `sonnet-4.6`) are backed by Gemini models. When clients send `budget_tokens` via the Anthropic Messages API, the proxy maps it to Gemini thinking levels:
| `budget_tokens` | Gemini `thinkingLevel` |
|---|---|
| 0 | disabled |
| 1–512 | minimal |
| 513–1024 | low |
| 1025–4096 | medium |
| 4097+ | high |
Raw integer budgets cause `400 INVALID_ARGUMENT` on Gemini 3+ models. The proxy handles this automatically.
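The table above is a simple threshold function. A sketch (the function name is illustrative, not the proxy's actual code):

```python
def thinking_level(budget_tokens: int) -> str:
    """Map an Anthropic budget_tokens value to a Gemini thinkingLevel."""
    if budget_tokens == 0:
        return "disabled"
    if budget_tokens <= 512:
        return "minimal"
    if budget_tokens <= 1024:
        return "low"
    if budget_tokens <= 4096:
        return "medium"
    return "high"

print(thinking_level(1024))  # low
print(thinking_level(4097))  # high
```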
OpenAI and Anthropic tool/function declarations are translated to Gemini's format. During translation:
- Constraint hints (`minLength`, `maxLength`, `pattern`, `format`, `default`, `examples`) are preserved as description text, since Gemini strips these JSON Schema keys
- Nullable properties are removed from `required` arrays
- Union types (e.g. `["string", "array"]`) produce "Accepts: string | array" description hints
- Non-standard keys (`strict`, `x-*`-prefixed, etc.) are stripped before sending to Gemini
Gemini-native tool declarations (via /v1beta/) pass through with zero translation.
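The constraint-hint step can be sketched as follows; this is an illustrative approximation of the idea, not the proxy's actual translator:

```python
# JSON Schema keys Gemini strips, preserved as description text instead
HINT_KEYS = ("minLength", "maxLength", "pattern", "format", "default", "examples")

def fold_constraints(prop: dict) -> dict:
    """Drop stripped keys from a property schema, folding them into the description."""
    out = {k: v for k, v in prop.items() if k not in HINT_KEYS}
    hints = [f"{k}: {prop[k]}" for k in HINT_KEYS if k in prop]
    if hints:
        desc = out.get("description", "")
        out["description"] = (desc + " " if desc else "") + "(" + ", ".join(hints) + ")"
    return out

print(fold_constraints({"type": "string", "minLength": 3, "pattern": "^a"}))
```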
- A missing `model` field returns HTTP 400 (previously it fell back to a silent default)
- Tool names exceeding 128 characters are rejected with HTTP 400