Agent infinite-loops on unconfigured tts — three stacked issues #67744

@sukhdeepjohar

Summary

When a Telegram user asks the agent to use voice / TTS and no TTS provider is configured, the agent enters an infinite tool-call loop. In a real deployment it produced 275 identical assistant messages in ~10 minutes before context overflow forced auto-compaction, after which the loop resumed. The cause is three stacked issues; fixing any one of them would likely stop the loop.

Environment

  • OpenClaw image: ghcr.io/openclaw/openclaw base tag 2026.3.13
  • Agent model: google/gemini-2.5-flash
  • Channel: Telegram
  • No TTS provider configured (default install — openai, minimax, vydra all unconfigured)

Reproduction

  1. Deploy an OpenClaw gateway with the tts plugin enabled but no TTS provider configured.
  2. From Telegram, message the bot: "I want to chat to you on voice."
  3. Nudge the bot (e.g. send ?) so it has pressure to respond.
  4. Observe: the agent emits the same <final>…</final> text + tts tool call repeatedly (~20–30 calls per minute), never returning control.

The three stacked bugs

Bug 1 — tts plugin returns isError: false on hard failure

The tts plugin (extensions/speech-core/src/tts-tool.ts) returns the provider-unavailable error inside the content text, but the tool result ends up flagged as successful. The failure path returns { content, details: { error } } without setting isError: true, so the framework defaults it to false:

{
  "role": "toolResult",
  "toolName": "tts",
  "isError": false,
  "content": [{
    "type": "text",
    "text": "TTS conversion failed: : no provider registered; openai: not configured; minimax: not configured; vydra: not configured"
  }],
  "details": {
    "error": "TTS conversion failed: : no provider registered; openai: not configured; minimax: not configured; vydra: not configured"
  }
}

details.error is populated — so the plugin knows it failed — but isError is false. Other tools (e.g. exec) correctly return isError: true on failure. This makes it hard for the model to recognise the failure, and likely prevents any agent-side retry/backoff logic from triggering.

Expected: isError: true whenever details.error is set (or whenever no provider accepted the request).
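
As a rough illustration of the expected behavior, the failure path could set isError explicitly, and the framework could additionally normalise any result that carries details.error. This is a minimal sketch, not the actual code in tts-tool.ts: the ToolResult interface and the ttsFailure and normalizeToolResult helpers are hypothetical names, with the result shape inferred from the JSON above.

```typescript
// Hypothetical result shape, inferred from the tool result shown in this issue.
// The real types in the codebase may differ.
interface ToolResult {
  content: { type: "text"; text: string }[];
  details?: { error?: string };
  isError?: boolean;
}

// Build a failure result that sets isError explicitly instead of relying on
// the framework default.
function ttsFailure(reason: string): ToolResult {
  const message = `TTS conversion failed: ${reason}`;
  return {
    content: [{ type: "text", text: message }],
    details: { error: message },
    isError: true,
  };
}

// Defensive normalisation a framework could apply to every tool result:
// a non-empty details.error always implies isError: true.
function normalizeToolResult(result: ToolResult): ToolResult {
  const hasError = Boolean(result.details?.error);
  return { ...result, isError: result.isError === true || hasError };
}
```

The second helper is the defense-in-depth variant: even if a plugin forgets the flag, a populated details.error would still surface as an error to the model and the agent loop.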

Bug 2 — <final> does not terminate the turn when a toolCall is in the same assistant message

Every one of the 275 looping assistant messages has both a <final>…</final> text block and a pending toolCall:

{
  "role": "assistant",
  "content": [
    { "type": "text", "text": "<final>You're absolutely right to nudge me for an update…</final>" },
    { "type": "toolCall", "name": "tts", "arguments": { "text": "Hello <user>, can you hear me?" } }
  ]
}

The agent loop appears to treat the pending tool call as authoritative and ignore <final>. So the turn never ends, even though the model has emitted its "I'm done" signal.

Expected: if <final> is present, either (a) end the turn and drop the tool call, or (b) reject the message shape at parse time and force the model to choose one. Silently ignoring <final> leaves the model with no way to actually stop.
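
The two options could look roughly like the following. This is only a sketch under assumptions: the ContentBlock and AssistantMessage types mirror the JSON shown above, while resolveTurn, its policy parameter, and the Resolution type are hypothetical and do not correspond to any known function in the agent loop.

```typescript
// Message shape assumed from the assistant message shown in this issue.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: Record<string, unknown> };

interface AssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}

// Hypothetical outcome of inspecting one assistant message.
type Resolution =
  | { kind: "endTurn"; message: AssistantMessage }
  | { kind: "runTools"; message: AssistantMessage }
  | { kind: "reject"; reason: string };

// True if any text block contains a complete <final>...</final> span.
function hasFinal(msg: AssistantMessage): boolean {
  return msg.content.some(
    (b) => b.type === "text" && /<final>[\s\S]*<\/final>/.test(b.text),
  );
}

function resolveTurn(
  msg: AssistantMessage,
  policy: "dropToolCalls" | "reject",
): Resolution {
  const toolCalls = msg.content.filter((b) => b.type === "toolCall");
  if (!hasFinal(msg)) {
    // No <final>: pending tool calls run as usual, plain text ends the turn.
    return toolCalls.length > 0
      ? { kind: "runTools", message: msg }
      : { kind: "endTurn", message: msg };
  }
  if (toolCalls.length === 0) return { kind: "endTurn", message: msg };
  if (policy === "reject") {
    // Option (b): refuse the mixed shape so the model has to choose one.
    return { kind: "reject", reason: "<final> and toolCall in the same message" };
  }
  // Option (a): honour <final>, drop the tool calls, and end the turn.
  const content = msg.content.filter((b) => b.type !== "toolCall");
  return { kind: "endTurn", message: { ...msg, content } };
}
```

Either policy makes the conflict explicit instead of silently preferring the tool call.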

Bug 3 — Loop guard exists but is disabled by default

A tool-loop detection system exists (src/agents/tool-loop-detection.ts — 4 detectors including a global circuit breaker), but it is disabled by default (tools.loopDetection.enabled defaults to false). With the default configuration, the agent made 275 consecutive tts({ text: "Hello <user>, can you hear me?" }) calls, each returning the same failure, with no circuit-breaker active. The loop only ended because context hit the overflow threshold (983 messages → auto-compaction) — and even then, the loop resumed after compaction.

Expected: either (a) enable loop detection by default (the existing thresholds of warn@10 / block@20 / global-breaker@30 seem reasonable as defaults), or (b) auto-enable it when a tool returns details.error, so that known-failing tools are guarded even without explicit opt-in.
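
To make the thresholds concrete, here is a minimal sketch of a repeated-call guard using the warn@10 / block@20 values mentioned above. It is not the implementation in src/agents/tool-loop-detection.ts, whose four detectors are not shown in this issue; the RepeatedCallGuard class and its record method are hypothetical, and the sketch only counts consecutive calls with identical tool name, arguments, and result.

```typescript
type Verdict = "ok" | "warn" | "block";

interface LoopGuardOptions {
  warnAt: number;
  blockAt: number;
}

// Hypothetical guard: tracks how many times in a row the same call has
// produced the same result.
class RepeatedCallGuard {
  private lastKey: string | null = null;
  private streak = 0;
  private readonly opts: LoopGuardOptions;

  constructor(opts: LoopGuardOptions = { warnAt: 10, blockAt: 20 }) {
    this.opts = opts;
  }

  record(toolName: string, args: unknown, resultText: string): Verdict {
    // JSON.stringify is key-order sensitive; that is acceptable here because
    // a looping model tends to emit byte-identical arguments.
    const key = JSON.stringify([toolName, args, resultText]);
    this.streak = key === this.lastKey ? this.streak + 1 : 1;
    this.lastKey = key;
    if (this.streak >= this.opts.blockAt) return "block";
    if (this.streak >= this.opts.warnAt) return "warn";
    return "ok";
  }
}
```

With these defaults, a guard of this kind would have warned at the 10th identical tts call and blocked at the 20th, rather than allowing 275.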

Evidence (from a single production session)

  • Session file: agents/main/sessions/<session-id>.jsonl, ~1118 lines.
  • Tool-call distribution in the 10-minute loop window:
    • tts: 275
    • exec: 5
    • process: 4
    • read: 1
  • All 275 tts calls have identical arguments: { "text": "Hello <user>, can you hear me?" }.
  • All 275 tool results are identical: the "no provider registered" text, isError: false.
  • Looping assistant messages sampled at lines 560, 700, 900, 1080 — structurally identical (same <final> text + same tool call).
  • Context-overflow log line at 15:25:41:
    [agent] [context-overflow-diag] sessionKey=agent:main:main
      provider=google/gemini-2.5-flash source=assistantError messages=983
      error=Context overflow: estimated context size exceeds safe threshold during tool loop.
    
  • Auto-compaction succeeded at 15:27:02 and the loop continued.
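
A tally like the tool-call distribution above can be reproduced with a small script. The sketch below assumes each line of the session .jsonl file is a JSON message with a content array containing toolCall blocks, as in the assistant message shown earlier; the on-disk format may wrap messages differently, so treat the field access as an assumption. countToolCalls is a hypothetical helper that takes the file contents as a string.

```typescript
// Count toolCall blocks per tool name across a JSONL session transcript.
// Lines that are blank, not valid JSON, or lack a content array are skipped.
function countToolCalls(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate malformed lines
    }
    if (typeof entry !== "object" || entry === null) continue;
    const content = (entry as { content?: unknown }).content;
    if (!Array.isArray(content)) continue;
    for (const block of content) {
      if (block && block.type === "toolCall" && typeof block.name === "string") {
        counts.set(block.name, (counts.get(block.name) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```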

Suggested fix priority

  1. Bug 1 is the smallest change (set isError: true in the TTS failure path) and would probably prevent the loop on its own — with a correct error signal, the model and/or agent loop can back off.
  2. Bug 3 is a one-line config change (flip the default to enabled: true) that provides defense-in-depth regardless of individual plugin correctness.
  3. Bug 2 is a broader design question about <final> semantics, but it is worth clarifying in the docs even if the behavior stays as-is.

Not in scope of this issue

  • Whether tts should auto-configure a provider at install time.
  • Voice input (STT) — this issue is purely about output TTS when unconfigured.
