Agent infinite-loops on unconfigured tts — three stacked issues

### Summary

When a Telegram user asks the agent to use voice / TTS and no TTS provider is configured, the agent enters an infinite tool-call loop. In a real deployment it produced 275 identical assistant messages in ~10 minutes before context overflow forced auto-compaction, after which the loop resumed. The loop results from three stacked issues; fixing any one of them would likely stop it.

### Environment

- OpenClaw image: `ghcr.io/openclaw/openclaw` base tag `2026.3.13`
- Agent model: `google/gemini-2.5-flash`
- Channel: Telegram
- No TTS provider configured (default install — `openai`, `minimax`, `vydra` all unconfigured)

### Reproduction

1. Deploy an OpenClaw gateway with the `tts` plugin enabled but **no** TTS provider configured.
2. From Telegram, message the bot: *"I want to chat to you on voice."*
3. Nudge the bot (e.g. send `?`) so it has pressure to respond.
4. Observe: the agent emits the same `<final>…</final>` text + `tts` tool call repeatedly (~20–30 calls per minute), never returning control.

### The three stacked bugs

#### Bug 1 — `tts` plugin returns `isError: false` on hard failure

The `tts` plugin (`extensions/speech-core/src/tts-tool.ts`) returns the provider-unavailable error **inside the content text**, but the tool result ends up flagged as successful — the failure path returns `{ content, details: { error } }` without setting `isError: true`, so the framework defaults it to `false`:

```json
{
  "role": "toolResult",
  "toolName": "tts",
  "isError": false,
  "content": [{
    "type": "text",
    "text": "TTS conversion failed: : no provider registered; openai: not configured; minimax: not configured; vydra: not configured"
  }],
  "details": {
    "error": "TTS conversion failed: : no provider registered; openai: not configured; minimax: not configured; vydra: not configured"
  }
}
```

`details.error` is populated — so the plugin *knows* it failed — but `isError` is `false`. Other tools (e.g. `exec`) correctly return `isError: true` on failure. This makes it hard for the model to recognise the failure, and likely prevents any agent-side retry/backoff logic from triggering.

**Expected:** `isError: true` whenever `details.error` is set (or whenever no provider accepted the request).
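
A minimal sketch of the expected behavior, assuming result shapes modeled on the JSON above. The `ToolResult` interface and both helper names are hypothetical; the actual types and failure path in `tts-tool.ts` may look different.

```typescript
// Hypothetical shape, modeled on the tool result shown in this issue.
interface ToolResult {
  content: { type: "text"; text: string }[];
  details?: { error?: string };
  isError?: boolean;
}

// Plugin side: build a failure result that carries the error in the text,
// in details.error, and in the isError flag.
function ttsFailureResult(message: string): ToolResult {
  return {
    content: [{ type: "text", text: message }],
    details: { error: message },
    isError: true,
  };
}

// Framework side (defensive): treat any result that reports details.error
// as an error, even if the plugin did not set the flag.
function normalizeToolResult(result: ToolResult): ToolResult {
  const err = result.details?.error;
  const hasError = typeof err === "string" && err.length > 0;
  return { ...result, isError: result.isError === true || hasError };
}
```

Either helper alone would have flagged the result above as an error. The second one guards against other plugins that make the same omission.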

#### Bug 2 — `<final>` does not terminate the turn when a `toolCall` is in the same assistant message

Every one of the 275 looping assistant messages has **both** a `<final>…</final>` text block *and* a pending `toolCall`:

```json
{
  "role": "assistant",
  "content": [
    { "type": "text", "text": "<final>You're absolutely right to nudge me for an update…</final>" },
    { "type": "toolCall", "name": "tts", "arguments": { "text": "Hello <user>, can you hear me?" } }
  ]
}
```

The agent loop appears to treat the pending tool call as authoritative and to ignore `<final>`, so the turn never ends even though the model has emitted its "I'm done" signal.

**Expected:** if `<final>` is present, either (a) end the turn and drop the tool call, or (b) reject the message shape at parse time and force the model to choose one. Silently ignoring `<final>` leaves the model with no way to actually stop.
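
Both options can be sketched as follows, assuming message shapes modeled on the JSON above. The type names and the functions `hasFinal`, `enforceFinal`, and `validateShape` are hypothetical and are not taken from the OpenClaw agent loop.

```typescript
// Hypothetical content-block shapes, modeled on the assistant message above.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: Record<string, unknown> };

interface AssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}

// True when any text block contains a complete <final>...</final> section.
function hasFinal(message: AssistantMessage): boolean {
  return message.content.some(
    (block) => block.type === "text" && /<final>[\s\S]*?<\/final>/.test(block.text),
  );
}

// Option (a): when <final> is present, drop pending tool calls so the turn ends.
function enforceFinal(message: AssistantMessage): AssistantMessage {
  if (!hasFinal(message)) return message;
  return {
    ...message,
    content: message.content.filter((block) => block.type !== "toolCall"),
  };
}

// Option (b): reject the mixed shape so the model has to pick one.
// Returns an error string for the model, or null when the shape is fine.
function validateShape(message: AssistantMessage): string | null {
  const hasToolCall = message.content.some((block) => block.type === "toolCall");
  return hasFinal(message) && hasToolCall
    ? "Message contains both <final> and a tool call; emit only one."
    : null;
}
```

Option (a) is the quieter fix, while option (b) surfaces the conflict to the model at the cost of one extra round trip.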

#### Bug 3 — Loop guard exists but is disabled by default

A tool-loop detection system exists (`src/agents/tool-loop-detection.ts` — 4 detectors including a global circuit breaker), but it is **disabled by default** (`tools.loopDetection.enabled` defaults to `false`). With the default configuration, the agent made **275 consecutive `tts({ text: "Hello <user>, can you hear me?" })` calls**, each returning the same failure, with no circuit-breaker active. The loop only ended because context hit the overflow threshold (983 messages → auto-compaction) — and even then, the loop resumed after compaction.

**Expected:** either (a) enable loop detection by default (the existing thresholds of warn@10 / block@20 / global-breaker@30 seem reasonable as defaults), or (b) auto-enable it when a tool returns `details.error`, so that known-failing tools are guarded even without explicit opt-in.
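
To illustrate what an always-on guard would have caught here, the following is a hypothetical sketch of a single consecutive-repeat detector using the warn@10 / block@20 thresholds mentioned above. It is not the logic in `src/agents/tool-loop-detection.ts`, and the class and method names are assumptions.

```typescript
type LoopVerdict = "ok" | "warn" | "block";

// Hypothetical guard: counts consecutive calls that have the same tool name,
// the same arguments, and the same result text.
class RepeatedCallGuard {
  private lastKey: string | null = null;
  private streak = 0;
  private readonly warnAt: number;
  private readonly blockAt: number;

  constructor(warnAt = 10, blockAt = 20) {
    this.warnAt = warnAt;
    this.blockAt = blockAt;
  }

  // Record one completed tool call and return the verdict for it.
  // Note: JSON.stringify is key-order sensitive, which is good enough for
  // byte-identical repeats like the ones in this issue.
  record(toolName: string, args: unknown, resultText: string): LoopVerdict {
    const key = JSON.stringify([toolName, args, resultText]);
    this.streak = key === this.lastKey ? this.streak + 1 : 1;
    this.lastKey = key;
    if (this.streak >= this.blockAt) return "block";
    if (this.streak >= this.warnAt) return "warn";
    return "ok";
  }
}
```

With these thresholds, the session described in this issue would have been warned on the 10th identical `tts` call and blocked on the 20th, rather than reaching 275.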

### Evidence (from a single production session)

- Session file: `agents/main/sessions/<session-id>.jsonl`, ~1118 lines.
- Tool-call distribution in the 10-minute loop window:
  - `tts`: **275**
  - `exec`: 5
  - `process`: 4
  - `read`: 1
- All 275 `tts` calls have identical arguments: `{ "text": "Hello <user>, can you hear me?" }`.
- All 275 tool results are identical: the "no provider registered" text, `isError: false`.
- Looping assistant messages sampled at lines 560, 700, 900, 1080 — structurally identical (same `<final>` text + same tool call).
- Context-overflow log line at 15:25:41:
  ```
  [agent] [context-overflow-diag] sessionKey=agent:main:main
    provider=google/gemini-2.5-flash source=assistantError messages=983
    error=Context overflow: estimated context size exceeds safe threshold during tool loop.
  ```
- Auto-compaction succeeded at 15:27:02 and the loop continued.
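
The tool-call distribution above can be reproduced with a small tally over the session file. This is a sketch that assumes each JSONL line is a single message object shaped like the assistant example earlier in this issue; if the session format wraps messages in an envelope, the field access would need adjusting. The function name `tallyToolCalls` is hypothetical.

```typescript
// Count toolCall blocks by tool name across JSONL lines.
// Assumption: one JSON message object per line, with role and content fields.
function tallyToolCalls(jsonlLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonlLines) {
    if (line.trim() === "") continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof record !== "object" || record === null) continue;
    const message = record as { role?: unknown; content?: unknown };
    if (message.role !== "assistant" || !Array.isArray(message.content)) continue;
    for (const block of message.content) {
      if (block && block.type === "toolCall" && typeof block.name === "string") {
        counts.set(block.name, (counts.get(block.name) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```

In Node.js the input could come from `readFileSync(path, "utf8").split("\n")`, with `path` pointing at the session file.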

### Suggested fix priority

1. **Bug 1** is the smallest change (set `isError: true` in the TTS failure path) and would probably prevent the loop on its own — with a correct error signal, the model and/or agent loop can back off.
2. **Bug 3** is a one-line config change (flip the default to `enabled: true`) that provides defense-in-depth regardless of individual plugin correctness.
3. **Bug 2** is a broader design question about `<final>` semantics but worth clarifying in docs even if the behavior stays as-is.

### Not in scope of this issue

- Whether `tts` should auto-configure a provider at install time.
- Voice input (STT) — this issue is purely about output TTS when unconfigured.
