Bug Description
When using Qwen 3.5 (or other thinking/reasoning models) via the `openai_compat` provider, the LLM occasionally returns empty `content` with all tokens consumed by `reasoning_content`. This causes the agent to reply with the default error message:

> "I've completed processing but have no response to give. Increase max_tool_iterations in config.json."

The message is misleading: the issue is not `max_tool_iterations` (default 20, with only 3 used) but the LLM returning empty content.
Environment
- Model: Qwen3.5-35B-A3B-IQ4_XS.gguf (thinking model)
- Backend: llama.cpp (build b8176) via OpenAI-compatible API
- Config: `max_tokens: 132768`, `temperature: 0.7`
Steps to Reproduce
- Configure a thinking model (Qwen 3.5, DeepSeek R1, etc.) in `model_list`
- Send a message that triggers tool calls (e.g., "How to install third-party skills?")
- The model calls tools (e.g., `find_skills`) across 2 iterations
- On iteration 3, the model enters thinking mode, spends all tokens on `reasoning_content`, and returns empty `content`
Log Output
```
[INFO] agent: Tool call: find_skills({"limit":10,"query":"github integration"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=989, result_length=2351}
[INFO] agent: Tool call: find_skills({"limit":5,"query":"weather"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=665, result_length=1067}
[INFO] agent: LLM response without tool calls (direct answer) {iteration=3, content_chars=0}
[INFO] agent: Response: I've completed processing but have no response to give.
```
Root Cause Analysis
1. Empty content not handled in agent loop
In `pkg/agent/loop.go:782-790`, when the LLM returns no tool calls, only `response.Content` is used:

```go
if len(response.ToolCalls) == 0 {
	finalContent = response.Content // empty string when reasoning consumed all tokens
	break
}
```

The `response.ReasoningContent` (which may contain useful information) is ignored as a fallback. Then at line 530:

```go
if finalContent == "" {
	finalContent = opts.DefaultResponse // misleading error message
}
```
2. API Response from thinking model
Thinking models return:
```json
{
  "content": "",
  "reasoning_content": "Okay, the user wants to know how to install skills... (long reasoning)...",
  "finish_reason": "length"
}
```

The `finish_reason: "length"` confirms that reasoning consumed the entire token budget, leaving nothing for actual content.
3. No mechanism to pass model-specific parameters
Thinking models (Qwen 3.5, DeepSeek R1, etc.) support disabling thinking mode via API request body parameters (e.g., `chat_template_kwargs: {"enable_thinking": false}` for llama.cpp). However, `ModelConfig` has no way to pass arbitrary parameters to the API request body.
Server-side flags (`--chat-template-kwargs`) do not work reliably (llama.cpp#13160). Only passing `chat_template_kwargs` in the request body works, which requires picoclaw support.
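For illustration, the full request body with thinking disabled would look roughly like the following (the `messages` payload is a made-up example; `chat_template_kwargs` is the only relevant addition):

```json
{
  "model": "Qwen3.5-35B-A3B-IQ4_XS.gguf",
  "messages": [
    { "role": "user", "content": "How to install third-party skills?" }
  ],
  "chat_template_kwargs": { "enable_thinking": false }
}
```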
4. Important: disabling thinking is NOT a complete fix
Testing shows that disabling thinking mode causes Qwen 3.5 to lose its ability to decide when to stop calling tools — it loops indefinitely making tool calls instead of providing a direct answer. Thinking mode is essential for proper tool-use reasoning with these models.
Proposed Fix
Fix 1: Fallback to `ReasoningContent` when `Content` is empty
In `pkg/agent/loop.go`, when `Content` is empty but `ReasoningContent` is available, use it as a fallback:

```go
if len(response.ToolCalls) == 0 {
	finalContent = response.Content
	if finalContent == "" && response.ReasoningContent != "" {
		finalContent = response.ReasoningContent
	}
	break
}
```
Fix 2: Log `finish_reason` for better debugging
Currently `finish_reason` is not logged at INFO level, making it impossible to distinguish "model chose to stop" from "max tokens hit":

```go
logger.InfoCF("agent", "LLM response without tool calls (direct answer)",
	map[string]any{
		"content_chars":      len(finalContent),
		"finish_reason":      response.FinishReason,
		"reasoning_fallback": finalContent != response.Content,
	})
```
Fix 3: Support `extra_body` in `ModelConfig`
Add an `extra_body` field to `ModelConfig` to allow passing arbitrary parameters to the API request body:

```json
{
  "model_name": "qwen-3.5",
  "model": "Qwen3.5-35B-A3B-IQ4_XS.gguf",
  "api_base": "http://localhost:9015/v1",
  "extra_body": {
    "chat_template_kwargs": { "enable_thinking": false }
  }
}
```
Files to modify:
- `pkg/config/config.go`: add `ExtraBody map[string]any` to `ModelConfig`
- `pkg/providers/openai_compat/provider.go`: store and merge `extraBody` into the request body
- `pkg/providers/http_provider.go`: thread `ExtraBody` through to the provider
- `pkg/providers/factory_provider.go`: pass `cfg.ExtraBody` when creating providers