
[BUG] Local thinking model (Qwen 3.5) returns empty content when reasoning consumes all tokens #966

@hththt

Description


Bug Description

When using Qwen 3.5 (or other thinking/reasoning models) via the openai_compat provider, the LLM occasionally returns empty content because all tokens are consumed by reasoning_content. This causes the agent to reply with the default error message:

"I've completed processing but have no response to give. Increase max_tool_iterations in config.json."

The message is misleading — the issue is not max_tool_iterations (default 20, only 3 used), but the LLM returning empty content.

Environment

  • Model: Qwen3.5-35B-A3B-IQ4_XS.gguf (thinking model)
  • Backend: llama.cpp (build b8176) via OpenAI-compatible API
  • Config: max_tokens: 132768, temperature: 0.7

Steps to Reproduce

  1. Configure a thinking model (Qwen 3.5, DeepSeek R1, etc.) in model_list
  2. Send a message that triggers tool calls (e.g., "How to install third-party skills?")
  3. The model calls tools (e.g., find_skills) across 2 iterations
  4. On iteration 3, the model enters thinking mode, spends all tokens on reasoning_content, and returns empty content

Log Output

[INFO] agent: Tool call: find_skills({"limit":10,"query":"github integration"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=989, result_length=2351}
[INFO] agent: Tool call: find_skills({"limit":5,"query":"weather"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=665, result_length=1067}
[INFO] agent: LLM response without tool calls (direct answer) {iteration=3, content_chars=0}
[INFO] agent: Response: I've completed processing but have no response to give.

Root Cause Analysis

1. Empty content not handled in agent loop

In pkg/agent/loop.go:782-790, when the LLM returns no tool calls, only response.Content is used:

if len(response.ToolCalls) == 0 {
    finalContent = response.Content  // empty string when reasoning consumed all tokens
    break
}

The response.ReasoningContent (which may contain useful information) is never consulted as a fallback. Then at line 530:

if finalContent == "" {
    finalContent = opts.DefaultResponse  // misleading error message
}

2. API Response from thinking model

Thinking models return:

{
  "content": "",
  "reasoning_content": "Okay, the user wants to know how to install skills...(long reasoning)...",
  "finish_reason": "length"
}

The finish_reason: "length" confirms reasoning consumed the entire token budget, leaving nothing for actual content.

3. No mechanism to pass model-specific parameters

Thinking models (Qwen 3.5, DeepSeek R1, etc.) support disabling thinking mode via API request body parameters (e.g., chat_template_kwargs: {"enable_thinking": false} for llama.cpp). However, ModelConfig has no way to pass arbitrary parameters to the API request body.

Server-side flags (--chat-template-kwargs) do not work reliably (llama.cpp#13160). Only passing chat_template_kwargs in the request body works — which requires picoclaw support.

4. Important: disabling thinking is NOT a complete fix

Testing shows that disabling thinking mode causes Qwen 3.5 to lose its ability to decide when to stop calling tools — it loops indefinitely making tool calls instead of providing a direct answer. Thinking mode is essential for proper tool-use reasoning with these models.

Proposed Fix

Fix 1: Fallback to ReasoningContent when Content is empty

In pkg/agent/loop.go, when Content is empty but ReasoningContent is available, use it as fallback:

if len(response.ToolCalls) == 0 {
    finalContent = response.Content
    if finalContent == "" && response.ReasoningContent != "" {
        finalContent = response.ReasoningContent
    }
    break
}

Fix 2: Log finish_reason for better debugging

Currently finish_reason is not logged at INFO level, making it impossible to distinguish "model chose to stop" vs "max tokens hit":

logger.InfoCF("agent", "LLM response without tool calls (direct answer)",
    map[string]any{
        "content_chars":     len(finalContent),
        "finish_reason":     response.FinishReason,
        "reasoning_fallback": finalContent != response.Content,
    })

Fix 3: Support extra_body in ModelConfig

Add an extra_body field to ModelConfig to allow passing arbitrary parameters to the API request body:

{
  "model_name": "qwen-3.5",
  "model": "Qwen3.5-35B-A3B-IQ4_XS.gguf",
  "api_base": "http://localhost:9015/v1",
  "extra_body": {
    "chat_template_kwargs": { "enable_thinking": false }
  }
}

Files to modify:

  • pkg/config/config.go — Add ExtraBody map[string]any to ModelConfig
  • pkg/providers/openai_compat/provider.go — Store and merge extraBody into request body
  • pkg/providers/http_provider.go — Thread ExtraBody through to provider
  • pkg/providers/factory_provider.go — Pass cfg.ExtraBody when creating providers
