Bug Description
When using Qwen 3.5 (or other thinking/reasoning models) via the `openai_compat` provider, the LLM occasionally returns empty `content` with all tokens consumed by `reasoning_content`. This causes the agent to reply with the default error message:

> "I've completed processing but have no response to give. Increase max_tool_iterations in config.json."

The message is misleading: the issue is not `max_tool_iterations` (default 20, with only 3 used) but the LLM returning empty content.
Environment
- Model: Qwen3.5-35B-A3B-IQ4_XS.gguf (thinking model)
- Backend: llama.cpp (build b8176) via OpenAI-compatible API
- Config: `max_tokens: 132768`, `temperature: 0.7`
Steps to Reproduce
- Configure a thinking model (Qwen 3.5, DeepSeek R1, etc.) in `model_list`
- Send a message that triggers tool calls (e.g., "How to install third-party skills?")
- The model calls tools (e.g., `find_skills`) across 2 iterations
- On iteration 3, the model enters thinking mode, spends all tokens on `reasoning_content`, and returns empty `content`
Log Output
```
[INFO] agent: Tool call: find_skills({"limit":10,"query":"github integration"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=989, result_length=2351}
[INFO] agent: Tool call: find_skills({"limit":5,"query":"weather"})
[INFO] tool: Tool execution completed {tool=find_skills, duration_ms=665, result_length=1067}
[INFO] agent: LLM response without tool calls (direct answer) {iteration=3, content_chars=0}
[INFO] agent: Response: I've completed processing but have no response to give.
```
Root Cause Analysis
1. Empty content not handled in agent loop
In `pkg/agent/loop.go:782-790`, when the LLM returns no tool calls, only `response.Content` is used:

```go
if len(response.ToolCalls) == 0 {
	finalContent = response.Content // empty string when reasoning consumed all tokens
	break
}
```

The `response.ReasoningContent` (which may contain useful information) is ignored as a fallback. Then at line 530:

```go
if finalContent == "" {
	finalContent = opts.DefaultResponse // misleading error message
}
```
2. API Response from thinking model
Thinking models return:
```json
{
  "content": "",
  "reasoning_content": "Okay, the user wants to know how to install skills... (long reasoning)...",
  "finish_reason": "length"
}
```

The `finish_reason: "length"` confirms that reasoning consumed the entire token budget, leaving nothing for actual content.
3. No mechanism to pass model-specific parameters
Thinking models (Qwen 3.5, DeepSeek R1, etc.) support disabling thinking mode via API request body parameters (e.g., `chat_template_kwargs: {"enable_thinking": false}` for llama.cpp). However, `ModelConfig` has no way to pass arbitrary parameters to the API request body.
Server-side flags (`--chat-template-kwargs`) do not work reliably (llama.cpp#13160). Only passing `chat_template_kwargs` in the request body works, which requires picoclaw support.
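For illustration, the full request body with thinking disabled would look roughly like the following (the `messages` payload is a made-up example; `chat_template_kwargs` is the only relevant addition):

```json
{
  "model": "Qwen3.5-35B-A3B-IQ4_XS.gguf",
  "messages": [
    { "role": "user", "content": "How to install third-party skills?" }
  ],
  "chat_template_kwargs": { "enable_thinking": false }
}
```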
4. Important: disabling thinking is NOT a complete fix
Testing shows that disabling thinking mode causes Qwen 3.5 to lose its ability to decide when to stop calling tools — it loops indefinitely making tool calls instead of providing a direct answer. Thinking mode is essential for proper tool-use reasoning with these models.
Proposed Fix
Fix 1: Fallback to `ReasoningContent` when `Content` is empty
In `pkg/agent/loop.go`, when `Content` is empty but `ReasoningContent` is available, use it as a fallback:

```go
if len(response.ToolCalls) == 0 {
	finalContent = response.Content
	if finalContent == "" && response.ReasoningContent != "" {
		finalContent = response.ReasoningContent
	}
	break
}
```
Fix 2: Log `finish_reason` for better debugging
Currently `finish_reason` is not logged at INFO level, making it impossible to distinguish "model chose to stop" from "max tokens hit":

```go
logger.InfoCF("agent", "LLM response without tool calls (direct answer)",
	map[string]any{
		"content_chars":      len(finalContent),
		"finish_reason":      response.FinishReason,
		"reasoning_fallback": finalContent != response.Content,
	})
```
Fix 3: Support `extra_body` in `ModelConfig`
Add an `extra_body` field to `ModelConfig` to allow passing arbitrary parameters to the API request body:

```json
{
  "model_name": "qwen-3.5",
  "model": "Qwen3.5-35B-A3B-IQ4_XS.gguf",
  "api_base": "http://localhost:9015/v1",
  "extra_body": {
    "chat_template_kwargs": { "enable_thinking": false }
  }
}
```
Files to modify:
- `pkg/config/config.go`: add `ExtraBody map[string]any` to `ModelConfig`
- `pkg/providers/openai_compat/provider.go`: store and merge `extraBody` into the request body
- `pkg/providers/http_provider.go`: thread `ExtraBody` through to the provider
- `pkg/providers/factory_provider.go`: pass `cfg.ExtraBody` when creating providers