Feature Request
Problem
When using thinking models (e.g., qwen3:0.6b, qwen3.5:0.8b) via AxAIOllama, there's no way
to disable the thinking/reasoning phase. These models generate extensive <think>...</think> tokens
before producing the actual response, which significantly increases latency — often making requests
5-10x slower than necessary for simple tasks like classification.
What I've tried
- thinkingTokenBudget: 'none' in forward() options fails with:
  AxGenerateError: Model qwen3.5:0.8b does not support thinkingTokenBudget
  because ax-llm's model registry doesn't recognize custom Ollama model names as thinking-capable.
- Ollama Modelfile with PARAMETER think false: Ollama doesn't support think as a Modelfile parameter.
- Ollama's API does support "think": false in the chat request body, and the CLI supports
  /set nothink, but ax-llm doesn't pass this parameter through to Ollama.
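For reference, here is a minimal sketch of the chat request body with thinking disabled, which is what I'd like ax-llm to be able to send. The helper name buildOllamaChatBody and the two interfaces are hypothetical and exist only for this illustration. The only parts taken from Ollama are the body fields model, messages, stream, and think.

```typescript
// Hypothetical types for illustration only; they are not part of ax-llm.
interface OllamaChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface OllamaChatBody {
  model: string;
  messages: OllamaChatMessage[];
  stream: boolean;
  think?: boolean; // omitted entirely when the caller does not set it
}

// Builds the JSON body for Ollama's chat endpoint.
// `think` is only included when explicitly provided, so models that
// do not support thinking see an unchanged request.
function buildOllamaChatBody(
  model: string,
  prompt: string,
  think?: boolean
): OllamaChatBody {
  const body: OllamaChatBody = {
    model,
    messages: [{ role: 'user', content: prompt }],
    stream: false,
  };
  if (think !== undefined) {
    body.think = think;
  }
  return body;
}

const exampleBody = buildOllamaChatBody(
  'qwen3:0.6b',
  'Classify the sentiment: "great product"',
  false
);
console.log(JSON.stringify(exampleBody, null, 2));
```

Sending this body as a POST to the /api/chat endpoint of a local Ollama server should skip the reasoning phase on models that support it.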
Proposed Solution
Allow passing think: false (or the existing thinkingTokenBudget: 'none') through to Ollama's
API for custom models. Two possible approaches:
- Pass thinkingTokenBudget: 'none' for all Ollama models. Since Ollama model names are
  custom/user-defined, ax-llm can't know which ones support thinking. Instead of checking a
  registry, just pass "think": false in the request body when thinkingTokenBudget is 'none'.
- Add Ollama-specific options: allow a pass-through for Ollama-specific API parameters
  (e.g., options: { think: false }).
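A rough sketch of how both approaches could be combined when building the Ollama request. Everything here is hypothetical: ForwardOptionsSketch, ollamaOptions, and toOllamaRequestExtras are made-up names for illustration and do not reflect ax-llm's actual internals or option names.

```typescript
// Hypothetical shape of the relevant forward() options; not ax-llm's real types.
interface ForwardOptionsSketch {
  thinkingTokenBudget?: string;
  // Approach 2: assumed pass-through bag for Ollama-specific parameters.
  ollamaOptions?: Record<string, unknown>;
}

// Returns the extra top-level fields to merge into the Ollama chat request body.
function toOllamaRequestExtras(
  opts: ForwardOptionsSketch
): Record<string, unknown> {
  // Start from any pass-through parameters the caller supplied (approach 2).
  const extras: Record<string, unknown> = { ...opts.ollamaOptions };

  // Approach 1: skip the model registry check and map 'none' directly
  // to think: false, since Ollama model names are user-defined.
  if (opts.thinkingTokenBudget === 'none') {
    extras['think'] = false;
  }
  return extras;
}

console.log(toOllamaRequestExtras({ thinkingTokenBudget: 'none' }));
console.log(toOllamaRequestExtras({ ollamaOptions: { think: false } }));
```

In this sketch, an explicit thinkingTokenBudget: 'none' takes precedence over whatever the pass-through bag contains, and requests that set neither option are left unchanged.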
Environment
- ax-llm: v19.0.13
- Ollama: v0.17.7
- Models tested: qwen3.5:0.8b, qwen3:0.6b