Feature: Support think parameter for Ollama thinking models (Qwen 3/3.5, etc.) #495

@kishorgandham

Feature Request

Problem

When using thinking models (e.g., qwen3:0.6b, qwen3.5:0.8b) via AxAIOllama, there's no way
to disable the thinking/reasoning phase. These models generate extensive <think>...</think> tokens
before producing the actual response, which significantly increases latency, often making requests
5-10x slower than necessary for simple tasks like classification.

What I've tried

  1. thinkingTokenBudget: 'none' in forward() options — fails with:
    AxGenerateError: Model qwen3.5:0.8b does not support thinkingTokenBudget
    because ax-llm's model registry doesn't recognize custom Ollama model names as thinking-capable.

  2. Ollama Modelfile with PARAMETER think false — Ollama doesn't support think as a Modelfile parameter.

  3. Ollama's API does support "think": false in the chat request body, and the CLI supports
    /set nothink, but ax-llm doesn't pass this parameter through to Ollama.
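
For reference, the request that works when calling Ollama directly (bypassing ax-llm) looks roughly like the sketch below. It assumes a local Ollama server on the default port 11434 and Node 18+ for the global fetch. The helper names buildNoThinkRequest and chatWithoutThinking are made up for illustration and are not part of ax-llm or Ollama.

```typescript
// Subset of the Ollama /api/chat request body used in this sketch.
interface OllamaChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream: boolean;
  think?: boolean;
}

// Hypothetical helper: builds a non-streaming chat request with thinking disabled.
function buildNoThinkRequest(model: string, prompt: string): OllamaChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    stream: false,
    think: false, // the field ax-llm currently does not forward
  };
}

// Hypothetical helper: sends the request to a local Ollama server.
// The base URL is an assumption (Ollama's default listen address).
async function chatWithoutThinking(
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildNoThinkRequest(model, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { message?: { content?: string } };
  return data.message?.content ?? "";
}
```

Calling chatWithoutThinking("qwen3:0.6b", "Classify this review as positive or negative: ...") should return the answer without the reasoning phase, which is the behavior I'd like to get through AxAIOllama.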

Proposed Solution

Allow passing think: false (or the existing thinkingTokenBudget: 'none') through to Ollama's
API for custom models. Two possible approaches:

  1. Honor thinkingTokenBudget: 'none' for all Ollama models. Since Ollama model names are
    custom/user-defined, ax-llm can't know which ones support thinking. Instead of checking a
    registry, just send "think": false in the request body whenever thinkingTokenBudget is 'none'.

  2. Add Ollama-specific options — allow a pass-through for Ollama-specific API parameters
    (e.g., options: { think: false }).
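
A minimal sketch of how the two approaches could combine when the Ollama request body is built. This is not ax-llm's actual code: ForwardOptions, the ollamaOptions field, and applyThinkOptions are hypothetical names used only to illustrate the mapping.

```typescript
// Hypothetical option shape; only thinkingTokenBudget exists in ax-llm today.
interface ForwardOptions {
  thinkingTokenBudget?: string; // 'none' disables thinking; other values are left alone here
  ollamaOptions?: Record<string, unknown>; // assumed pass-through for approach 2
}

// Hypothetical helper: returns a copy of the request body with think settings applied.
function applyThinkOptions(
  body: Record<string, unknown>,
  opts: ForwardOptions,
): Record<string, unknown> {
  // Approach 2: merge any Ollama-specific fields as-is.
  const out: Record<string, unknown> = { ...body, ...(opts.ollamaOptions ?? {}) };

  // Approach 1: map the existing option to Ollama's field without a registry check.
  if (opts.thinkingTokenBudget === "none") {
    out["think"] = false;
  }
  return out;
}
```

With this shape, applyThinkOptions(body, { thinkingTokenBudget: 'none' }) and applyThinkOptions(body, { ollamaOptions: { think: false } }) both produce a body containing "think": false, and a call with neither option leaves the body unchanged so the model's default behavior applies.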

Environment

  • ax-llm: v19.0.13
  • Ollama: v0.17.7
  • Models tested: qwen3.5:0.8b, qwen3:0.6b
