fix: max_tokens not auto-detected for non-Qwen models, causing response truncation #2358

@netbrah

Description

Problem

When using non-Qwen models (Claude, GPT, Gemini) via the openai auth type with a modelProviders config, requests omit max_tokens entirely unless samplingParams.max_tokens is explicitly configured. Many APIs then fall back to a small default (e.g., Anthropic via Vertex AI defaults to 4096 output tokens), so long responses are truncated mid-generation.

This frequently breaks tool calls: the model generates a WriteFile tool call with a large content parameter, the response is cut off at 4096 tokens, and the tool call JSON arrives incomplete.

Root Cause

applyResolvedModelDefaults() in modelsConfig.ts auto-detects contextWindowSize (line 767) and modalities (line 776) from the model name using existing utility functions, but does NOT auto-detect max_tokens. A suitable fallback already exists: tokenLimit(model, 'output') in tokenLimits.ts returns the correct output limit for every supported model family, but it is never called here.

Expected Behavior

max_tokens should be auto-detected from the model name (using tokenLimit(model, 'output')) when not explicitly set by the provider config, following the same pattern as contextWindowSize and modalities.
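A minimal sketch of the proposed fallback. The shape of applyResolvedModelDefaults() and the real tokenLimit() are not shown in this issue, so the types and the per-family limits below are stand-in assumptions; only the pattern (fill in max_tokens from the model name when the provider config leaves it unset) mirrors the existing contextWindowSize/modalities handling:

```typescript
type SamplingParams = { max_tokens?: number };

// Stand-in for tokenLimit(model, 'output') from tokenLimits.ts.
// The limits here are assumed values for illustration only.
function tokenLimit(model: string, kind: 'output'): number {
  if (model.startsWith('claude')) return 8192; // assumed
  if (model.startsWith('gemini')) return 8192; // assumed
  return 4096;                                 // assumed default
}

// Same pattern as the existing contextWindowSize/modalities defaults:
// only auto-detect when the provider config did not set a value.
function applyMaxTokensDefault(
  model: string,
  params: SamplingParams,
): SamplingParams {
  if (params.max_tokens === undefined) {
    return { ...params, max_tokens: tokenLimit(model, 'output') };
  }
  return params;
}

console.log(applyMaxTokensDefault('claude-sonnet', {}).max_tokens);
console.log(applyMaxTokensDefault('gpt-4o', { max_tokens: 2048 }).max_tokens);
```

An explicitly configured max_tokens always wins; the detected limit is only a fallback, so existing modelProviders configs are unaffected.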

Workaround

Set generationConfig.samplingParams.max_tokens explicitly in modelProviders settings for each model.
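A hypothetical settings fragment illustrating the workaround. Only the key path generationConfig.samplingParams.max_tokens comes from this issue; the surrounding modelProviders structure, the model name, and the 8192 value are assumptions for illustration:

```json
{
  "modelProviders": {
    "openai": [
      {
        "name": "claude-sonnet-4",
        "generationConfig": {
          "samplingParams": {
            "max_tokens": 8192
          }
        }
      }
    ]
  }
}
```

This must be repeated per model until max_tokens is auto-detected as proposed above.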
