Problem
When using non-Qwen models (Claude, GPT, Gemini) via the openai auth type with modelProviders config, requests omit max_tokens entirely if samplingParams.max_tokens is not explicitly configured. Many APIs default to a small value (e.g., Anthropic via VertexAI defaults to 4096), causing long responses to be truncated mid-generation.
This frequently breaks tool call parameters — the model generates a WriteFile tool call with a large content parameter, the response gets truncated at 4096 tokens, and the tool call JSON is incomplete.
Root Cause
applyResolvedModelDefaults() in modelsConfig.ts auto-detects contextWindowSize (line 767) and modalities (line 776) from the model name using existing utility functions, but does NOT auto-detect max_tokens. The tokenLimit(model, 'output') function already exists in tokenLimits.ts with correct output limits for all supported model families, but is never called as a fallback.
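To make the gap concrete, here is an illustrative sketch of what a `tokenLimit(model, 'output')`-style lookup does: match the model name against known families and return that family's maximum output tokens. The patterns and limit values below are examples for illustration, not the actual table in tokenLimits.ts.

```typescript
// Illustrative model-family lookup; patterns and limits are assumptions,
// not the real tokenLimits.ts contents.
type LimitKind = 'input' | 'output';

const OUTPUT_LIMITS: Array<[RegExp, number]> = [
  [/claude/i, 8192],
  [/gpt-4o/i, 16384],
  [/gemini/i, 8192],
];

function tokenLimit(model: string, kind: LimitKind): number | undefined {
  if (kind !== 'output') return undefined; // this sketch covers output limits only
  for (const [pattern, limit] of OUTPUT_LIMITS) {
    if (pattern.test(model)) return limit;
  }
  return undefined; // unknown family: caller leaves max_tokens unset
}
```

The bug is simply that nothing in `applyResolvedModelDefaults()` calls this lookup for `max_tokens`, even though the equivalent lookups for context window and modalities are called.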
Expected Behavior
max_tokens should be auto-detected from the model name (using tokenLimit(model, 'output')) when not explicitly set by the provider config, following the same pattern as contextWindowSize and modalities.
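A minimal sketch of the proposed fallback. The function and field names come from this report; `applyMaxTokensDefault` is a hypothetical helper, and `tokenLimit` is passed in as a stub rather than imported from the real tokenLimits.ts:

```typescript
interface SamplingParams {
  max_tokens?: number;
}

// Hypothetical helper showing the intended fallback: only fill in
// max_tokens when the provider config left it unset, mirroring how
// contextWindowSize and modalities are auto-detected.
function applyMaxTokensDefault(
  model: string,
  samplingParams: SamplingParams,
  tokenLimit: (model: string, kind: 'output') => number | undefined,
): SamplingParams {
  if (samplingParams.max_tokens === undefined) {
    const detected = tokenLimit(model, 'output');
    if (detected !== undefined) {
      return { ...samplingParams, max_tokens: detected };
    }
  }
  return samplingParams; // explicit config always wins; unknown models stay unset
}
```

An explicitly configured `max_tokens` is never overwritten, so the fix is backward-compatible with existing provider configs.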
Workaround
Set generationConfig.samplingParams.max_tokens explicitly in modelProviders settings for each model.
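For example, a settings fragment along these lines. The `generationConfig.samplingParams.max_tokens` path is from this report; the surrounding keys and the model name are illustrative and depend on your actual modelProviders schema:

```json
{
  "modelProviders": [
    {
      "models": [
        {
          "name": "claude-sonnet-4",
          "generationConfig": {
            "samplingParams": {
              "max_tokens": 64000
            }
          }
        }
      ]
    }
  ]
}
```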