
fix: auto-detect max_tokens from model when not set by provider#2356

Merged
Mingholy merged 3 commits into QwenLM:main from netbrah:fix/auto-detect-max-output-tokens
Mar 16, 2026

Conversation

@netbrah
Contributor

@netbrah netbrah commented Mar 13, 2026

Summary

When modelProviders config does not specify samplingParams.max_tokens, requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit max_tokens entirely. Many APIs default to a small value (e.g., Anthropic via VertexAI defaults to 4096), causing long responses to be truncated mid-generation — often breaking tool call parameters.

Root Cause

applyResolvedModelDefaults() in modelsConfig.ts auto-detects contextWindowSize and modalities from the model name, but does NOT auto-detect max_tokens. The tokenLimit(model, 'output') function already exists and returns correct values, but is never called as a fallback.

Fix

Apply tokenLimit(model, 'output') as a fallback for samplingParams.max_tokens in applyResolvedModelDefaults(), following the same pattern already used for contextWindowSize and modalities.

Output limits from tokenLimits.ts:

  • Claude Opus 4.6: 128K
  • Claude Sonnet 4.6 / fallback: 64K
  • GPT-5.x: 128K
  • Gemini 3.x: 64K
  • Qwen 3.5: 64K
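
The fallback pattern described above can be sketched roughly as follows. This is a simplified stand-in, not the actual source: the real applyResolvedModelDefaults() in modelsConfig.ts handles more fields (contextWindowSize, modalities, generationConfigSources), and the inline tokenLimit() table here is a mock built from the output limits quoted in this PR.

```typescript
// Simplified sketch of the fallback; types and the tokenLimit() table
// are stand-ins based on the PR description, not the real modelsConfig.ts.
type SamplingParams = { max_tokens?: number };
type ResolvedModel = { model: string; samplingParams: SamplingParams };

// Mock lookup using the output limits listed above (tokenLimits.ts).
function tokenLimit(model: string, kind: 'output'): number | undefined {
  if (model.includes('opus-4')) return 128 * 1024;
  if (model.includes('claude')) return 64 * 1024;
  if (model.includes('gpt-5')) return 128 * 1024;
  if (model.includes('gemini-3')) return 64 * 1024;
  if (model.includes('qwen3.5')) return 64 * 1024;
  return undefined;
}

function applyResolvedModelDefaults(resolved: ResolvedModel): ResolvedModel {
  // Explicit samplingParams.max_tokens from modelProviders config wins;
  // only fall back to the model's known output limit when it is unset.
  if (resolved.samplingParams.max_tokens === undefined) {
    const limit = tokenLimit(resolved.model, 'output');
    if (limit !== undefined) {
      resolved.samplingParams = { ...resolved.samplingParams, max_tokens: limit };
    }
  }
  return resolved;
}
```

This mirrors the "fill only when unset" pattern the PR says is already used for contextWindowSize and modalities, so explicit provider config keeps priority.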

Test plan

  • TypeScript type-check passes (tsc --noEmit)
  • Verify Claude model requests now include max_tokens: 65536 (or model-appropriate value)
  • Verify Qwen OAuth models are unaffected (they use a separate resolution path)
  • Verify explicit samplingParams.max_tokens in modelProviders config still takes priority

Made with Cursor

@netbrah
Contributor Author

netbrah commented Mar 13, 2026

Fixes #2358

Collaborator

@Mingholy Mingholy left a comment


Good catch and thanks for the contribution!
Suggest adding test cases in the test file to verify/specify the expected behavior, including but not limited to:

  1. max_tokens auto detection and precedence
  2. fallback when tokenLimit returns undefined
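
A minimal sketch of those two cases as standalone assertions (the real suite would import applyResolvedModelDefaults from modelsConfig.ts; the inline resolveMaxTokens helper below is a hypothetical stand-in for the precedence logic under test):

```typescript
// Stand-in for the behavior under test: explicit max_tokens wins,
// otherwise fall back to the auto-detected output limit, if any.
function resolveMaxTokens(
  explicit: number | undefined,
  detected: number | undefined,
): number | undefined {
  return explicit !== undefined ? explicit : detected;
}

// Case 1: auto-detection and precedence.
// The detected limit fills the gap but never overrides explicit config.
const autoDetected = resolveMaxTokens(undefined, 65536);
const explicitWins = resolveMaxTokens(4096, 65536);

// Case 2: tokenLimit() returns undefined for an unknown model —
// max_tokens should stay unset rather than be invented.
const unknownModel = resolveMaxTokens(undefined, undefined);
```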

@tanzhenxin
Collaborator

If we set max_tokens for every provider & model, instead of relying on the default behavior of providers, I think we should set the fallback DEFAULT_OUTPUT_TOKEN_LIMIT to a bigger value, maybe 16k or 32k? I am not quite sure how it lands, but I guess it is probably better than 8k.

@Mingholy
Collaborator

> If we set max_tokens for every provider & model, instead of relying on the default behavior of providers, I think we should set the fallback DEFAULT_OUTPUT_TOKEN_LIMIT to a bigger value, maybe 16k or 32k? I am not quite sure how it lands, but I guess it is probably better than 8k.

output_limit_analysis.json
Based on the output limit data on models.dev, 16k seems an appropriate value for most of the latest models, while 8k is somewhat outdated.
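
If the default is bumped as discussed, the change would essentially be one constant. The name DEFAULT_OUTPUT_TOKEN_LIMIT comes from this thread; its file location and the prior ~8k value are assumptions:

```typescript
// Last-resort output cap when tokenLimit() has no entry for a model.
// Raised to 16k per the models.dev analysis; the previous 8k-era
// default truncates long responses on most current models.
const DEFAULT_OUTPUT_TOKEN_LIMIT = 16_384;
```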

netbrah and others added 3 commits March 16, 2026 17:21
When modelProviders config does not specify samplingParams.max_tokens,
requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit max_tokens
entirely. Many APIs default to a small value (e.g., Anthropic via
VertexAI defaults to 4096), causing long responses to be truncated
mid-generation — often breaking tool call parameters.

Fix: apply tokenLimit(model, 'output') as a fallback in
applyResolvedModelDefaults(), following the same pattern already used
for contextWindowSize and modalities auto-detection.

Output limits from tokenLimits.ts:
  - Claude Opus 4.6: 128K
  - Claude Sonnet 4.6 / fallback: 64K
  - GPT-5.x: 128K
  - Gemini 3.x: 64K
  - Qwen 3.5: 64K

Made-with: Cursor
…d tests

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

- Fix generationConfigSources to preserve existing source info when auto-detecting max_tokens

- Add unit tests for max_tokens fallback logic
@qwen-code-ci-bot qwen-code-ci-bot force-pushed the fix/auto-detect-max-output-tokens branch from 995811a to 6f67b12 Compare March 16, 2026 09:24
@Mingholy Mingholy self-requested a review March 16, 2026 09:32
@Mingholy Mingholy merged commit f901616 into QwenLM:main Mar 16, 2026
13 checks passed
@netbrah netbrah deleted the fix/auto-detect-max-output-tokens branch March 30, 2026 10:44