
fix: auto-detect max_tokens from model when not set by provider#2356

Merged
Mingholy merged 3 commits into QwenLM:main from netbrah:fix/auto-detect-max-output-tokens
Mar 16, 2026

Conversation

@netbrah
Contributor

@netbrah netbrah commented Mar 13, 2026

Summary

When modelProviders config does not specify samplingParams.max_tokens, requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit max_tokens entirely. Many APIs default to a small value (e.g., Anthropic via VertexAI defaults to 4096), causing long responses to be truncated mid-generation — often breaking tool call parameters.

Root Cause

applyResolvedModelDefaults() in modelsConfig.ts auto-detects contextWindowSize and modalities from the model name, but does NOT auto-detect max_tokens. The tokenLimit(model, 'output') function already exists and returns correct values, but is never called as a fallback.

Fix

Apply tokenLimit(model, 'output') as a fallback for samplingParams.max_tokens in applyResolvedModelDefaults(), following the same pattern already used for contextWindowSize and modalities.

Output limits from tokenLimits.ts:

  • Claude Opus 4.6: 128K
  • Claude Sonnet 4.6 / fallback: 64K
  • GPT-5.x: 128K
  • Gemini 3.x: 64K
  • Qwen 3.5: 64K
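
The fallback pattern described above can be sketched roughly as follows. This is a simplified stand-in, not the actual source: the real applyResolvedModelDefaults() in modelsConfig.ts handles more fields (contextWindowSize, modalities, generationConfigSources), and the inline tokenLimit() table here is a mock built from the output limits quoted in this PR.

```typescript
// Simplified sketch of the fallback; types and the tokenLimit() table
// are stand-ins based on the PR description, not the real modelsConfig.ts.
type SamplingParams = { max_tokens?: number };
type ResolvedModel = { model: string; samplingParams: SamplingParams };

// Mock lookup using the output limits listed above (tokenLimits.ts).
function tokenLimit(model: string, kind: 'output'): number | undefined {
  if (model.includes('opus-4')) return 128 * 1024;
  if (model.includes('claude')) return 64 * 1024;
  if (model.includes('gpt-5')) return 128 * 1024;
  if (model.includes('gemini-3')) return 64 * 1024;
  if (model.includes('qwen3.5')) return 64 * 1024;
  return undefined;
}

function applyResolvedModelDefaults(resolved: ResolvedModel): ResolvedModel {
  // Explicit samplingParams.max_tokens from modelProviders config wins;
  // only fall back to the model's known output limit when it is unset.
  if (resolved.samplingParams.max_tokens === undefined) {
    const limit = tokenLimit(resolved.model, 'output');
    if (limit !== undefined) {
      resolved.samplingParams = { ...resolved.samplingParams, max_tokens: limit };
    }
  }
  return resolved;
}
```

This mirrors the "fill only when unset" pattern the PR says is already used for contextWindowSize and modalities, so explicit provider config keeps priority.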

Test plan

  • TypeScript type-check passes (tsc --noEmit)
  • Verify Claude model requests now include max_tokens: 65536 (or model-appropriate value)
  • Verify Qwen OAuth models are unaffected (they use a separate resolution path)
  • Verify explicit samplingParams.max_tokens in modelProviders config still takes priority

Made with Cursor

@netbrah
Contributor Author

netbrah commented Mar 13, 2026

Fixes #2358

Collaborator

@Mingholy Mingholy left a comment


Good catch and thanks for the contribution!
Suggest adding test cases in the test file to verify/specify the expected behavior, including but not limited to:

  1. max_tokens auto detection and precedence
  2. fallback when tokenLimit returns undefined
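
A minimal sketch of those two cases as standalone assertions (the real suite would import applyResolvedModelDefaults from modelsConfig.ts; the inline resolveMaxTokens helper below is a hypothetical stand-in for the precedence logic under test):

```typescript
// Stand-in for the behavior under test: explicit max_tokens wins,
// otherwise fall back to the auto-detected output limit, if any.
function resolveMaxTokens(
  explicit: number | undefined,
  detected: number | undefined,
): number | undefined {
  return explicit !== undefined ? explicit : detected;
}

// Case 1: auto-detection and precedence.
// The detected limit fills the gap but never overrides explicit config.
const autoDetected = resolveMaxTokens(undefined, 65536);
const explicitWins = resolveMaxTokens(4096, 65536);

// Case 2: tokenLimit() returns undefined for an unknown model —
// max_tokens should stay unset rather than be invented.
const unknownModel = resolveMaxTokens(undefined, undefined);
```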

@tanzhenxin
Collaborator

If we set max_tokens for every provider & model, instead of relying on the default behavior of providers, I think we should set the fallback DEFAULT_OUTPUT_TOKEN_LIMIT to a bigger value, maybe 16k or 32k? I am not quite sure how it lands, but I guess it is probably better than 8k.

@Mingholy
Collaborator

> If we set max_tokens for every provider & model, instead of relying on the default behavior of providers, I think we should set the fallback DEFAULT_OUTPUT_TOKEN_LIMIT to a bigger value, maybe 16k or 32k? I am not quite sure how it lands, but I guess it is probably better than 8k.

output_limit_analysis.json
Based on the output limit data on models.dev, 16k seems an appropriate value for most of the latest models, while 8k is somewhat outdated.
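
If the default is bumped as discussed, the change would essentially be one constant. The name DEFAULT_OUTPUT_TOKEN_LIMIT comes from this thread; its file location and the prior ~8k value are assumptions:

```typescript
// Last-resort output cap when tokenLimit() has no entry for a model.
// Raised to 16k per the models.dev analysis; the previous 8k-era
// default truncates long responses on most current models.
const DEFAULT_OUTPUT_TOKEN_LIMIT = 16_384;
```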

netbrah and others added 3 commits March 16, 2026 17:21
When modelProviders config does not specify samplingParams.max_tokens,
requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit max_tokens
entirely. Many APIs default to a small value (e.g., Anthropic via
VertexAI defaults to 4096), causing long responses to be truncated
mid-generation — often breaking tool call parameters.

Fix: apply tokenLimit(model, 'output') as a fallback in
applyResolvedModelDefaults(), following the same pattern already used
for contextWindowSize and modalities auto-detection.

Output limits from tokenLimits.ts:
  - Claude Opus 4.6: 128K
  - Claude Sonnet 4.6 / fallback: 64K
  - GPT-5.x: 128K
  - Gemini 3.x: 64K
  - Qwen 3.5: 64K

Made-with: Cursor
…d tests

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

- Fix generationConfigSources to preserve existing source info when auto-detecting max_tokens

- Add unit tests for max_tokens fallback logic
@qwen-code-ci-bot qwen-code-ci-bot force-pushed the fix/auto-detect-max-output-tokens branch from 995811a to 6f67b12 Compare March 16, 2026 09:24
@Mingholy Mingholy self-requested a review March 16, 2026 09:32
@Mingholy Mingholy merged commit f901616 into QwenLM:main Mar 16, 2026
13 checks passed
@netbrah netbrah deleted the fix/auto-detect-max-output-tokens branch March 30, 2026 10:44