fix: auto-detect max_tokens from model when not set by provider #2356
Mingholy merged 3 commits into QwenLM:main from
Conversation
Fixes #2358
Mingholy left a comment
Good catch and thanks for the contribution!
Suggest adding test cases in the test file to verify/specify the expected behavior, including but not limited to:
- `max_tokens` auto-detection and precedence
- fallback when `tokenLimit` returns undefined
If we set `max_tokens` for every provider & model instead of relying on the providers' default behavior, I think we should set the fallback.
output_limit_analysis.json
When `modelProviders` config does not specify `samplingParams.max_tokens`, requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit `max_tokens` entirely. Many APIs default to a small value (e.g., Anthropic via VertexAI defaults to 4096), causing long responses to be truncated mid-generation, often breaking tool call parameters.

Fix: apply `tokenLimit(model, 'output')` as a fallback in `applyResolvedModelDefaults()`, following the same pattern already used for `contextWindowSize` and `modalities` auto-detection.

Output limits from `tokenLimits.ts`:
- Claude Opus 4.6: 128K
- Claude Sonnet 4.6 / fallback: 64K
- GPT-5.x: 128K
- Gemini 3.x: 64K
- Qwen 3.5: 64K

Made-with: Cursor
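The fallback described above can be sketched as follows. All type and function shapes here are hypothetical stand-ins for illustration; the actual change lands in `applyResolvedModelDefaults()` in `modelsConfig.ts` and uses the existing `tokenLimit(model, 'output')`:

```typescript
// Hypothetical config shape for illustration only.
interface ResolvedModelConfig {
  model: string;
  contextWindowSize?: number;
  samplingParams?: { max_tokens?: number };
}

// Stand-in for tokenLimit(model, 'output') from tokenLimits.ts;
// 64K (65536) matches the PR's limit for Qwen 3.5, for example.
function tokenLimitOutput(model: string): number | undefined {
  return model.startsWith('qwen3.5') ? 65536 : undefined;
}

function applyResolvedModelDefaults(
  cfg: ResolvedModelConfig,
): ResolvedModelConfig {
  const params = { ...(cfg.samplingParams ?? {}) };
  // Same pattern as the existing contextWindowSize/modalities
  // auto-detection: only fill in max_tokens when the provider config
  // left it unset AND the token limit table knows the model.
  if (params.max_tokens === undefined) {
    const limit = tokenLimitOutput(cfg.model);
    if (limit !== undefined) params.max_tokens = limit;
  }
  return { ...cfg, samplingParams: params };
}
```

With this shape, an explicitly configured `max_tokens` passes through untouched, and unknown models keep the provider's default behavior.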
…d tests

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Fix `generationConfigSources` to preserve existing source info when auto-detecting `max_tokens`
- Add unit tests for `max_tokens` fallback logic
995811a to 6f67b12
Summary
When `modelProviders` config does not specify `samplingParams.max_tokens`, requests to non-Qwen models (Claude, GPT, Gemini, etc.) omit `max_tokens` entirely. Many APIs default to a small value (e.g., Anthropic via VertexAI defaults to 4096), causing long responses to be truncated mid-generation, often breaking tool call parameters.

Root Cause

`applyResolvedModelDefaults()` in `modelsConfig.ts` auto-detects `contextWindowSize` and `modalities` from the model name, but does NOT auto-detect `max_tokens`. The `tokenLimit(model, 'output')` function already exists and returns correct values, but is never called as a fallback.

Fix
Apply `tokenLimit(model, 'output')` as a fallback for `samplingParams.max_tokens` in `applyResolvedModelDefaults()`, following the same pattern already used for `contextWindowSize` and `modalities`.

Output limits from `tokenLimits.ts`:
- Claude Opus 4.6: 128K
- Claude Sonnet 4.6 / fallback: 64K
- GPT-5.x: 128K
- Gemini 3.x: 64K
- Qwen 3.5: 64K

Test plan
- Typecheck passes (`tsc --noEmit`)
- Requests now include `max_tokens: 65536` (or model-appropriate value)
- `samplingParams.max_tokens` in modelProviders config still takes priority

Made with Cursor
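For reference, the priority rule in the test plan means an explicit value still wins over auto-detection. A hypothetical `modelProviders` entry illustrating an override (the nesting shown here is an assumption; check the project's actual settings schema):

```json
{
  "modelProviders": {
    "anthropic": {
      "models": [
        {
          "model": "claude-sonnet-4.6",
          "samplingParams": { "max_tokens": 32000 }
        }
      ]
    }
  }
}
```

With `max_tokens` set as above, the fallback from `tokenLimit(model, 'output')` is never consulted; remove the field to get the auto-detected 64K limit.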