[Tool Call] Steamline function arguments when tool_choice="auto" for deepseekv31_detector #11589
Conversation
|
Just curious: in which scenarios do we need to show the partial tool call item (arguments) to the user?
@JustinTong0323 Thanks for your question! I think the reason is the same as for streaming output in general: it's better for users to see smaller, incremental output when printing for debugging or other purposes. More details can be found in the Motivation section. SGLang already supports streaming output for:
So why not the tool call part as well? I've tested this PR, and it works as expected. It also doesn't introduce any overhead to the code.
Thank you for this clever change!
Could you include a screenshot of a test case and the streaming results of before vs. after?
```python
        tool_call_end_pattern, current_text, re.DOTALL
    )
    if match:
        if is_tool_end:
```
This is still inside the if _is_complete_json() check. I imagine this would usually be the case where the function arguments are complete. Is my understanding correct?
Yes. I didn't change the main logic; I only relaxed the regex match condition so that function arguments can be returned as soon as part of them has been generated.
Client:

```python
from openai import OpenAI


def test_stream_tool_call(base_url, api_key):
    client = OpenAI(base_url=base_url + "/v1", api_key=api_key)
    model = list(client.models.list())[0].id
    print(f"Using {model=}\n\n")
    response_stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "What is the weather like in Boston in MA today? Please use Fahrenheit."}
        ],
        tools=tool_search_weather,  # tool schema, defined separately
        stream=True,
        extra_body={"chat_template_kwargs": {"thinking": True}},
        # tool_choice="required",
        # stream_options={
        #     "include_usage": True
        # }
    )
    for chunk in response_stream:
        delta = chunk.choices[0].delta
        # reasoning_content is a server-side extension field, so read it defensively
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            print(reasoning, end="", flush=True)
        if delta.content:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            # each streamed delta may carry only a fragment of the arguments
            print(delta.tool_calls[0])
    print()
```

The tool's definition is here. The complete streaming output can be seen in the SGLang doc here.
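The client prints each tool-call delta as it arrives, so with streamed arguments a consumer has to stitch the fragments back together. A minimal sketch of that accumulation is below, assuming the deltas follow the OpenAI streaming shape (`index`, `function.name`, `function.arguments`). The helper name `accumulate_tool_calls`, the tool name `get_weather`, and the argument payload are hypothetical stand-ins, not taken from this PR.

```python
import json
from types import SimpleNamespace


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call deltas into {index: {"name": ..., "arguments": ...}}."""
    calls = {}
    for delta in deltas:
        entry = calls.setdefault(delta.index, {"name": "", "arguments": ""})
        fn = delta.function
        if fn is None:
            continue
        if fn.name:
            # the function name normally arrives once, in the first delta
            entry["name"] = fn.name
        if fn.arguments:
            # argument fragments are plain string pieces; concatenate in order
            entry["arguments"] += fn.arguments
    return calls


def _delta(index, name=None, arguments=None):
    # Stand-in for one item of chunk.choices[0].delta.tool_calls (hypothetical data)
    return SimpleNamespace(
        index=index, function=SimpleNamespace(name=name, arguments=arguments)
    )


demo = [
    _delta(0, name="get_weather", arguments=""),
    _delta(0, arguments='{"city": "Bos'),
    _delta(0, arguments='ton", "unit": "fahrenheit"}'),
]
merged = accumulate_tool_calls(demo)
parsed_args = json.loads(merged[0]["arguments"])
```

The arguments string is only valid JSON once the last fragment has arrived, so parsing is deferred until the stream ends.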
|
Is there any progress? |
|
Is there any progress? expecting |
|
@JustinTong0323 @CatherineSue Could you help review this PR again, thanks |
|
@Muqi1029 It seems the test isn’t working as expected. Could you take a look? |



Motivation
The current deepseekv31_detector only detects complete function arguments and immediately returns them while clearing the buffer. This approach can lead to a poor user experience: when function arguments are very long, there can be a noticeable delay before the arguments are returned.
Modifications
Relax the previous strict regular expression by using the tool call end token as a conditional.
Now the pattern matches either the tool call end token or the end of the string as a separate group.
The argument group is matched non-greedily, which prevents excessive consumption of text.
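The difference between the strict and the relaxed pattern can be illustrated with a small sketch. This is not the detector's actual code: the tokens `<tool_call_begin>`, `<tool_sep>`, and `<tool_call_end>` are hypothetical placeholders for the model's special tokens, and `parse_partial` is an illustrative helper.

```python
import re

# Hypothetical placeholder tokens; the real detector uses the model's special tokens.
CALL_BEGIN = "<tool_call_begin>"
SEP = "<tool_sep>"
CALL_END = "<tool_call_end>"

# Strict pattern: matches only once the tool call end token has been generated.
STRICT = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(SEP) + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)

# Relaxed pattern: group 3 captures either the end token or the end of the string (\Z).
# The argument group (.*?) is non-greedy, so it stops at the first end token if present.
RELAXED = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(SEP) + r"(.*?)("
    + re.escape(CALL_END) + r"|\Z)",
    re.DOTALL,
)


def parse_partial(current_text):
    """Return (name, arguments_so_far, is_tool_end), or None if no tool call has started."""
    match = RELAXED.search(current_text)
    if match is None:
        return None
    name, args, end = match.group(1), match.group(2), match.group(3)
    return name, args, end == CALL_END


partial = CALL_BEGIN + "get_weather" + SEP + '{"city": "Bos'
complete = CALL_BEGIN + "get_weather" + SEP + '{"city": "Boston"}' + CALL_END

strict_on_partial = STRICT.search(partial)
relaxed_on_partial = parse_partial(partial)
relaxed_on_complete = parse_partial(complete)
```

With the strict pattern the partial buffer yields no match, so nothing can be streamed until the end token arrives. The relaxed pattern returns the arguments generated so far along with a flag saying whether the call has ended.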
Accuracy Tests
Benchmarking and Profiling
Checklist