@Muqi1029
Contributor

Motivation

The current deepseekv31_detector only detects complete function arguments and immediately returns them while clearing the buffer.

This approach can lead to a poor user experience: when function arguments are very long, there can be a noticeable delay before the arguments are returned.

Modifications

Relax the previously strict regular expression so that the tool call end token is no longer required for a match.

Now the pattern matches either the tool call end token or the end of the string as a separate group.

The argument group is matched non-greedily, which prevents excessive consumption of text.
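
As a rough illustration of the relaxed pattern described above, here is a minimal sketch. The token strings `<tool_call_begin>`, `<tool_sep>`, and `<tool_call_end>` and the helper name `match_tool_call` are hypothetical stand-ins, not the detector's real tokens or code. The idea is only that the final group accepts either the end token or the end of the string, while the argument group stays non-greedy.

```python
import re

# Hypothetical stand-in tokens; the real model's special tokens differ.
TOOL_BEGIN = "<tool_call_begin>"
TOOL_SEP = "<tool_sep>"
TOOL_END = "<tool_call_end>"

# The last group matches either the end token or the end of the string,
# so a partially generated argument string still produces a match.
PATTERN = re.compile(
    re.escape(TOOL_BEGIN)
    + r"(?P<name>.*?)"
    + re.escape(TOOL_SEP)
    + r"(?P<args>.*?)"  # non-greedy: stops at the first end token
    + r"(?P<end>" + re.escape(TOOL_END) + r"|\Z)",
    re.DOTALL,
)


def match_tool_call(text):
    """Return (name, args_so_far, is_tool_end), or None if no call has started."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("name"), m.group("args"), m.group("end") == TOOL_END
```

With this shape, a buffer that stops in the middle of the arguments still yields the name and the arguments generated so far, and the third value reports whether the end token has been seen.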

Accuracy Tests

Benchmarking and Profiling

Checklist

@JustinTong0323
Collaborator

Just curious: in which scenarios do we need to show a partial tool call item (argument) to the user?

@Muqi1029
Contributor Author

Muqi1029 commented Oct 16, 2025

> Just curious: in which scenarios do we need to show a partial tool call item (argument) to the user?

@JustinTong0323 Thanks for your question!

I think the reason is the same as for streaming output in general: it is better for users to see smaller, incremental output as it is generated, whether for debugging or other purposes. More details can be found in the Motivation section.

SGLang already supports streaming output for:

  1. The reasoning part
  2. The normal content part
  3. The tool call part when tool_choice is "required" or names a specific function

So, why not the tool call part when tool_choice is auto as well?

I’ve tested this PR, and it works as expected without introducing any extra overhead.

Collaborator

@CatherineSue CatherineSue left a comment

Thank you for this clever change!

Could you include a screenshot of a test case and the streaming results of before vs. after?

tool_call_end_pattern, current_text, re.DOTALL
)
if match:
if is_tool_end:
Collaborator

@CatherineSue CatherineSue Oct 16, 2025

This is still inside the if _is_complete_json() check. I imagine this would usually be the case where the function arguments are complete. Is my understanding correct?

Contributor Author

Yes. I didn't actually change the main logic. I only relaxed the regex match condition so that function arguments can be returned as soon as part of them has been generated.
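
To make the exchange above concrete, here is a minimal sketch of the two ideas involved: a completeness check on the argument string and the emission of only the newly generated part. Both `is_complete_json` and `args_delta` are hypothetical helpers written for illustration. They are not the detector's actual `_is_complete_json()` or its streaming logic.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if text already parses as a JSON object (illustrative check)."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def args_delta(already_sent: str, current_args: str) -> str:
    """Return the portion of current_args that has not been streamed yet."""
    if not current_args.startswith(already_sent):
        # The buffer no longer extends what was sent; emit nothing.
        return ""
    return current_args[len(already_sent):]


# Usage: feed growing snapshots of the argument string and stream only the new part.
sent = ""
for snapshot in ['{"city": "Bos', '{"city": "Boston"}']:
    delta = args_delta(sent, snapshot)
    sent += delta
    print(repr(delta), is_complete_json(sent))
```

Under these assumptions, partial arguments are streamed as they arrive, and the completeness check only decides when the call can be treated as finished.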

@Muqi1029
Contributor Author

Muqi1029 commented Oct 16, 2025

> Thank you for this clever change!
>
> Could you include a screenshot of a test case and the streaming results of before vs. after?

@CatherineSue

Client:

from openai import OpenAI


def test_stream_tool_call(base_url, api_key):
    client = OpenAI(base_url=base_url + "/v1", api_key=api_key)
    model = list(client.models.list())[0].id
    print(f"Using {model=}\n\n")
    response_stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "What is the weather like in Boston in MA today? Please use Fahrenheit."}
        ],
        # tool_search_weather is defined separately (see the tool definition linked below).
        tools=tool_search_weather,
        stream=True,
        extra_body={"chat_template_kwargs": {"thinking": True}},
        # tool_choice="required",
        # stream_options={"include_usage": True},
    )
    for chunk in response_stream:
        # Some chunks (e.g. a usage-only chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is a server-side extension field, so read it defensively.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            print(reasoning, end="", flush=True)
        if delta.content:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            print(delta.tool_calls[0])
    print()

The tool's definition is here.

The SGLang doc here shows how to obtain the complete streaming output:
image

--

Before:
image

After:
image

@d6638219

d6638219 commented Nov 6, 2025

Is there any progress?

@jxz542189

jxz542189 commented Nov 9, 2025

Is there any progress? Looking forward to it.

@Muqi1029
Contributor Author

Muqi1029 commented Nov 9, 2025

@JustinTong0323 @CatherineSue Could you help review this PR again? Thanks.

@cynial

cynial commented Nov 13, 2025

@Muqi1029 It seems the test isn’t working as expected. Could you take a look?

@Muqi1029
Contributor Author

> @Muqi1029 It seems the test isn’t working as expected. Could you take a look?

@cynial Sorry, which test do you mean? The CI? I have looked at the failed checks, and they are not related to this PR.

@cynial

cynial commented Nov 14, 2025

> @cynial Sorry, which test do you mean? The CI? I have looked at the failed checks, and they are not related to this PR.

@Muqi1029 Got it - I will try to deploy this patch to production. Thank you for your contribution and effort.

@Fridge003 Fridge003 merged commit fc5da1e into sgl-project:main Nov 14, 2025
117 of 124 checks passed
@Muqi1029 Muqi1029 deleted the stream branch December 9, 2025 07:33
8 participants