Support v1/responses and use harmony in serving_chat #8837

Merged
zhyncs merged 21 commits into main from chang/oss-clean
Aug 6, 2025

Collaborator

@CatherineSue CatherineSue commented Aug 6, 2025

Motivation

Support v1/responses API and mcp server.

Use harmony in serving_chat.py

Co-Authored-By: Xinyuan Tong <justinning0323@outlook.com>

Notes:

The current version of gpt-oss is not stable. I tested with gpt-oss version d8db548, with this PR applied.

To use the gpt-oss demo tools, you need a Python 3.12 environment with the mcp and gpt-oss packages installed.

Quick demo: link

A prominent feature of gpt-oss is its ability to invoke tools directly; these are called "built-in tools". sglang provides several options:

By default, sglang integrates with the reference library's browser (using ExaBackend) and with the demo Python interpreter, which runs in a Docker container. The search backend requires access to exa.ai, with EXA_API_KEY set as an environment variable. For Python, either make sure Docker is available or set PYTHON_EXECUTION_BACKEND=UV. Note that PYTHON_EXECUTION_BACKEND=UV runs model-generated code snippets directly on the same machine, which is a security risk.

The command to launch the server is: python -m sglang.launch_server ... --tool-server demo
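
A minimal pre-flight sketch for the demo setup described above. It only checks the two prerequisites named in this description (EXA_API_KEY, and Docker unless PYTHON_EXECUTION_BACKEND is UV). The function name `check_demo_tool_env` is hypothetical and not part of sglang, and the exact string comparison against "UV" is an assumption.

```python
import os
import shutil
from typing import Mapping


def check_demo_tool_env(env: Mapping[str, str], docker_available: bool) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed.

    Hypothetical helper for illustration only, not an sglang API.
    """
    problems = []
    # The demo browser tool uses ExaBackend, which needs an exa.ai API key.
    if not env.get("EXA_API_KEY"):
        problems.append("EXA_API_KEY is not set (needed by the search backend)")
    # The demo Python tool needs Docker unless the UV backend is selected.
    # Assumption: the value is compared as the exact string "UV".
    uses_uv = env.get("PYTHON_EXECUTION_BACKEND") == "UV"
    if not uses_uv and not docker_available:
        problems.append("Docker not found and PYTHON_EXECUTION_BACKEND is not UV")
    return problems


if __name__ == "__main__":
    # Check the current process environment before launching the server.
    docker_found = shutil.which("docker") is not None
    for problem in check_demo_tool_env(os.environ, docker_found):
        print("warning:", problem)
```

Running this before `python -m sglang.launch_server ... --tool-server demo` should surface a missing key or a missing Docker install early, rather than at the first tool call.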

Note that the default options are for demonstration only. For production use, sglang can act as an MCP client for multiple services. Example tool servers that sglang can work with are provided; they wrap the demo tools and can be run as follows:

mcp run -t sse browser_server.py:mcp
mcp run -t sse python_server.py:mcp

python -m sglang.launch_server ... --tool-server ip-1:port-1,ip-2:port-2

The URLs are expected to point to MCP SSE servers that provide instructions in their server info and expose well-documented tools. The tools are injected into the system prompt so that the model can use them.
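
To make the two accepted forms of the --tool-server value concrete ("demo" versus a comma-separated list of host:port pairs), here is a small parsing sketch. It is not the parser used in this PR; `ToolServerSpec` and `parse_tool_server` are hypothetical names, and the validation rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolServerSpec:
    """Hypothetical container for a parsed --tool-server value."""
    use_demo: bool
    endpoints: tuple[tuple[str, int], ...]


def parse_tool_server(value: str) -> ToolServerSpec:
    """Parse "demo" or "host1:port1,host2:port2" into a ToolServerSpec."""
    if value.strip() == "demo":
        return ToolServerSpec(use_demo=True, endpoints=())
    endpoints = []
    for item in value.split(","):
        # Split on the last ":" so the host part may itself contain dots.
        host, sep, port = item.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return ToolServerSpec(use_demo=False, endpoints=tuple(endpoints))


if __name__ == "__main__":
    print(parse_tool_server("demo"))
    print(parse_tool_server("10.0.0.1:8001,10.0.0.2:8002"))
```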

Modifications

  • Turn on harmony with gpt-oss by default
  • Support harmony request and parsing.
  • NOTE: tool calls in chat completions are not yet supported for this model with harmony.

Accuracy Test

  • Non stream:
Screenshot 2025-08-05 at 4 48 43 PM
  • Stream:
Screenshot 2025-08-05 at 5 46 46 PM
  • Reasoning effort: low vs high
Screenshot 2025-08-05 at 6 06 55 PM
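
For readers who want to reproduce the three checks above (non-stream, stream, low vs high reasoning effort), here is a sketch that builds a v1/responses request with the standard library. The payload follows the OpenAI Responses API shape (`input`, `reasoning.effort`, `stream`); whether this PR accepts every field is not confirmed here. The base URL, port, and model name are placeholders, and `build_responses_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_responses_request(base_url, model, prompt, effort="low", stream=False):
    """Build (but do not send) a POST request for the v1/responses endpoint."""
    payload = {
        "model": model,
        "input": prompt,
        # Assumption: reasoning effort is passed as in the OpenAI Responses API.
        "reasoning": {"effort": effort},
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address and model name; adjust to the running server.
    req = build_responses_request(
        "http://localhost:30000", "openai/gpt-oss-20b", "What is 2 + 2?", effort="high"
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```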

Benchmark & Profiling

Checklist

Support tool-call in serving_chat

Updates to make serving_chat work
@gemini-code-assist

This comment was marked as spam.

@CatherineSue
Collaborator Author

Closing for now. A new dependency needs to be added first.

@gemini-code-assist

This comment was marked as spam.

@CatherineSue CatherineSue reopened this Aug 6, 2025
JustinTong0323 and others added 8 commits August 6, 2025 02:51
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@CatherineSue CatherineSue changed the title Support harmony in serving_chat Support v1/responses and use harmony in serving_chat Aug 6, 2025
JustinTong0323 and others added 5 commits August 5, 2025 21:45
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

req.send_decode_id_offset = len(decode_ids)
read_offsets.append(read_offset)
if self.skip_tokenizer_init:
Collaborator

Why does this need to be removed here?

Collaborator

Harmony needs output_ids to parse the content.

Collaborator

@yizhang2077 yizhang2077 Aug 6, 2025

Is it OK to return output_ids every time? Do we need to add an is_harmony check for protection?
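
One way to read the guard suggested in this thread is sketched below. This is not the code in the PR: `output_ids_to_return` is a hypothetical helper, and the `is_harmony` flag is the reviewer's proposed name rather than an existing attribute. It keeps the old skip_tokenizer_init behavior and additionally returns output_ids when harmony parsing needs them.

```python
from typing import Optional, Sequence


def output_ids_to_return(
    decode_ids: Sequence[int],
    skip_tokenizer_init: bool,
    is_harmony: bool,
) -> Optional[list[int]]:
    """Return token ids only when a consumer actually needs them.

    Hypothetical sketch: ids are returned if the tokenizer is skipped
    (the caller detokenizes itself) or if harmony must parse raw ids.
    """
    if skip_tokenizer_init or is_harmony:
        return list(decode_ids)
    # Otherwise avoid sending ids that nobody will read.
    return None


if __name__ == "__main__":
    print(output_ids_to_return([1, 2, 3], skip_tokenizer_init=False, is_harmony=True))
    print(output_ids_to_return([1, 2, 3], skip_tokenizer_init=False, is_harmony=False))
```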

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@zhyncs zhyncs merged commit 92cc32d into main Aug 6, 2025
57 of 67 checks passed
@zhyncs zhyncs deleted the chang/oss-clean branch August 6, 2025 23:20
@@ -0,0 +1,370 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
Contributor

It is better to also put a link to the original file

Collaborator Author

Done, see the PR referred below.

type: Literal["reasoning_text"] = "reasoning_text"


class ResponseReasoningItem(BaseModel):
Contributor

Why do you define this class, but also import it from openai.types.responses?

Collaborator

My bad. I imported this to fix a pydantic error but forgot to delete the locally defined class. I will follow up with a cleanup PR.

Collaborator Author

Removed in #9043

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
yanbing-j pushed a commit to jianan-gu/sglang that referenced this pull request Oct 13, 2025

if isinstance(recv_obj, BatchStrOut):
state.text += recv_obj.output_strs[i]
if state.obj.stream:
Contributor

This condition is wrong. It should be if self.server_args.stream_output and state.obj.stream:
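
A sketch of why the combined condition matters. It assumes, and this is only an assumption, that the server-level stream_output setting means "emit disjoint segments" while the per-request stream flag only means the client wants streaming. `StreamState` and `collect_output_ids` are hypothetical stand-ins, not the tokenizer manager code.

```python
from dataclasses import dataclass, field


@dataclass
class StreamState:
    """Hypothetical per-request state: all ids so far plus the last sent offset."""
    output_ids: list = field(default_factory=list)
    last_offset: int = 0


def collect_output_ids(state, new_ids, server_stream_output, request_stream):
    """Return a delta only when BOTH the server flag and the request flag are set."""
    state.output_ids.extend(new_ids)
    if server_stream_output and request_stream:
        # Disjoint segments: return only what has not been sent yet.
        delta = state.output_ids[state.last_offset:]
        state.last_offset = len(state.output_ids)
        return delta
    # Otherwise return the cumulative ids, even for streaming requests.
    return list(state.output_ids)


if __name__ == "__main__":
    s = StreamState()
    print(collect_output_ids(s, [1, 2], True, True))  # first delta
    print(collect_output_ids(s, [3], True, True))     # second delta
```

Under this reading, checking only the request flag would hand out deltas to clients of a server that was not configured for disjoint output.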

6 participants