Support v1/responses and use harmony in serving_chat#8837
Conversation
Support tool-call in serving_chat: updates to make serving_chat work.
Closing for now. Need to add a new dependency.
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
```python
req.send_decode_id_offset = len(decode_ids)
read_offsets.append(read_offset)
if self.skip_tokenizer_init:
```
Harmony needs output_ids to parse the content.
Is it OK to return output_ids every time? Do we need to add an is_harmony check for protection?
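A minimal sketch of the guard being discussed. The function name, dict shape, and `use_harmony` flag are illustrative only, not sglang's actual batch-output structures: the idea is to attach `output_ids` only when the harmony parser will consume them.

```python
# Illustrative sketch, not sglang's actual code: gate output_ids on a
# harmony flag so non-harmony requests don't carry raw token ids.
def build_batch_out(text: str, output_ids: list, use_harmony: bool) -> dict:
    out = {"text": text}
    if use_harmony:
        # harmony reconstructs channels (analysis/final/tool calls) from ids
        out["output_ids"] = output_ids
    return out
```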
```diff
@@ -0,0 +1,370 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
```
It would be better to also put a link to the original file.
Done, see the PR referenced below.
```python
type: Literal["reasoning_text"] = "reasoning_text"


class ResponseReasoningItem(BaseModel):
```
Why do you define this class, but also import it from openai.types.responses?
My bad: I imported it to fix a pydantic error but forgot to delete the locally defined one. Will follow up with a cleanup PR.
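For context, a minimal illustration of why the duplicate definition causes trouble: two same-named class objects are still distinct types, so a model validated against the imported class rejects instances of the local one. Plain classes are used here instead of pydantic models; the factory merely mimics "imported" vs. "locally redefined".

```python
def make_class():
    # Each call creates a brand-new class object, mimicking a class that is
    # both imported from openai.types.responses and redefined locally.
    class ResponseReasoningItem:
        pass
    return ResponseReasoningItem

Imported = make_class()
Redefined = make_class()
item = Imported()
# Same name, different type: this mismatch is what surfaces as a
# pydantic validation error.
print(isinstance(item, Imported), isinstance(item, Redefined))  # True False
```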
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
```python
if isinstance(recv_obj, BatchStrOut):
    state.text += recv_obj.output_strs[i]
    if state.obj.stream:
```
This condition is wrong. It should be `if self.server_args.stream_output and state.obj.stream:`.
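A small sketch of the gating logic the comment asks for, as a standalone function (illustrative, not the actual tokenizer-manager code): streaming requires both the server-level flag and the per-request flag.

```python
def should_stream(server_stream_output: bool, request_stream: bool) -> bool:
    # The per-request flag alone is not enough: the server must also have
    # stream_output enabled, otherwise outputs are delivered as whole batches.
    return server_stream_output and request_stream
```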
Motivation
Support the v1/responses API and MCP server.
Use harmony in serving_chat.py
Co-Authored-By: Xinyuan Tong justinning0323@outlook.com
Notes:
To use the gpt-oss demo tools, an environment with Python 3.12 is required, and the `mcp` and `gpt-oss` packages must be installed. Quick demo: link
One of the prominent features of gpt-oss is the ability to invoke tools directly, referred to as "built-in tools". In sglang, several options are provided:
By default, we integrate with the reference library's browser (using ExaBackend) and the demo Python interpreter through a Docker container. To use the search backend, access to exa.ai is required and `EXA_API_KEY` must be set as an environment variable. For Python execution, either ensure Docker is available or set `PYTHON_EXECUTION_BACKEND=UV`. Note that `PYTHON_EXECUTION_BACKEND=UV` runs model-generated code snippets on the same machine, which carries some risk.

The command to launch the server is:
```
python -m sglang.launch_server ... --tool-server demo
```

Note that the default options are intended solely for demonstration. For production-level usage, sglang can act as an MCP client for multiple services. An example tool server that sglang can interact with is provided; these servers wrap the demo tools, and the commands to run them are as follows:
The URLs are expected to be MCP SSE servers that expose instructions in their server info along with well-documented tools. These tools are incorporated into the system prompt so the model can use them.
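A hedged sketch of the system-prompt incorporation step. `Tool` and `render_system_prompt` are illustrative names, not sglang's actual API: the point is that the MCP server's instructions and its tool descriptions get combined into one prompt the model sees.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

def render_system_prompt(server_instructions: str, tools: list) -> str:
    # Combine the MCP server's instructions with a bullet list of its tools,
    # so the model knows what it may call and what each tool does.
    lines = [server_instructions, "", "Available tools:"]
    lines += [f"- {t.name}: {t.description}" for t in tools]
    return "\n".join(lines)
```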
Modifications
Accuracy Test
Benchmark & Profiling
Checklist