Support v1/responses and use harmony in serving_chat#8837
Conversation
Support tool-call in serving_chat: updates to make serving_chat work.
Closing for now. Need to add a new dependency.
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
```python
req.send_decode_id_offset = len(decode_ids)
read_offsets.append(read_offset)
if self.skip_tokenizer_init:
```
Harmony needs output_ids to parse the content.
Is it OK to return output_ids every time? Do we need to add an is_harmony check for protection?
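A minimal sketch of the guard being discussed. The function name, dict shape, and `use_harmony` flag are illustrative only, not sglang's actual batch-output structures: the idea is to attach `output_ids` only when the harmony parser will consume them.

```python
# Illustrative sketch, not sglang's actual code: gate output_ids on a
# harmony flag so non-harmony requests don't carry raw token ids.
def build_batch_out(text: str, output_ids: list, use_harmony: bool) -> dict:
    out = {"text": text}
    if use_harmony:
        # harmony reconstructs channels (analysis/final/tool calls) from ids
        out["output_ids"] = output_ids
    return out
```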
```diff
@@ -0,0 +1,370 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
```
It would be better to also put a link to the original file.
Done, see the PR referenced below.
```python
type: Literal["reasoning_text"] = "reasoning_text"


class ResponseReasoningItem(BaseModel):
```
Why do you define this class, but also import it from openai.types.responses?
My bad: I imported it to fix a pydantic error but forgot to delete the locally defined one. Will follow up with a cleanup PR.
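For context, a minimal illustration of why the duplicate definition causes trouble: two same-named class objects are still distinct types, so a model validated against the imported class rejects instances of the local one. Plain classes are used here instead of pydantic models; the factory merely mimics "imported" vs. "locally redefined".

```python
def make_class():
    # Each call creates a brand-new class object, mimicking a class that is
    # both imported from openai.types.responses and redefined locally.
    class ResponseReasoningItem:
        pass
    return ResponseReasoningItem

Imported = make_class()
Redefined = make_class()
item = Imported()
# Same name, different type: this mismatch is what surfaces as a
# pydantic validation error.
print(isinstance(item, Imported), isinstance(item, Redefined))  # True False
```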
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
```python
if isinstance(recv_obj, BatchStrOut):
    state.text += recv_obj.output_strs[i]
    if state.obj.stream:
```
This condition is wrong. It should be `if self.server_args.stream_output and state.obj.stream:`.
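A small sketch of the gating logic the comment asks for, as a standalone function (illustrative, not the actual tokenizer-manager code): streaming requires both the server-level flag and the per-request flag.

```python
def should_stream(server_stream_output: bool, request_stream: bool) -> bool:
    # The per-request flag alone is not enough: the server must also have
    # stream_output enabled, otherwise outputs are delivered as whole batches.
    return server_stream_output and request_stream
```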
Motivation
Support the v1/responses API and MCP server.
Use harmony in serving_chat.py
Co-Authored-By: Xinyuan Tong justinning0323@outlook.com
Notes:
To use the gpt-oss demo tools, an environment with Python 3.12 is required, and the `mcp` and `gpt-oss` packages must be installed. Quick demo: link
One of the prominent features of gpt-oss is the ability to invoke tools directly, referred to as "built-in tools". In sglang, several options are provided:
By default, we integrate with the reference library's browser (using ExaBackend) and the demo Python interpreter through a Docker container. To use the search backend, access to exa.ai is required and `EXA_API_KEY` must be set as an environment variable. For Python execution, either ensure Docker is available or set `PYTHON_EXECUTION_BACKEND=UV`. Note that `PYTHON_EXECUTION_BACKEND=UV` runs model-generated code snippets on the same machine, which carries some risk.

The command to launch the server is:
```
python -m sglang.launch_server ... --tool-server demo
```

Note that the default options are intended solely for demonstration. For production-level usage, sglang can act as an MCP client for multiple services. An example tool server that sglang can interact with is provided; these servers wrap the demo tools, and the commands to run them are as follows:
The URLs are expected to be MCP SSE servers that expose instructions in their server info along with well-documented tools. These tools are incorporated into the system prompt so the model can use them.
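A hedged sketch of the system-prompt incorporation step. `Tool` and `render_system_prompt` are illustrative names, not sglang's actual API: the point is that the MCP server's instructions and its tool descriptions get combined into one prompt the model sees.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

def render_system_prompt(server_instructions: str, tools: list) -> str:
    # Combine the MCP server's instructions with a bullet list of its tools,
    # so the model knows what it may call and what each tool does.
    lines = [server_instructions, "", "Available tools:"]
    lines += [f"- {t.name}: {t.description}" for t in tools]
    return "\n".join(lines)
```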
Modifications
Accuracy Test
Benchmark & Profiling
Checklist