
[router] Function call support for openai router Responses API #12386

Merged
slin1237 merged 4 commits into sgl-project:main from key4ng:function-call
Oct 30, 2025
Conversation

@key4ng

@key4ng key4ng commented Oct 30, 2025

Motivation

Implement proper function calling support in the sgl-router Responses API, including state persistence and conversation management for function calls and their outputs.

Modifications

  1. Renamed type: function_tool_call → function_call throughout the codebase for consistency
    - Updated item type in protocols, handlers, and ID generation (sgl-router/src/data_connector/core.rs:271)
  2. Enhanced function call schema:
    - Added call_id field to FunctionToolCall items (in addition to id)
    - Added optional id field to FunctionCallOutput
    - Flattened function fields in ResponseTool at top level (sgl-router/src/protocols/responses.rs:26)
  3. Improved state management (sgl-router/src/routers/openai/conversations.rs, router.rs):
    - Function calls/outputs now stored with full item structure in DB content field
    - Proper ID preservation when linking items to conversations
    - Better ID prefix generation (fc for function_call items)
  4. Conversation loading (sgl-router/src/routers/openai/router.rs:813-879):
    - Now loads function_call and function_call_output items from DB (previously only loaded messages)
    - Filters out reasoning items (internal processing details) from upstream requests
  5. Bug fixes: added the missing safety_identifier field in the streaming processor
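The renamed schema can be illustrated with the item shapes the Responses API exchanges (a minimal sketch; the field values and IDs below are illustrative, not taken from the router source):

```python
import json

# A function_call item emitted by the model. After this change it carries
# both an item "id" (storage identifier, "fc" prefix) and a "call_id" used
# to match the tool result back to the call.
function_call_item = {
    "type": "function_call",  # renamed from "function_tool_call"
    "id": "fc_123",           # illustrative item id
    "call_id": "call_abc",    # illustrative call id
    "name": "get_current_weather",
    "arguments": json.dumps({"location": "San Francisco, CA"}),
}

# The client answers with a function_call_output item that echoes call_id.
function_call_output_item = {
    "type": "function_call_output",
    "call_id": function_call_item["call_id"],
    "output": json.dumps({"temperature": 72, "unit": "fahrenheit"}),
}

# The router links the two by call_id when persisting them to a conversation.
assert function_call_output_item["call_id"] == function_call_item["call_id"]
```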

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist

Summary of Changes

Hello @key4ng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant refactoring and enhancement of how function calls and their outputs are handled within the SGLang router. It standardizes naming conventions, updates API response structures to better represent function call details, and ensures that function calls and their results are correctly integrated into the conversation history for persistent storage and retrieval. These changes are crucial for robust and consistent function calling capabilities.

Highlights

  • API Schema Refinement: The API response schemas have been updated to better represent function calls, including renaming function_tool_call to function_call, adding a call_id field, and flattening function details within the ResponseTool structure.
  • Enhanced Conversation History Management: The system now properly stores, retrieves, and processes function_call and function_call_output items as part of the conversation history, ensuring their persistence and correct integration.
  • Consistent ID Generation: New logic has been introduced for generating unique IDs for function call related items, using a consistent fc prefix.
  • Improved Debugging and Filtering: Extensive debug logging has been added to trace the flow of input/output items, and reasoning items are now explicitly filtered out from the conversation context when communicating with the backend.
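The ID-prefix and filtering behaviors above can be sketched as follows (hypothetical helper names; the router's actual ID suffix format is not shown in this PR and is assumed here to be a UUID hex string):

```python
import uuid

def generate_item_id(item_type: str) -> str:
    """Illustrative ID generator: function_call items get the "fc" prefix.
    The suffix format is an assumption, not the router's actual scheme."""
    prefix = "fc" if item_type == "function_call" else "item"
    return f"{prefix}_{uuid.uuid4().hex}"

def prepare_upstream_input(stored_items):
    """Drop reasoning items (internal processing details) before forwarding
    conversation history to the backend, as described above."""
    return [item for item in stored_items if item.get("type") != "reasoning"]

# Example: a stored conversation history with a reasoning item to filter out.
history = [
    {"type": "message", "role": "user", "content": "What's the weather?"},
    {"type": "reasoning", "content": "internal chain of thought"},
    {"type": "function_call", "id": generate_item_id("function_call"),
     "name": "get_current_weather", "arguments": "{}"},
]
upstream = prepare_upstream_input(history)
assert all(item["type"] != "reasoning" for item in upstream)
```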


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily refactors function_tool_call to function_call and introduces a call_id field, aligning the implementation more closely with function calling API conventions. The changes are consistently applied across the codebase and include improvements to conversation persistence logic. I've provided a few suggestions to enhance code maintainability by refactoring repetitive code, and to improve robustness by adding error logging for deserialization failures. I also noted a minor inconsistency in struct definitions that could be addressed for better API consistency.

@key4ng key4ng changed the title from "Function call[wip]" to "[router] Function call support for openai router Responses API" on Oct 30, 2025
@key4ng

key4ng commented Oct 30, 2025

<details><summary>function_calll_state_test.py</summary>

```python
"""
Function Calling Tests with State Management
Tests previous_response_id and conversation features
"""

from openai import OpenAI
import json
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"), base_url="http://localhost:30000/v1")

# Define tools
tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    },
]

# Mock function
def get_current_weather(location, unit="fahrenheit"):
    """Mock weather function"""
    return json.dumps({
        "location": location,
        "temperature": 72 if unit == "fahrenheit" else 22,
        "unit": unit,
        "condition": "sunny",
    })

print("=" * 80)
print("TEST 1: Using previous_response_id")
print("=" * 80)

# First request
print("\n1. Initial request:")
response1 = client.responses.create(
    model="gpt-5-nano",
    input=[{"role": "user", "content": "What's the weather in San Francisco? use function call to get the weather."}],
    tools=tools,
)

print(f"Response ID: {response1.id}")
print(f"Status: {response1.status}")
print(f"Output items: {len(response1.output)}")

# Check for function calls
function_call_items = [item for item in response1.output if item.type == "function_call"]
print(f"Function calls: {len(function_call_items)}")

if function_call_items:
    # Execute function
    for item in function_call_items:
        function_name = item.name
        function_args = json.loads(item.arguments)

        print(f"\nExecuting function: {function_name}")
        print(f"Arguments: {json.dumps(function_args, indent=2)}")

        result = get_current_weather(**function_args)
        print(f"Result: {result}")

        # Second request using previous_response_id
        print(f"\n2. Continuing with previous_response_id: {response1.id}")
        response2 = client.responses.create(
            model="gpt-5-nano",
            previous_response_id=response1.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            }],
            tools=tools,
        )

        print(f"Response ID: {response2.id}")
        print(f"Status: {response2.status}")
        print(f"Final answer: {response2.output_text}")

        # Third request - follow up question using previous_response_id
        print(f"\n3. Follow-up question using previous_response_id: {response2.id}")
        response3 = client.responses.create(
            model="gpt-5-nano",
            previous_response_id=response2.id,
            input=[{"role": "user", "content": "Is that warm or cold?"}],
            tools=tools,
        )

        print(f"Response ID: {response3.id}")
        print(f"Status: {response3.status}")
        print(f"Follow-up answer: {response3.output_text}")

print("\n" + "=" * 80)
print("TEST 2: Using conversation")
print("=" * 80)

# First, create a conversation using the Conversations API
print("\n1. Creating a new conversation via Conversations API:")
conversation = client.conversations.create(
    metadata={"purpose": "weather_queries", "test": "function_calling"}
)

print(f"Conversation created: {conversation.id}")
if hasattr(conversation, 'metadata'):
    print(f"Metadata: {conversation.metadata}")

# Now use this conversation ID for responses
print(f"\n2. First request in conversation {conversation.id}:")
conversation_response = client.responses.create(
    model="gpt-5-nano",
    input=[{"role": "user", "content": "What's the weather in Tokyo, using function call?"}],
    tools=tools,
    conversation=conversation.id,
)

print(f"Response ID: {conversation_response.id}")
print(f"Status: {conversation_response.status}")

# Check for function calls
function_call_items = [item for item in conversation_response.output if item.type == "function_call"]

if function_call_items:
    # Build input with function results
    continuation_input = []

    for item in function_call_items:
        function_name = item.name
        function_args = json.loads(item.arguments)

        print(f"\nExecuting function: {function_name}")
        result = get_current_weather(**function_args)

        continuation_input.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": result,
        })

    # Continue in same conversation
    print(f"\n3. Continuing same conversation with function results:")
    response2 = client.responses.create(
        model="gpt-5-nano",
        input=continuation_input,
        tools=tools,
        conversation=conversation.id,
    )

    print(f"Response ID: {response2.id}")
    print(f"Final answer: {response2.output_text}")

    # New turn in same conversation
    print(f"\n4. New turn in same conversation:")
    response3 = client.responses.create(
        model="gpt-5-nano",
        input=[{"role": "user", "content": "What about Paris?"}],
        tools=tools,
        conversation=conversation.id,
    )

    print(f"Response ID: {response3.id}")
    print(f"Status: {response3.status}")

    # Handle function calls if any
    function_call_items = [item for item in response3.output if item.type == "function_call"]
    if function_call_items:
        continuation_input = []
        for item in function_call_items:
            function_name = item.name
            function_args = json.loads(item.arguments)
            result = get_current_weather(**function_args)

            continuation_input.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            })

        response4 = client.responses.create(
            model="gpt-5-nano",
            input=continuation_input,
            tools=tools,
            conversation=conversation.id,
        )

        print(f"Response ID: {response4.id}")
        print(f"Answer about Paris: {response4.output_text}")

# Retrieve the conversation to verify it was stored
print(f"\n5. Retrieving conversation {conversation.id}:")
try:
    retrieved_conversation = client.conversations.retrieve(conversation.id)
    print(f"Conversation retrieved successfully")
    print(f"Conversation ID: {retrieved_conversation.id}")
    if hasattr(retrieved_conversation, 'metadata'):
        print(f"Metadata: {retrieved_conversation.metadata}")
except Exception as e:
    print(f"Error retrieving conversation: {str(e)}")

print("\n" + "=" * 80)
print("TEST 3: Mixing previous_response_id with new input")
print("=" * 80)

# First request
print("\n1. Initial weather request:")
response1 = client.responses.create(
    model="gpt-5-nano",
    input=[{"role": "user", "content": "What's the weather in London?"}],
    tools=tools,
)

print(f"Response ID: {response1.id}")

# Get function calls
function_call_items = [item for item in response1.output if item.type == "function_call"]

if function_call_items:
    # Continue with previous_response_id AND provide function output
    print(f"\n2. Providing function output with previous_response_id:")

    input_items = []
    for item in function_call_items:
        function_name = item.name
        function_args = json.loads(item.arguments)
        result = get_current_weather(**function_args)

        input_items.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": result,
        })

    response2 = client.responses.create(
        model="gpt-5-nano",
        previous_response_id=response1.id,
        input=input_items,
        tools=tools,
    )

    print(f"Response ID: {response2.id}")
    print(f"Answer: {response2.output_text}")

    # Another turn with additional context
    print(f"\n3. Adding user message with previous context:")
    response3 = client.responses.create(
        model="gpt-5-nano",
        previous_response_id=response2.id,
        input=[{
            "role": "user",
            "content": "Should I bring an umbrella?"
        }],
        tools=tools,
    )

    print(f"Response ID: {response3.id}")
    print(f"Recommendation: {response3.output_text}")

print("\n" + "=" * 80)
print("TEST 4: Retrieving conversation history")
print("=" * 80)

print("\n1. Creating a new conversation:")
test_conversation = client.conversations.create(
    metadata={"purpose": "test_history", "location": "Miami"}
)
print(f"Created conversation: {test_conversation.id}")

print("\n2. First turn in conversation:")
response1 = client.responses.create(
    model="gpt-5-nano",
    input=[{"role": "user", "content": "Check weather in Miami"}],
    tools=tools,
    conversation=test_conversation.id,
)

print(f"Turn 1 - Response ID: {response1.id}")

# Execute function if called
function_call_items = [item for item in response1.output if item.type == "function_call"]
if function_call_items:
    input_items = []
    for item in function_call_items:
        result = get_current_weather(**json.loads(item.arguments))
        input_items.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": result,
        })

    response2 = client.responses.create(
        model="gpt-5-nano",
        input=input_items,
        tools=tools,
        conversation=test_conversation.id,
    )
    print(f"Turn 2 - Response ID: {response2.id}")

# Try to retrieve conversation
print(f"\n3. Retrieving conversation: {test_conversation.id}")
try:
    retrieved = client.conversations.retrieve(test_conversation.id)
    print(f"Conversation retrieved successfully")
    print(f"Conversation ID: {retrieved.id}")
    if hasattr(retrieved, 'metadata'):
        print(f"Metadata: {retrieved.metadata}")
except Exception as e:
    print(f"Error retrieving conversation: {str(e)}")

print("\n" + "=" * 80)
print("All state management tests completed!")
print("=" * 80)
```

</details>

<details><summary>test result</summary>
================================================================================
TEST 1: Using previous_response_id
================================================================================

1. Initial request:
Response ID: resp_0e866c32c9d59fcd01690395717b308190a75fec079186cc9d
Status: completed
Output items: 2
Function calls: 1

Executing function: get_current_weather
Arguments: {
  "location": "San Francisco, CA",
  "unit": "fahrenheit"
}
Result: {"location": "San Francisco, CA", "temperature": 72, "unit": "fahrenheit", "condition": "sunny"}

2. Continuing with previous_response_id: resp_0e866c32c9d59fcd01690395717b308190a75fec079186cc9d
Response ID: resp_f8f7303d53d6ec390169039575aaac8190aa8ad8d5de89c31c
Status: completed
Final answer: Right now in San Francisco, CA: sunny and 72°F.

Would you like a Celsius conversion or an hourly forecast?

3. Follow-up question using previous_response_id: resp_f8f7303d53d6ec390169039575aaac8190aa8ad8d5de89c31c
Response ID: resp_f8f7303d53d6ec39016903957900288190a1f71e0fbe92a7d3
Status: completed
Follow-up answer: That’s pleasantly warm for San Francisco. 72°F is about 22°C. SF days are usually in the 60s, so 72° is on the warmer side but not hot. It can feel cooler with a breeze or fog, and warmer in direct sun. Want me to convert to Celsius or show an hourly forecast?

================================================================================
TEST 2: Using conversation
================================================================================

1. Creating a new conversation via Conversations API:
Conversation created: conv_66c24cde0f64d4d0cc1e348deee7f0453178d0e855420b55e4
Metadata: {'purpose': 'weather_queries', 'test': 'function_calling'}

2. First request in conversation conv_66c24cde0f64d4d0cc1e348deee7f0453178d0e855420b55e4:
Response ID: resp_95eaf0ee9f398d0d016903957dae208190bc73e29e99445dba
Status: completed

Executing function: get_current_weather

3. Continuing same conversation with function results:
Response ID: resp_294dc3f049a993de0169039580d2188190ab5394f260c27321
Final answer: Tokyo is currently 22°C and sunny. Would you like a forecast or to switch to Fahrenheit?

4. New turn in same conversation:
Response ID: resp_294dc3f049a993de0169039583ea148190b6bb7079327f9911
Status: completed
Response ID: resp_294dc3f049a993de0169039586c38881909b8466136020f3ee
Answer about Paris: Paris is currently 22°C and sunny. Would you like a forecast or to switch to Fahrenheit?

5. Retrieving conversation conv_66c24cde0f64d4d0cc1e348deee7f0453178d0e855420b55e4:
Conversation retrieved successfully
Conversation ID: conv_66c24cde0f64d4d0cc1e348deee7f0453178d0e855420b55e4
Metadata: {'purpose': 'weather_queries', 'test': 'function_calling'}

================================================================================
TEST 3: Mixing previous_response_id with new input
================================================================================

1. Initial weather request:
Response ID: resp_0b9e1ed1979bbe020169039588f7c08195a73f0df5ef208657

2. Providing function output with previous_response_id:
Response ID: resp_c36706da223c70a6016903958b45b08190b5a3c0e995803ac4
Answer: Right now in London it's 22°C and sunny. Want the forecast or the temperature in Fahrenheit (about 72°F)?

3. Adding user message with previous context:
Response ID: resp_c36706da223c70a6016903958ed8408190af1940c8febba3d6
Recommendation: Not right now. It’s sunny in London at 22°C (about 72°F). If you’re going to be out for the day or worry about a quickly changing forecast, you might want a compact umbrella or rain jacket just in case. Want me to check an hourly forecast for rain chances?

================================================================================
TEST 4: Retrieving conversation history
================================================================================

1. Creating a new conversation:
Created conversation: conv_a10130d60823e9c1e95e9e550ba053fba3dde7ba75c15dcd51

2. First turn in conversation:
Turn 1 - Response ID: resp_b7430d88d3a79538016903959446fc81909c4fec39999fbd27
Turn 2 - Response ID: resp_d30d64c03839929e01690395972b2c8190b3d5b40660df3790

3. Retrieving conversation: conv_a10130d60823e9c1e95e9e550ba053fba3dde7ba75c15dcd51
Conversation retrieved successfully
Conversation ID: conv_a10130d60823e9c1e95e9e550ba053fba3dde7ba75c15dcd51
Metadata: {'purpose': 'test_history', 'location': 'Miami'}

================================================================================
All state management tests completed!
================================================================================
</details>

@slin1237

make sure to rebase

@key4ng

key4ng commented Oct 30, 2025

@slin1237 slin1237 merged commit 4d2f17b into sgl-project:main Oct 30, 2025
43 of 71 checks passed
@slin1237 slin1237 mentioned this pull request Oct 30, 2025