[router] Support history management using conversation #11339

Merged: slin1237 merged 9 commits into sgl-project:main from key4ng:conv-item on Oct 8, 2025

key4ng (Collaborator) commented Oct 8, 2025

Motivation

Support using a conversation to manage response history.
This PR:

  • Enables the response creation API to accept a conversation as input.
  • Implements logic to persist the input and output items in Oracle DB or in memory.
  • Implements the conversation items list API, which returns the items in a conversation.

Modifications

  • Add conversation ID support to the response creation API.
  • Add a new conversation_id field to the existing response table.
  • For Oracle DB, create two new tables:
    • conversation_items
    • conversation_item_links: maps the relationship between conversations and items
  • For memory, use a HashMap together with a BTreeMap, where the BTreeMap serves as an ordered index that mimics the behavior of a database index. This gives:
    • O(log n) insertion and lookup
    • O(log n + k) range queries, where k is the number of results
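
The in-memory approach above can be sketched roughly as follows. This is a minimal illustration rather than the router's actual code: `Item`, `ItemKey`, `MemoryItemStore`, `add`, and `list_desc` are hypothetical names, and the `(added_at, seq)` key layout is an assumption chosen to keep ordering stable when two items share a timestamp.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

/// Hypothetical stored item; the real item schema has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: String,
    text: String,
}

/// Ordered index key: (added_at timestamp, insertion sequence number).
/// The sequence number breaks ties between items added at the same time.
type ItemKey = (u64, u64);

/// Hypothetical in-memory store: payloads in a HashMap, ordering in a BTreeMap.
#[derive(Default)]
struct MemoryItemStore {
    /// Item payloads keyed by item ID (average O(1) lookup).
    items: HashMap<String, Item>,
    /// Per-conversation ordered index, playing the role of a link table.
    links: HashMap<String, BTreeMap<ItemKey, String>>,
    /// Monotonic counter used as the tie-breaker in `ItemKey`.
    next_seq: u64,
}

impl MemoryItemStore {
    /// Stores the item and links it to the conversation: O(log n) index insert.
    fn add(&mut self, conversation_id: &str, added_at: u64, item: Item) {
        let key = (added_at, self.next_seq);
        self.next_seq += 1;
        self.links
            .entry(conversation_id.to_string())
            .or_default()
            .insert(key, item.id.clone());
        self.items.insert(item.id.clone(), item);
    }

    /// Returns up to `limit` items, newest first. When `before` is set, only
    /// items strictly older than that key are returned, so the key of the last
    /// item on one page can be passed back as the cursor for the next page.
    fn list_desc(
        &self,
        conversation_id: &str,
        before: Option<ItemKey>,
        limit: usize,
    ) -> Vec<(ItemKey, &Item)> {
        let index = match self.links.get(conversation_id) {
            Some(index) => index,
            None => return Vec::new(),
        };
        let upper = match before {
            Some(key) => Bound::Excluded(key),
            None => Bound::Unbounded,
        };
        index
            .range((Bound::Unbounded, upper)) // seek in the ordered index
            .rev() // walk from newest to oldest
            .take(limit) // stop after `limit` entries
            .filter_map(|(key, id)| self.items.get(id).map(|item| (*key, item)))
            .collect()
    }
}
```

Calling `list_desc("conv_1", None, 2)` would return the two newest items of that conversation, and passing the key of the last returned item as `before` fetches the next page. Splitting payloads from the ordered index mirrors the split between the conversation_items and conversation_item_links tables.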

Tests

Conversation item table:
[Screenshot 2025-10-08 at 2:25:46 PM]

[Screenshot 2025-10-08 at 2:25:08 PM]

Conversation item links table:
[Screenshot 2025-10-08 at 2:26:48 PM]

Tested with both Oracle DB and in-memory storage. Here is an example of the Oracle DB result:

// Create a conversation
ubuntu@moirai-a10-exp:~/sglang/sgl-router$ curl http://localhost:30000/v1/conversations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "metadata": {"topic": "program language"}
  }'
{"id":"conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a","object":"conversation","created_at":1759950569,"metadata":{"topic":"program language"}}

// Set the conversation ID when creating a response
ubuntu@moirai-a10-exp:~/sglang/sgl-router$ curl http://localhost:30000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-5-nano",
"input": "tell me two programming language in two sentence",
"conversation": "conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a" 
}'
{"id":"resp_68e6b71b6d648190abbfdf5109fbd2f7","object":"response","created_at":1759950619,"status":"completed","background":false,"billing":{"payer":"developer"},"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"max_tool_calls":null,"model":"gpt-5-nano-2025-08-07","output":[{"id":"rs_68e6b71c22b88190a5ea66483aa52925","type":"reasoning","summary":[]},{"id":"msg_68e6b71e75988190ac24e844d4f32350","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Python is a versatile, high-level programming language known for its readability and vast ecosystem. JavaScript is the core language of the web, used to create interactive features in browsers and on the server with Node.js."}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":"medium","summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"},"verbosity":"medium"},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":14,"input_tokens_details":{"cached_tokens":0},"output_tokens":368,"output_tokens_details":{"reasoning_tokens":320},"total_tokens":382},"user":null,"metadata":{},"conversation":{"id":"conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a"}}

// Keep passing the conversation ID when creating a response so that previous context is included
ubuntu@moirai-a10-exp:~/sglang/sgl-router$ curl http://localhost:30000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-5-nano",
"input": "tell me two more programming language you didnt mention",
"conversation": "conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a" 
}'
{"id":"resp_8a61b4ea4109c2ef0168e6b736607081908149ce50afec4d09","object":"response","created_at":1759950646,"status":"completed","background":false,"billing":{"payer":"developer"},"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"max_tool_calls":null,"model":"gpt-5-nano-2025-08-07","output":[{"id":"rs_8a61b4ea4109c2ef0168e6b736ee9c8190922058437e8a4bd6","type":"reasoning","summary":[]},{"id":"msg_8a61b4ea4109c2ef0168e6b7392dd88190be2cd006f7bf6ce7","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Rust is a systems programming language focused on safety and performance, with zero-cost abstractions.  \nGo (Golang) is a statically typed language designed for simplicity and scalable concurrency."}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":"medium","summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"},"verbosity":"medium"},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":75,"input_tokens_details":{"cached_tokens":0},"output_tokens":363,"output_tokens_details":{"reasoning_tokens":320},"total_tokens":438},"user":null,"metadata":{},"conversation":{"id":"conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a"}}

// Keep passing the conversation ID when creating a response so that previous context is included
ubuntu@moirai-a10-exp:~/sglang/sgl-router$ curl http://localhost:30000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-5-nano",
"input": "tell me two more programming language you didnt mention",
"conversation": "conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a" 
}'
{"id":"resp_8a61b4ea4109c2ef0168e6b745381481908691cd2644ac25d2","object":"response","created_at":1759950661,"status":"completed","background":false,"billing":{"payer":"developer"},"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"max_tool_calls":null,"model":"gpt-5-nano-2025-08-07","output":[{"id":"rs_8a61b4ea4109c2ef0168e6b745ef048190b043c4b1f65bc0ce","type":"reasoning","summary":[]},{"id":"msg_8a61b4ea4109c2ef0168e6b74999c48190beea1a1f5dbf5357","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Kotlin is a modern, statically typed language that runs on the JVM and is widely used for Android development.  \nSwift is Apple's safe, fast language for iOS and macOS development."}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":"medium","summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"},"verbosity":"medium"},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":131,"input_tokens_details":{"cached_tokens":0},"output_tokens":557,"output_tokens_details":{"reasoning_tokens":512},"total_tokens":688},"user":null,"metadata":{},"conversation":{"id":"conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a"}}

// List items in the conversation
ubuntu@moirai-a10-exp:~/sglang/sgl-router$ curl "http://localhost:30000/v1/conversations/conv_2b8ec5e7a0dff9db6cf7f5c1b25a5a5c27514d2ef22c9e5a/items" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
{"object":"list","data":[{"id":"msg_8a61b4ea4109c2ef0168e6b74999c48190beea1a1f5dbf5357","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Kotlin is a modern, statically typed language that runs on the JVM and is widely used for Android development.  \nSwift is Apple's safe, fast language for iOS and macOS development."}],"role":"assistant"},{"id":"rs_8a61b4ea4109c2ef0168e6b745ef048190b043c4b1f65bc0ce","type":"reasoning","status":"completed","content":{"summary":[],"content":[]},"role":null},{"id":"msg_2e0ee92466042fb7014c5cd682fdc2789cc186d7dc675b1e","type":"message","status":"completed","content":[{"type":"input_text","text":"tell me two more programming language you didnt mention"}],"role":"user"},{"id":"msg_8a61b4ea4109c2ef0168e6b7392dd88190be2cd006f7bf6ce7","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Rust is a systems programming language focused on safety and performance, with zero-cost abstractions.  \nGo (Golang) is a statically typed language designed for simplicity and scalable concurrency."}],"role":"assistant"},{"id":"rs_8a61b4ea4109c2ef0168e6b736ee9c8190922058437e8a4bd6","type":"reasoning","status":"completed","content":{"summary":[],"content":[]},"role":null},{"id":"msg_f78bc3bfa91af74dd4b1eb8e85601b581c585001792ca31a","type":"message","status":"completed","content":[{"type":"input_text","text":"tell me two more programming language you didnt mention"}],"role":"user"},{"id":"msg_68e6b71e75988190ac24e844d4f32350","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Python is a versatile, high-level programming language known for its readability and vast ecosystem. JavaScript is the core language of the web, used to create interactive features in browsers and on the server with Node.js."}],"role":"assistant"},{"id":"rs_68e6b71c22b88190a5ea66483aa52925","type":"reasoning","status":"completed","content":{"summary":[],"content":[]},"role":null},{"id":"msg_a3dd5273916218aed65d7f517bc493458a61b4ea4109c2ef","type":"message","status":"completed","content":[{"type":"input_text","text":"tell me two programming language in two sentence"}],"role":"user"}],"first_id":"msg_8a61b4ea4109c2ef0168e6b74999c48190beea1a1f5dbf5357","last_id":"msg_a3dd5273916218aed65d7f517bc493458a61b4ea4109c2ef","has_more":false}

key4ng (Collaborator, Author) commented Oct 8, 2025

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces basic conversation support, which is a significant feature. The implementation looks solid, with both in-memory and Oracle database storage for conversation items. I've identified a few areas for improvement, mainly around code clarity, efficiency, and robustness in error handling. Addressing these points will enhance the maintainability and reliability of this new functionality. Great work on this feature!

key4ng changed the title from "Basic Conversation function support[wip]" to "[router] Support history management using conversation" on Oct 8, 2025
key4ng marked this pull request as ready for review on October 8, 2025 21:41
slin1237 merged commit 7ac6b90 into sgl-project:main on Oct 8, 2025
35 checks passed
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025
lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request Oct 29, 2025