[model-gateway] /parse/easoning and parse/function_call for sgl-model-gateway#15568
[model-gateway] /parse/easoning and parse/function_call for sgl-model-gateway#15568slin1237 merged 7 commits intosgl-project:mainfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
for example, |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces new endpoints /parse/reasoning and /parse/function_call to the model gateway, allowing for parsing of model outputs at the gateway level. The implementation is well-structured, adding new protocol definitions, router handlers, and server routes. I've included two main points of feedback: one to refactor the new handler logic for better readability and maintainability, and a second, more critical point about fixing the new integration tests, which currently don't exercise the parsing logic due to an issue in the test setup.
| let app_context = common::create_test_context(config.clone()).await; | ||
|
|
||
| // Create router | ||
| let router = RouterFactory::create_router(&app_context).await.unwrap(); | ||
| let router = Arc::from(router); |
There was a problem hiding this comment.
The tests for the new parser endpoints are not effective because the parser factories (tool_parser_factory and reasoning_parser_factory) are not initialized in the AppContext for the test environment. These factories are currently only created in gRPC mode, but the new endpoints are HTTP-based.
As a result, the handlers always return SERVICE_UNAVAILABLE, and the actual parsing logic is never executed. The tests are passing but not validating the intended functionality.
To fix this, you should manually inject the parser factories into the AppContext during the test setup. After doing this, you should also update the test assertions to check for StatusCode::OK and validate the response body to ensure parsing was successful.
For example, in test_parse_function_call_success, you should change the assertion to assert_eq!(resp.status(), StatusCode::OK); and add checks for the content of the JSON response.
// Create app context and inject parser factories for testing
let app_context_base = common::create_test_context(config.clone()).await;
let mut app_context_mut = (*app_context_base).clone();
app_context_mut.tool_parser_factory =
Some(sgl_model_gateway::tool_parser::ParserFactory::new());
app_context_mut.reasoning_parser_factory =
Some(sgl_model_gateway::reasoning_parser::ParserFactory::new());
let app_context = Arc::new(app_context_mut);
// Create router
let router = RouterFactory::create_router(&app_context).await.unwrap();
let router = Arc::from(router);| //! Parser handlers for function calls and reasoning extraction | ||
|
|
||
| use std::sync::Arc; | ||
|
|
||
| use axum::{ | ||
| http::StatusCode, | ||
| response::{IntoResponse, Response}, | ||
| Json, | ||
| }; | ||
| use tracing::error; | ||
|
|
||
| use crate::{ | ||
| app_context::AppContext, | ||
| protocols::parser::{ParseFunctionCallRequest, SeparateReasoningRequest}, | ||
| }; | ||
|
|
||
| /// Parse function calls from model output text | ||
| pub async fn parse_function_call( | ||
| context: Option<&Arc<AppContext>>, | ||
| req: &ParseFunctionCallRequest, | ||
| ) -> Response { | ||
| match context { | ||
| Some(ctx) => match &ctx.tool_parser_factory { | ||
| Some(factory) => match factory.registry().get_pooled_parser(&req.tool_call_parser) { | ||
| Some(pooled_parser) => { | ||
| let parser = pooled_parser.lock().await; | ||
| match parser.parse_complete(&req.text).await { | ||
| Ok((remaining_text, tool_calls)) => ( | ||
| StatusCode::OK, | ||
| Json(serde_json::json!({ | ||
| "remaining_text": remaining_text, | ||
| "tool_calls": tool_calls, | ||
| "success": true | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| Err(e) => { | ||
| error!("Failed to parse function calls: {}", e); | ||
| ( | ||
| StatusCode::BAD_REQUEST, | ||
| Json(serde_json::json!({ | ||
| "error": format!("Failed to parse function calls: {}", e), | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response() | ||
| } | ||
| } | ||
| } | ||
| None => ( | ||
| StatusCode::BAD_REQUEST, | ||
| Json(serde_json::json!({ | ||
| "error": format!("Unknown tool parser: {}", req.tool_call_parser), | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| }, | ||
| None => ( | ||
| StatusCode::SERVICE_UNAVAILABLE, | ||
| Json(serde_json::json!({ | ||
| "error": "Tool parser factory not initialized", | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| }, | ||
| None => ( | ||
| StatusCode::SERVICE_UNAVAILABLE, | ||
| Json(serde_json::json!({ | ||
| "error": "Context not initialized", | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| } | ||
| } | ||
|
|
||
| /// Parse and separate reasoning from normal text | ||
| pub async fn parse_reasoning( | ||
| context: Option<&Arc<AppContext>>, | ||
| req: &SeparateReasoningRequest, | ||
| ) -> Response { | ||
| match context { | ||
| Some(ctx) => match &ctx.reasoning_parser_factory { | ||
| Some(factory) => match factory.registry().get_pooled_parser(&req.reasoning_parser) { | ||
| Some(pooled_parser) => { | ||
| let mut parser = pooled_parser.lock().await; | ||
| match parser.detect_and_parse_reasoning(&req.text) { | ||
| Ok(result) => ( | ||
| StatusCode::OK, | ||
| Json(serde_json::json!({ | ||
| "normal_text": result.normal_text, | ||
| "reasoning_text": result.reasoning_text, | ||
| "success": true | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| Err(e) => { | ||
| error!("Failed to separate reasoning: {}", e); | ||
| ( | ||
| StatusCode::BAD_REQUEST, | ||
| Json(serde_json::json!({ | ||
| "error": format!("Failed to separate reasoning: {}", e), | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response() | ||
| } | ||
| } | ||
| } | ||
| None => ( | ||
| StatusCode::BAD_REQUEST, | ||
| Json(serde_json::json!({ | ||
| "error": format!("Unknown reasoning parser: {}", req.reasoning_parser), | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| }, | ||
| None => ( | ||
| StatusCode::SERVICE_UNAVAILABLE, | ||
| Json(serde_json::json!({ | ||
| "error": "Reasoning parser factory not initialized", | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| }, | ||
| None => ( | ||
| StatusCode::SERVICE_UNAVAILABLE, | ||
| Json(serde_json::json!({ | ||
| "error": "Context not initialized", | ||
| "success": false | ||
| })), | ||
| ) | ||
| .into_response(), | ||
| } | ||
| } |
There was a problem hiding this comment.
The parse_function_call and parse_reasoning handlers contain deeply nested match statements, which can be difficult to read and maintain. This is often referred to as the 'pyramid of doom'.
I suggest refactoring both functions to use early returns (e.g., with let-else) to flatten the control flow. Additionally, creating a helper function for error responses would reduce code duplication and improve clarity.
Here is a suggested refactoring for the entire file:
//! Parser handlers for function calls and reasoning extraction
use std::sync::Arc;
use axum::{
http::StatusCode,
response::{IntoResponse, Response},
Json,
};
use tracing::error;
use crate::{
app_context::AppContext,
protocols::parser::{ParseFunctionCallRequest, SeparateReasoningRequest},
};
/// Creates a standardized JSON error response.
fn create_error_response(status: StatusCode, error_msg: String) -> Response {
(
status,
Json(serde_json::json!({
"error": error_msg,
"success": false
})),
)
.into_response()
}
/// Parse function calls from model output text
pub async fn parse_function_call(
context: Option<&Arc<AppContext>>,
req: &ParseFunctionCallRequest,
) -> Response {
let Some(ctx) = context else {
return create_error_response(
StatusCode::SERVICE_UNAVAILABLE,
"Context not initialized".to_string(),
);
};
let Some(factory) = &ctx.tool_parser_factory else {
return create_error_response(
StatusCode::SERVICE_UNAVAILABLE,
"Tool parser factory not initialized".to_string(),
);
};
let Some(pooled_parser) = factory.registry().get_pooled_parser(&req.tool_call_parser) else {
return create_error_response(
StatusCode::BAD_REQUEST,
format!("Unknown tool parser: {}", req.tool_call_parser),
);
};
let parser = pooled_parser.lock().await;
match parser.parse_complete(&req.text).await {
Ok((remaining_text, tool_calls)) => (
StatusCode::OK,
Json(serde_json::json!({
"remaining_text": remaining_text,
"tool_calls": tool_calls,
"success": true
})),
)
.into_response(),
Err(e) => {
let error_msg = format!("Failed to parse function calls: {}", e);
error!("{}", error_msg);
create_error_response(StatusCode::BAD_REQUEST, error_msg)
}
}
}
/// Parse and separate reasoning from normal text
pub async fn parse_reasoning(
context: Option<&Arc<AppContext>>,
req: &SeparateReasoningRequest,
) -> Response {
let Some(ctx) = context else {
return create_error_response(
StatusCode::SERVICE_UNAVAILABLE,
"Context not initialized".to_string(),
);
};
let Some(factory) = &ctx.reasoning_parser_factory else {
return create_error_response(
StatusCode::SERVICE_UNAVAILABLE,
"Reasoning parser factory not initialized".to_string(),
);
};
let Some(pooled_parser) = factory.registry().get_pooled_parser(&req.reasoning_parser) else {
return create_error_response(
StatusCode::BAD_REQUEST,
format!("Unknown reasoning parser: {}", req.reasoning_parser),
);
};
let mut parser = pooled_parser.lock().await;
match parser.detect_and_parse_reasoning(&req.text) {
Ok(result) => (
StatusCode::OK,
Json(serde_json::json!({
"normal_text": result.normal_text,
"reasoning_text": result.reasoning_text,
"success": true
})),
)
.into_response(),
Err(e) => {
let error_msg = format!("Failed to separate reasoning: {}", e);
error!("{}", error_msg);
create_error_response(StatusCode::BAD_REQUEST, error_msg)
}
}
}
Motivation
Now, sglang server has
separate_reasoningandparse_function_callapis for content extraction. But when using sgl-model-gateway, to extract fn call, we should first list_workers and then call a worker, which will interfere the decoding process. So we can implement the methods in the gateway side for higher concurrency.BTW, now sglang itself holds the apis as
separate_reasoningandparse_function_callwhich can be aligned by parse/ prefix. So after discussing with @slin1237 , I implement the apis as/parse/reasoningand/parse/function_callModifications
implement apis in server.rs
implement protocals
implement src/routers/parse for handlers
Accuracy Tests
Benchmarking and Profiling
Checklist