Skip to content

[model-gateway] /parse/easoning and parse/function_call for sgl-model-gateway#15568

Merged
slin1237 merged 7 commits intosgl-project:mainfrom
UbeCc:feat/router_parse_api
Dec 21, 2025
Merged

[model-gateway] /parse/easoning and parse/function_call for sgl-model-gateway#15568
slin1237 merged 7 commits intosgl-project:mainfrom
UbeCc:feat/router_parse_api

Conversation

@UbeCc
Copy link
Contributor

@UbeCc UbeCc commented Dec 21, 2025

Motivation

Now, sglang server has separate_reasoning and parse_function_call apis for content extraction. But when using sgl-model-gateway, to extract fn call, we should first list_workers and then call a worker, which will interfere the decoding process. So we can implement the methods in the gateway side for higher concurrency.

BTW, now sglang itself holds the apis as separate_reasoning and parse_function_call which can be aligned by parse/ prefix. So after discussing with @slin1237 , I implement the apis as /parse/reasoning and /parse/function_call

Modifications

implement apis in server.rs

  • POST /parse/reasoning
  • POST /parse/function_call

implement protocals

  • ParseFunctionCallRequest
  • SeparateReasoningRequest

implement src/routers/parse for handlers

Accuracy Tests

Benchmarking and Profiling

image

Checklist

@gemini-code-assist
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@UbeCc
Copy link
Contributor Author

UbeCc commented Dec 21, 2025

for example,

curl -X POST http://localhost:8000/parse_function_call \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Call get_weather <tool_call>{\"function_name\": \"get_weather\", \"parameters\": {\"location\": \"Singapore\"}}</tool_call>",
    "tool_call_parser": "json",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
curl -X POST http://localhost:8000/separate_reasoning \
  -H "Content-Type: application/json" \
  -d '{
    "text": "<think>Step 1: Understand the problem</think>Final answer",
    "reasoning_parser": "step3"
  }

@UbeCc UbeCc changed the title separate_reasoning and parse_function_call for sgl-model-gateway [sgl-model-gateway] separate_reasoning and parse_function_call for sgl-model-gateway Dec 21, 2025
@UbeCc UbeCc changed the title [sgl-model-gateway] separate_reasoning and parse_function_call for sgl-model-gateway [model-gateway] separate_reasoning and parse_function_call for sgl-model-gateway Dec 21, 2025
@UbeCc UbeCc changed the title [model-gateway] separate_reasoning and parse_function_call for sgl-model-gateway [model-gateway] /parse/easoning and parse/function_call for sgl-model-gateway Dec 21, 2025
@slin1237
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces new endpoints /parse/reasoning and /parse/function_call to the model gateway, allowing for parsing of model outputs at the gateway level. The implementation is well-structured, adding new protocol definitions, router handlers, and server routes. I've included two main points of feedback: one to refactor the new handler logic for better readability and maintainability, and a second, more critical point about fixing the new integration tests, which currently don't exercise the parsing logic due to an issue in the test setup.

Comment on lines +92 to +96
let app_context = common::create_test_context(config.clone()).await;

// Create router
let router = RouterFactory::create_router(&app_context).await.unwrap();
let router = Arc::from(router);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The tests for the new parser endpoints are not effective because the parser factories (tool_parser_factory and reasoning_parser_factory) are not initialized in the AppContext for the test environment. These factories are currently only created in gRPC mode, but the new endpoints are HTTP-based.

As a result, the handlers always return SERVICE_UNAVAILABLE, and the actual parsing logic is never executed. The tests are passing but not validating the intended functionality.

To fix this, you should manually inject the parser factories into the AppContext during the test setup. After doing this, you should also update the test assertions to check for StatusCode::OK and validate the response body to ensure parsing was successful.

For example, in test_parse_function_call_success, you should change the assertion to assert_eq!(resp.status(), StatusCode::OK); and add checks for the content of the JSON response.

        // Create app context and inject parser factories for testing
        let app_context_base = common::create_test_context(config.clone()).await;
        let mut app_context_mut = (*app_context_base).clone();
        app_context_mut.tool_parser_factory =
            Some(sgl_model_gateway::tool_parser::ParserFactory::new());
        app_context_mut.reasoning_parser_factory =
            Some(sgl_model_gateway::reasoning_parser::ParserFactory::new());
        let app_context = Arc::new(app_context_mut);

        // Create router
        let router = RouterFactory::create_router(&app_context).await.unwrap();
        let router = Arc::from(router);

Comment on lines +1 to +139
//! Parser handlers for function calls and reasoning extraction

use std::sync::Arc;

use axum::{
http::StatusCode,
response::{IntoResponse, Response},
Json,
};
use tracing::error;

use crate::{
app_context::AppContext,
protocols::parser::{ParseFunctionCallRequest, SeparateReasoningRequest},
};

/// Parse function calls from model output text
pub async fn parse_function_call(
context: Option<&Arc<AppContext>>,
req: &ParseFunctionCallRequest,
) -> Response {
match context {
Some(ctx) => match &ctx.tool_parser_factory {
Some(factory) => match factory.registry().get_pooled_parser(&req.tool_call_parser) {
Some(pooled_parser) => {
let parser = pooled_parser.lock().await;
match parser.parse_complete(&req.text).await {
Ok((remaining_text, tool_calls)) => (
StatusCode::OK,
Json(serde_json::json!({
"remaining_text": remaining_text,
"tool_calls": tool_calls,
"success": true
})),
)
.into_response(),
Err(e) => {
error!("Failed to parse function calls: {}", e);
(
StatusCode::BAD_REQUEST,
Json(serde_json::json!({
"error": format!("Failed to parse function calls: {}", e),
"success": false
})),
)
.into_response()
}
}
}
None => (
StatusCode::BAD_REQUEST,
Json(serde_json::json!({
"error": format!("Unknown tool parser: {}", req.tool_call_parser),
"success": false
})),
)
.into_response(),
},
None => (
StatusCode::SERVICE_UNAVAILABLE,
Json(serde_json::json!({
"error": "Tool parser factory not initialized",
"success": false
})),
)
.into_response(),
},
None => (
StatusCode::SERVICE_UNAVAILABLE,
Json(serde_json::json!({
"error": "Context not initialized",
"success": false
})),
)
.into_response(),
}
}

/// Parse and separate reasoning from normal text
pub async fn parse_reasoning(
context: Option<&Arc<AppContext>>,
req: &SeparateReasoningRequest,
) -> Response {
match context {
Some(ctx) => match &ctx.reasoning_parser_factory {
Some(factory) => match factory.registry().get_pooled_parser(&req.reasoning_parser) {
Some(pooled_parser) => {
let mut parser = pooled_parser.lock().await;
match parser.detect_and_parse_reasoning(&req.text) {
Ok(result) => (
StatusCode::OK,
Json(serde_json::json!({
"normal_text": result.normal_text,
"reasoning_text": result.reasoning_text,
"success": true
})),
)
.into_response(),
Err(e) => {
error!("Failed to separate reasoning: {}", e);
(
StatusCode::BAD_REQUEST,
Json(serde_json::json!({
"error": format!("Failed to separate reasoning: {}", e),
"success": false
})),
)
.into_response()
}
}
}
None => (
StatusCode::BAD_REQUEST,
Json(serde_json::json!({
"error": format!("Unknown reasoning parser: {}", req.reasoning_parser),
"success": false
})),
)
.into_response(),
},
None => (
StatusCode::SERVICE_UNAVAILABLE,
Json(serde_json::json!({
"error": "Reasoning parser factory not initialized",
"success": false
})),
)
.into_response(),
},
None => (
StatusCode::SERVICE_UNAVAILABLE,
Json(serde_json::json!({
"error": "Context not initialized",
"success": false
})),
)
.into_response(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parse_function_call and parse_reasoning handlers contain deeply nested match statements, which can be difficult to read and maintain. This is often referred to as the 'pyramid of doom'.

I suggest refactoring both functions to use early returns (e.g., with let-else) to flatten the control flow. Additionally, creating a helper function for error responses would reduce code duplication and improve clarity.

Here is a suggested refactoring for the entire file:

//! Parser handlers for function calls and reasoning extraction

use std::sync::Arc;

use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use tracing::error;

use crate::{
    app_context::AppContext,
    protocols::parser::{ParseFunctionCallRequest, SeparateReasoningRequest},
};

/// Creates a standardized JSON error response.
fn create_error_response(status: StatusCode, error_msg: String) -> Response {
    (
        status,
        Json(serde_json::json!({
            "error": error_msg,
            "success": false
        })),
    )
        .into_response()
}

/// Parse function calls from model output text
pub async fn parse_function_call(
    context: Option<&Arc<AppContext>>,
    req: &ParseFunctionCallRequest,
) -> Response {
    let Some(ctx) = context else {
        return create_error_response(
            StatusCode::SERVICE_UNAVAILABLE,
            "Context not initialized".to_string(),
        );
    };

    let Some(factory) = &ctx.tool_parser_factory else {
        return create_error_response(
            StatusCode::SERVICE_UNAVAILABLE,
            "Tool parser factory not initialized".to_string(),
        );
    };

    let Some(pooled_parser) = factory.registry().get_pooled_parser(&req.tool_call_parser) else {
        return create_error_response(
            StatusCode::BAD_REQUEST,
            format!("Unknown tool parser: {}", req.tool_call_parser),
        );
    };

    let parser = pooled_parser.lock().await;
    match parser.parse_complete(&req.text).await {
        Ok((remaining_text, tool_calls)) => (
            StatusCode::OK,
            Json(serde_json::json!({
                "remaining_text": remaining_text,
                "tool_calls": tool_calls,
                "success": true
            })),
        )
            .into_response(),
        Err(e) => {
            let error_msg = format!("Failed to parse function calls: {}", e);
            error!("{}", error_msg);
            create_error_response(StatusCode::BAD_REQUEST, error_msg)
        }
    }
}

/// Parse and separate reasoning from normal text
pub async fn parse_reasoning(
    context: Option<&Arc<AppContext>>,
    req: &SeparateReasoningRequest,
) -> Response {
    let Some(ctx) = context else {
        return create_error_response(
            StatusCode::SERVICE_UNAVAILABLE,
            "Context not initialized".to_string(),
        );
    };

    let Some(factory) = &ctx.reasoning_parser_factory else {
        return create_error_response(
            StatusCode::SERVICE_UNAVAILABLE,
            "Reasoning parser factory not initialized".to_string(),
        );
    };

    let Some(pooled_parser) = factory.registry().get_pooled_parser(&req.reasoning_parser) else {
        return create_error_response(
            StatusCode::BAD_REQUEST,
            format!("Unknown reasoning parser: {}", req.reasoning_parser),
        );
    };

    let mut parser = pooled_parser.lock().await;
    match parser.detect_and_parse_reasoning(&req.text) {
        Ok(result) => (
            StatusCode::OK,
            Json(serde_json::json!({
                "normal_text": result.normal_text,
                "reasoning_text": result.reasoning_text,
                "success": true
            })),
        )
            .into_response(),
        Err(e) => {
            let error_msg = format!("Failed to separate reasoning: {}", e);
            error!("{}", error_msg);
            create_error_response(StatusCode::BAD_REQUEST, error_msg)
        }
    }
}

@slin1237 slin1237 merged commit ef001fb into sgl-project:main Dec 21, 2025
61 checks passed
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments