[model-gateway] bugfix: backward compatibility for GET endpoints#15413

Merged
slin1237 merged 1 commit intosgl-project:mainfrom
alphabetc1:bugfix/get_endpoint_compatibility
Dec 20, 2025

@alphabetc1 commented Dec 18, 2025

Summary

This commit adds backward-compatible metadata discovery: when /server_info or /model_info is unavailable, the router falls back to legacy /get_server_info and /get_model_info to keep older SGLang workers compatible with the newer SGLang router.

Root Cause

Newer router versions query /server_info and /model_info, but older SGLang workers don’t implement these endpoints (they only expose the legacy /get_* ones). As a result, the router loses the worker's server_info and model_info:

  • When dp-aware is disabled on the router: the worker registers with the router normally, but its model information is missing.
  • When dp-aware is enabled on the router: the worker fails to register, and the router reports the error "Step discover_dp_info failed" (this step has a hard dependency on the dp_size information).
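The two failure modes above can be modeled with a toy registration function. This is a hypothetical sketch, not the router's code: `WorkerMetadata` and `register_worker` are invented names, and it assumes that a missing model_info is what makes the advertised model id degrade to `"unknown"`.

```rust
// Hypothetical, simplified model of the two failure modes.
// None of these names come from the router's source code.

/// Metadata the router tries to read from a worker.
/// `None` means the corresponding endpoint could not be fetched (e.g. a 404).
#[derive(Debug, Default)]
pub struct WorkerMetadata {
    pub model_id: Option<String>,
    pub dp_size: Option<usize>,
}

/// Returns the model id the router would advertise, or an error if
/// registration cannot proceed.
pub fn register_worker(meta: &WorkerMetadata, dp_aware: bool) -> Result<String, String> {
    // dp-aware routing needs dp_size up front, so a missing value is fatal.
    if dp_aware && meta.dp_size.is_none() {
        return Err("Step discover_dp_info failed".to_string());
    }
    // Without dp-aware routing, registration succeeds, but the model id
    // silently falls back to a placeholder.
    Ok(meta.model_id.clone().unwrap_or_else(|| "unknown".to_string()))
}
```

The asymmetry is the point: the same missing metadata is a silent degradation in one mode and a hard registration failure in the other.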
How to reproduce:
1. Launch an SGLang worker: `python -m sglang.launch_server --model-path shakechen/Llama-2-7b-chat-hf --port 8000`
2. Launch an SGLang router: `python -m sglang_router.launch_router --worker-urls http://127.0.0.1:8000 --port 30000`
3. Query the router's model list: `curl http://127.0.0.1:30000/v1/models`

    # For sglang 0.5.5 + router 0.2.2, it works as expected:
    # curl http://127.0.0.1:30000/v1/models
    # {"models":["shakechen/Llama-2-7b-chat-hf"]}
    INFO:     127.0.0.1:38382 - "GET /health HTTP/1.1" 200 OK
    INFO:     127.0.0.1:38400 - "GET /get_server_info HTTP/1.1" 200 OK

    # For sglang 0.5.5 + router 0.2.4, the returned model id is "unknown":
    # curl http://127.0.0.1:30000/v1/models
    # {"object":"list","data":[{"id":"unknown","object":"model","owned_by":"local"}]}
    INFO:     127.0.0.1:45944 - "GET /server_info HTTP/1.1" 404 Not Found
    INFO:     127.0.0.1:45944 - "GET /model_info HTTP/1.1" 404 Not Found

Solution

When /server_info or /model_info returns 404, the router automatically retries the corresponding legacy /get_* endpoint and emits a deprecation warning. A FIXME comment notes that this fallback will be removed in the future, together with the worker’s legacy /get_server_info and /get_model_info endpoints.
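The fallback control flow can be sketched without any networking by passing the HTTP call in as a closure. This is a minimal, synchronous sketch under stated assumptions: `get_with_fallback` and the `fetch` callback (which returns the body on success or the HTTP status code on failure) are hypothetical stand-ins for the router's async reqwest-based code, and the warning goes to stderr rather than through a logging macro.

```rust
// Hypothetical, network-free sketch of the 404 fallback.
// `fetch` stands in for an HTTP GET: Ok(body) on success, Err(status) otherwise.

const NOT_FOUND: u16 = 404;

pub fn get_with_fallback<F>(endpoint: &str, mut fetch: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, u16>,
{
    let new_path = format!("/{}", endpoint);
    match fetch(new_path.as_str()) {
        Ok(body) => Ok(body),
        // Only a 404 triggers the legacy retry.
        Err(NOT_FOUND) => {
            let legacy_path = format!("/get_{}", endpoint);
            eprintln!(
                "warning: '{}' returned 404, falling back to deprecated '{}'",
                new_path, legacy_path
            );
            fetch(legacy_path.as_str())
                .map_err(|status| format!("Server returned status {} from {}", status, legacy_path))
        }
        // Any other failure is reported against the new endpoint as-is.
        Err(status) => Err(format!("Server returned status {} from {}", status, new_path)),
    }
}
```

Restricting the retry to 404 means a genuine worker error on /server_info (for example a 500) is surfaced directly instead of being masked by the legacy path.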

@gemini-code-assist

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a compatibility issue where the router failed to retrieve metadata from older SGLang workers due to a mismatch in API endpoint versions. It implements a robust fallback mechanism that allows the router to gracefully handle 404 responses from modern endpoints by automatically retrying with their deprecated legacy counterparts. This ensures continued functionality for older worker versions while providing clear deprecation warnings, facilitating a smoother transition to the updated API.

Highlights

  • Backward Compatibility for Metadata Discovery: The router now includes a fallback mechanism to discover metadata from older SGLang workers that do not support the newer /server_info and /model_info endpoints.
  • Legacy Endpoint Fallback: If a request to /server_info or /model_info results in a 404 'Not Found' error, the system will automatically retry the request using the corresponding deprecated legacy endpoints, /get_server_info and /get_model_info.
  • Deprecation Warning: A warning message is now logged when the fallback to legacy endpoints occurs, explicitly stating that these /get_* endpoints are deprecated and will be removed in a future version, encouraging migration to the newer endpoints.
  • New get_json_fallback Function: A new asynchronous utility function, get_json_fallback, has been introduced to encapsulate the logic for retrying with legacy endpoints, handling HTTP requests, and parsing JSON responses, along with emitting deprecation warnings.
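The deprecation warning repeats the endpoint name four times. As a possible alternative to passing `endpoint` positionally four times, a single named format argument can be reused across the `concat!` pieces. This sketch uses `std::format!` in a hypothetical `deprecation_message` helper; whether the project's `warn!` macro accepts the same form is not verified here.

```rust
/// Hypothetical helper: builds the same deprecation text as the PR's warn! call.
/// The named argument `ep` is referenced four times but passed once.
/// Implicit capture (`{endpoint}`) is not allowed here because the format
/// string is produced by `concat!`, so the explicit `ep = endpoint` is needed.
pub fn deprecation_message(endpoint: &str) -> String {
    format!(
        concat!(
            "Endpoint '/{ep}' returned 404, falling back to '/get_{ep}' for backward compatibility. ",
            "The '/get_{ep}' endpoint is deprecated and will be removed in a future version. ",
            "Please use '/{ep}' instead."
        ),
        ep = endpoint
    )
}
```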

@gemini-code-assist bot left a comment

Code Review

This pull request adds backward compatibility for metadata discovery by falling back to legacy endpoints when the new ones are not available. The implementation is correct, but it introduces some code duplication. My review includes suggestions to refactor the new code to improve maintainability by reducing this duplication.

Comment on lines +62 to +101

    async fn get_json_fallback(
        base_url: &str,
        endpoint: &str,
        api_key: Option<&str>,
    ) -> Result<Value, String> {
        // FIXME: This fallback logic should be removed together with /get_server_info
        // and /get_model_info endpoints in http_server.py
        warn!(
            concat!(
                "Endpoint '/{}' returned 404, falling back to '/get_{}' for backward compatibility. ",
                "The '/get_{}' endpoint is deprecated and will be removed in a future version. ",
                "Please use '/{}' instead."
            ),
            endpoint, endpoint, endpoint, endpoint
        );

        let old_url = format!("{}/get_{}", base_url, endpoint);
        let mut req = HTTP_CLIENT.get(&old_url);
        if let Some(key) = api_key {
            req = req.bearer_auth(key);
        }

        let response = req
            .send()
            .await
            .map_err(|e| format!("Failed to connect to {}: {}", old_url, e))?;

        if !response.status().is_success() {
            return Err(format!(
                "Server returned status {} from {}",
                response.status(),
                old_url
            ));
        }

        response
            .json::<Value>()
            .await
            .map_err(|e| format!("Failed to parse response from {}: {}", old_url, e))
    }

medium

To reduce code duplication in get_server_info and get_model_info, you can make this function generic over the target type so that it also handles deserialization. This moves the serde_json::from_value call inside get_json_fallback and lets each call site shrink to a single line, as suggested in the other comments. Bounding the type parameter with serde::de::DeserializeOwned is a bit more idiomatic here.

async fn get_json_fallback<T: serde::de::DeserializeOwned>(
    base_url: &str,
    endpoint: &str,
    api_key: Option<&str>,
) -> Result<T, String> {
    // FIXME: This fallback logic should be removed together with /get_server_info
    // and /get_model_info endpoints in http_server.py
    warn!(
        concat!(
            "Endpoint '/{}' returned 404, falling back to '/get_{}' for backward compatibility. ",
            "The '/get_{}' endpoint is deprecated and will be removed in a future version. ",
            "Please use '/{}' instead."
        ),
        endpoint, endpoint, endpoint, endpoint
    );

    let old_url = format!("{}/get_{}", base_url, endpoint);
    let mut req = HTTP_CLIENT.get(&old_url);
    if let Some(key) = api_key {
        req = req.bearer_auth(key);
    }

    let response = req
        .send()
        .await
        .map_err(|e| format!("Failed to connect to {}: {}", old_url, e))?;

    if !response.status().is_success() {
        return Err(format!(
            "Server returned status {} from {}",
            response.status(),
            old_url
        ));
    }

    let value: Value = response
        .json()
        .await
        .map_err(|e| format!("Failed to parse response from {}: {}", old_url, e))?;

    serde_json::from_value(value)
        .map_err(|e| format!("Failed to parse {} from fallback response: {}", endpoint, e))
}
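The benefit of the generic signature can be shown with the standard library alone. In this sketch, `std::str::FromStr` is swapped in for serde's `DeserializeOwned`, and the already-fetched response body is passed in as a string, so no HTTP client or serde is involved; `parse_fallback` is a hypothetical name. As in the suggestion, the conversion and its error message live inside the helper, and each call site only names the target type.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Hypothetical stdlib-only analogue of the generic get_json_fallback:
/// `FromStr` stands in for `DeserializeOwned`, `body` for the legacy response.
pub fn parse_fallback<T>(endpoint: &str, body: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    // Parsing and error formatting happen once, here, instead of at every call site.
    body.trim()
        .parse::<T>()
        .map_err(|e| format!("Failed to parse {} from fallback response: {}", endpoint, e))
}
```

A call such as `parse_fallback::<u32>("dp_size", body)` has the same shape as the one-line `get_json_fallback::<ServerInfo>(...)` calls suggested for the two call sites.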

Comment on lines +118 to +123

    // If /server_info returns 404, fallback to /get_server_info for backward compatibility
    if response.status() == reqwest::StatusCode::NOT_FOUND {
        let json = get_json_fallback(base_url, "server_info", api_key).await?;
        return serde_json::from_value(json)
            .map_err(|e| format!("Failed to parse server info: {}", e));
    }

medium

With the suggested generic get_json_fallback function, this block can be simplified to a single call, removing the duplicated deserialization logic.

    // If /server_info returns 404, fallback to /get_server_info for backward compatibility
    if response.status() == reqwest::StatusCode::NOT_FOUND {
        return get_json_fallback::<ServerInfo>(base_url, "server_info", api_key).await;
    }

Comment on lines +156 to +161

    // If /model_info returns 404, fallback to /get_model_info for backward compatibility
    if response.status() == reqwest::StatusCode::NOT_FOUND {
        let json = get_json_fallback(base_url, "model_info", api_key).await?;
        return serde_json::from_value(json)
            .map_err(|e| format!("Failed to parse model info: {}", e));
    }

medium

With the suggested generic get_json_fallback function, this block can be simplified to a single call, removing the duplicated deserialization logic.

    // If /model_info returns 404, fallback to /get_model_info for backward compatibility
    if response.status() == reqwest::StatusCode::NOT_FOUND {
        return get_json_fallback::<ModelInfo>(base_url, "model_info", api_key).await;
    }

@alphabetc1 alphabetc1 changed the title fix: GET endpoint compatibility with the old version fix(gateway): backward compatibility for GET endpoints Dec 18, 2025
@alphabetc1 alphabetc1 changed the title fix(gateway): backward compatibility for GET endpoints [model-gateway] Backward compatibility for GET endpoints Dec 20, 2025
@alphabetc1 alphabetc1 changed the title [model-gateway] Backward compatibility for GET endpoints [model-gateway] bugfix: backward compatibility for GET endpoints Dec 20, 2025
@slin1237 slin1237 merged commit 1d90b19 into sgl-project:main Dec 20, 2025
65 of 68 checks passed
@alphabetc1 alphabetc1 deleted the bugfix/get_endpoint_compatibility branch December 21, 2025 03:32