[model-gateway] Use UUIDs for router-managed worker resources by alphabetc1 · Pull Request #15540 · sgl-project/sglang

alphabetc1 · 2025-12-20T18:00:12Z

Motivation

Previously, we used endpoint instead of UUID of each resource to make it compatible with old delete worker endpoint.
However, URIs are mutable and can be long or ambiguous as identifiers, for example, deleting a worker required calling DELETE /workers/{url} and URL-encoding the full worker URL in the path, which was cumbersome and error-prone.

We're now making a breaking change: router resources are now identified and managed by UUIDs instead of URIs, meanwhile, GET/PUT/DELETE /workers/{url} methods will also be changed to GET/PUT/DELETE /workers/{worker_id}.

Example

For now, GET /workers will show worker's UUID instead of URL.

curl -sS 'http://127.0.0.1:30000/workers'

{"workers":[{"id":"8769987f-0d0e-4f5e-89bc-c89bf557bb0b","url":"http://127.0.0.1:8000","model_id":"test","priority":50,"cost":1.0,"worker_type":"regular","is_healthy":true,"load":0,"connection_mode":"HTTP","metadata":{"model_path":"/root/models/Qwen3-32B-FP8","model_type":"qwen3","architectures":"[\"Qwen3ForCausalLM\"]","served_model_name":"test"}}

And we use /workers/{worker_id} to perform resource manage instead of /workers/{url}:

curl -X GET "http://localhost:30000/workers/8769987f-0d0e-4f5e-89bc-c89bf557bb0b"
curl -X DELETE "http://localhost:30000/workers/8769987f-0d0e-4f5e-89bc-c89bf557bb0b"

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

gemini-code-assist · 2025-12-20T18:00:25Z

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the usability of the model-gateway by introducing a more intuitive and flexible method for deleting workers. It adds a new DELETE /workers endpoint that can process worker URLs provided either as a query parameter or within a JSON payload. This streamlines the deletion process, making it less prone to errors compared to the previous method of URL-encoding the worker URL in the path, while maintaining full compatibility with the existing endpoint.

Highlights

New DELETE /workers Endpoint: Introduces a new DELETE endpoint at /workers for worker deletion, offering a more flexible approach.
Flexible Worker Deletion: Workers can now be deleted by providing their URL either as a query parameter (e.g., ?url=http://...) or within a JSON request body (e.g., {"url": "http://..."}).
Improved Usability: This change simplifies the operator workflow by eliminating the need for cumbersome URL-encoding of the worker URL directly in the path, which was previously required for the /workers/{url} endpoint.
Backward Compatibility: The original DELETE /workers/{url} endpoint remains fully functional, ensuring existing integrations are not broken.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds a more convenient endpoint DELETE /workers for deleting workers, accepting the URL via query parameter or JSON body. This is a nice usability improvement and reuses existing logic well. However, I've found a critical borrow-checker issue in the new handler that will prevent compilation, for which I've provided a fix. I've also included a suggestion to improve code readability.

gemini-code-assist · 2025-12-20T18:02:31Z

sgl-model-gateway/src/server.rs

+    let url_from_query = req.uri().query().and_then(|qs| {
+        url::form_urlencoded::parse(qs.as_bytes())
+            .find(|(k, _)| k == "url")
+            .map(|(_, v)| v.into_owned())
+    });
+
+    let url = if let Some(url) = url_from_query {
+        Some(url)
+    } else {
+        // Fallback: JSON body: {"url":"http://..."}
+        let (_parts, body) = req.into_parts();


The current implementation has a borrow checker error. req.uri().query() borrows req, and this borrow is still active when req.into_parts() is called in the else block. This attempts to move req while it is borrowed, which will cause a compilation failure.

To fix this, you should first extract the query string into an owned String. This releases the borrow on req, allowing it to be safely consumed later.

Suggested change

let url_from_query = req.uri().query().and_then(|qs| {

url::form_urlencoded::parse(qs.as_bytes())

.find(|(k, _)| k == "url")

.map(|(_, v)| v.into_owned())

});

let url = if let Some(url) = url_from_query {

Some(url)

} else {

// Fallback: JSON body: {"url":"http://..."}

let (_parts, body) = req.into_parts();

let query_str = req.uri().query().map(str::to_string);

let url_from_query = query_str.as_deref().and_then(|qs| {

url::form_urlencoded::parse(qs.as_bytes())

.find(|(k, _)| k == "url")

.map(|(_, v)| v.into_owned())

});

let url = if let Some(url) = url_from_query {

Some(url)

} else {

// Fallback: JSON body: {"url":"http://..."}

let (_parts, body) = req.into_parts();

gemini-code-assist · 2025-12-20T18:02:31Z

sgl-model-gateway/src/server.rs

+            .and_then(|bytes| {
+                (!bytes.is_empty())
+                    .then(|| serde_json::from_slice::<DeleteWorkerQuery>(&bytes).ok())
+                    .flatten()
+            })


This chain of then and flatten to handle an empty request body is a bit dense and can be hard to follow. Using a more explicit if check for an empty byte slice would make the logic clearer and improve maintainability.

.and_then(|bytes| { if bytes.is_empty() { None } else { serde_json::from_slice::<DeleteWorkerQuery>(&bytes).ok() } })

slin1237 · 2025-12-20T19:30:46Z

the reason why we used endpoint instead of UUID of each resource is to make it compatible with old delete worker endpoint
perhaps
instead of a new endpoint
shall we make a breaking change to use UUID?

alphabetc1 · 2025-12-21T04:16:03Z

the reason why we used endpoint instead of UUID of each resource is to make it compatible with old delete worker endpoint perhaps instead of a new endpoint shall we make a breaking change to use UUID?

Agree, this is more RESTful and also more convenient to use.
So would it be acceptable for me to make a breaking change to only keep UUID-based deletion?
e.g.

The id returned in GET /workers will be a UUID.

{"workers":[{"id":"8769987f-0d0e-4f5e-89bc-c89bf557bb0b","url":"http://127.0.0.1:8000","model_id":"test","priority":50,"cost":1.0,"worker_type":"regular","is_healthy":true,"load":0,"connection_mode":"HTTP","metadata":{"model_path":"/root/models/Qwen3-32B-FP8","model_type":"qwen3","architectures":"[\"Qwen3ForCausalLM\"]","served_model_name":"test"}}

DELETE

curl -X DELETE "http://localhost:30000/workers/8769987f-0d0e-4f5e-89bc-c89bf557bb0b"

Meanwhile, GET/PUT /workers/{url} methods will also be changed to GET/PUT /workers/{worker_id}

slin1237 · 2025-12-21T04:35:44Z

feel free to make that change

create should return UID right away, since it's async
GET/PUT/DELETE should all be using UID instead
also feel free to only make changes on delete. i can help with other endpoints
this type of correction should happen earlier rather than later.

slin1237 · 2025-12-21T17:14:01Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant and beneficial breaking change by switching from URL-based worker identification to UUIDs. This improves the robustness and usability of the worker management API. The changes are well-implemented across the documentation, tests, and core server logic. My review focuses on a few areas for improvement: addressing a race condition in UUID reservation, optimizing a lookup function, and reducing code duplication in the API handlers for better maintainability.

gemini-code-assist · 2025-12-21T17:16:07Z

sgl-model-gateway/src/core/worker_registry.rs

+        if let Some(existing_id) = self.url_to_id.get(url) {
+            return existing_id.clone();
+        }
+        let worker_id = WorkerId::new();
+        self.url_to_id.insert(url.to_string(), worker_id.clone());
+        worker_id


There's a race condition in this function. If two threads call reserve_id_for_url for the same new URL concurrently, both can pass the if let Some(...) check, generate a new WorkerId, and insert it into the map. The last one to insert will win, but both threads will return a WorkerId, one of which will be for a mapping that was immediately overwritten. This can lead to inconsistent state.

To fix this and make the operation atomic, you can use the entry API of DashMap.

Suggested change

if let Some(existing_id) = self.url_to_id.get(url) {

return existing_id.clone();

}

let worker_id = WorkerId::new();

self.url_to_id.insert(url.to_string(), worker_id.clone());

worker_id

self.url_to_id

.entry(url.to_string())

.or_insert_with(WorkerId::new)

.clone()

gemini-code-assist · 2025-12-21T17:16:07Z

sgl-model-gateway/src/core/worker_registry.rs

+        self.url_to_id
+            .iter()
+            .find_map(|entry| (entry.value() == worker_id).then(|| entry.key().clone()))


The fallback logic in this function iterates over the url_to_id map to find a URL by its worker ID. This is an O(N) operation and can become inefficient if the number of workers is large.

For better performance, consider adding a reverse mapping, e.g., id_to_url: Arc<DashMap<WorkerId, String>>. This would make the lookup an O(1) operation. You would need to update this new map in reserve_id_for_url and remove functions accordingly.

gemini-code-assist · 2025-12-21T17:16:08Z

sgl-model-gateway/src/server.rs

+            let worker_info = WorkerInfo {
+                id: worker_id.as_str().to_string(),
+                url: worker_url.clone(),
+                model_id: "unknown".to_string(),
+                priority: 0,
+                cost: 1.0,
+                worker_type: "unknown".to_string(),
+                is_healthy: false,
+                load: 0,
+                connection_mode: "unknown".to_string(),
+                runtime_type: None,
+                tokenizer_path: None,
+                reasoning_parser: None,
+                tool_parser: None,
+                chat_template: None,
+                bootstrap_port: None,
+                metadata: HashMap::new(),
+                job_status: Some(status),
+            };


This block of code to create a partial WorkerInfo for a worker that is still in the job queue is quite verbose and hard to read.

To improve readability and maintainability, consider extracting this logic into a helper function within this module. For example:

fn create_partial_worker_info_from_status( worker_id: &crate::core::WorkerId, worker_url: String, status: crate::protocols::worker_spec::JobStatus, ) -> crate::protocols::worker_spec::WorkerInfo { // ... implementation from these lines ... }

Then you could simply call this function here.

gemini-code-assist · 2025-12-21T17:16:08Z

sgl-model-gateway/src/server.rs

+    let worker_id = match parse_worker_id(&worker_id_raw) {
+        Ok(id) => id,
+        Err(msg) => {
+            let error = WorkerErrorResponse {
+                error: msg,
+                code: "BAD_REQUEST".to_string(),
+            };
+            return (StatusCode::BAD_REQUEST, Json(error)).into_response();
+        }
+    };
+
+    let Some(url) = state.context.worker_registry.get_url_by_id(&worker_id) else {
+        let error = WorkerErrorResponse {
+            error: format!("Worker {worker_id_raw} not found"),
+            code: "WORKER_NOT_FOUND".to_string(),
+        };
+        return (StatusCode::NOT_FOUND, Json(error)).into_response();
+    };


The logic for parsing the worker_id from the path and resolving it to a URL is duplicated across get_worker, delete_worker, and update_worker handlers. This increases code maintenance overhead.

Consider creating a custom Axum extractor to handle this logic. An extractor could parse the worker_id, resolve it to a URL, and handle BAD_REQUEST and NOT_FOUND errors centrally. This would significantly simplify the handlers and improve maintainability.

For example, you could create a ResolvedWorker extractor that provides (WorkerId, String) and your handlers would look like:

async fn delete_worker( State(state): State<Arc<AppState>>, ResolvedWorker { id, url }: ResolvedWorker, ) -> Response { // ... handler logic using id and url }

…oject#15540)

alphabetc1 requested review from CatherineSue and slin1237 as code owners December 20, 2025 18:00

github-actions bot added the model-gateway label Dec 20, 2025

gemini-code-assist bot reviewed Dec 20, 2025

View reviewed changes

github-actions bot added the documentation Improvements or additions to documentation label Dec 20, 2025

slin1237 added the run-ci label Dec 21, 2025

feat: Use UUIDs for router-managed worker resources

7d8e236

alphabetc1 force-pushed the feature/support_del_worker branch from 3d9cf19 to 7d8e236 Compare December 21, 2025 05:46

alphabetc1 requested a review from key4ng as a code owner December 21, 2025 05:46

alphabetc1 changed the title ~~[model-gateway] add simpler worker deletion via /workers endpoint~~ [model-gateway] Use UUIDs for router-managed worker resources Dec 21, 2025

slin1237 approved these changes Dec 21, 2025

View reviewed changes

gemini-code-assist bot reviewed Dec 21, 2025

View reviewed changes

slin1237 merged commit 1d9ba2c into sgl-project:main Dec 21, 2025
69 of 70 checks passed

alphabetc1 deleted the feature/support_del_worker branch December 24, 2025 06:19

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[model-gateway] Use UUIDs for router-managed worker resources (sgl-pr…

a17f1ac

…oject#15540)

GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026

[model-gateway] Use UUIDs for router-managed worker resources (sgl-pr…

bc2441e

…oject#15540)

Conversation

alphabetc1 commented Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Example

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Uh oh!

gemini-code-assist bot commented Dec 20, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 20, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 20, 2025

Choose a reason for hiding this comment

Uh oh!

slin1237 commented Dec 20, 2025

Uh oh!

alphabetc1 commented Dec 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

slin1237 commented Dec 21, 2025

Uh oh!

slin1237 commented Dec 21, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 21, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 21, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 21, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

alphabetc1 commented Dec 20, 2025 •

edited

Loading

alphabetc1 commented Dec 21, 2025 •

edited

Loading