[bugfix] add httpexception handle in stream case #10305
Bruce-x-1997 wants to merge 1 commit into sgl-project:main
Summary of Changes
Hello @Bruce-x-1997, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue where HTTP exceptions, such as a 503 "request queue full" error, were not properly handled during streaming requests, so the client received a non-streaming or otherwise unexpected response. The changes ensure that such exceptions are caught and formatted into a proper streaming error response, improving the robustness and user experience of the streaming API.
Highlights
- Improved HTTP Exception Handling: Implemented HTTPException handling for streaming requests in http_server.py to ensure graceful error responses instead of unexpected behavior when exceptions like 503 (request queue full) occur.
- Consistent Streaming Error Responses: Extended HTTPException handling to the v1/chat/completions endpoint in serving_chat.py to provide a standardized streaming error format for clients.
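The pattern described in the highlights above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the actual code from the PR: the `HTTPException` class is a stand-in for FastAPI's, `generate_tokens` and `stream_results` are hypothetical names, and the error payload shape and the trailing `[DONE]` event are assumptions.

```python
import asyncio
import json


class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


async def generate_tokens(queue_full: bool):
    """Hypothetical upstream generator; raises 503 when the queue is full."""
    if queue_full:
        raise HTTPException(503, "request queue full")
    for token in ("Hello", " world"):
        yield {"text": token}


async def stream_results(queue_full: bool):
    """Wrap the generator so an HTTPException becomes a streamed error event."""
    try:
        async for chunk in generate_tokens(queue_full):
            yield f"data: {json.dumps(chunk)}\n\n"
    except HTTPException as e:
        # Assumed error shape; the real payload format may differ.
        error = {"error": {"message": e.detail, "code": e.status_code}}
        yield f"data: {json.dumps(error)}\n\n"
    # Terminate the stream the same way on success and on error.
    yield "data: [DONE]\n\n"


async def collect(queue_full: bool):
    """Drain the stream into a list, for demonstration only."""
    return [event async for event in stream_results(queue_full)]
```

Without the `except HTTPException` branch, the exception would propagate out of the generator and the client would not receive an error event in the stream format it is already reading.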
Code Review
This pull request aims to fix a bug where HTTPExceptions were not being handled correctly in streaming responses, causing an incorrect error format. The changes introduce except HTTPException blocks in http_server.py and serving_chat.py to catch these exceptions and yield a properly formatted streaming error response.
The overall approach is correct and addresses the issue described. I've found one critical issue in serving_chat.py where an incorrect argument is passed, which would cause a TypeError. I've also left a comment in http_server.py regarding some code duplication that could be refactored for better maintainability.
Once the critical issue is addressed, this PR should be good to go.
Branch updated: 7bd4a09 to af5058b
@slin1237 Please help review this and trigger CI.
@slin1237 Could you trigger CI again? I don't see any errors related to my PR.
@slin1237 Hello, could you trigger CI again?
@Bruce-x-1997 Please resolve the conflicts. Also, @harrisonlimh can check this change: sglang/python/sglang/srt/managers/scheduler.py, lines 1597 to 1606 in 20a6c0a.
@Jimmy-L99 Please take a look and check whether this is similar to your fix #11904.
Motivation
We observed HTTP exceptions in production: when a streaming request hits a 503 (request queue full), the response is unexpected.
The HTTP server does not catch HTTPException, so it returns a plain, non-streaming 503 error even though the request is a streaming request.
As users, we expect a streaming error response instead.
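To illustrate why a streaming error response matters to the client, here is a small sketch of a consumer that reads server-sent event lines and separates normal chunks from an in-stream error. The function name `parse_sse_stream` and the `{"error": ...}` payload shape are assumptions for illustration, not the project's documented format.

```python
import json


def parse_sse_stream(lines):
    """Return (chunks, error) from an iterable of SSE lines.

    Assumes each event is a single 'data: <json>' line and that the
    stream ends with 'data: [DONE]'.
    """
    chunks, error = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data fields
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if "error" in event:
            # Assumed error shape; stop reading once an error arrives.
            error = event["error"]
            break
        chunks.append(event)
    return chunks, error
```

A client written this way can handle a full queue inside its normal read loop. If the server instead answers with a non-streaming 503 body, the client needs a second code path that checks the HTTP status before it starts parsing events.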
Modifications
Add HTTPException handling in generate_request and v1/chat/completions.