[Fix] HTTP Stream raise exception #11904

Merged
hnyls2002 merged 2 commits into sgl-project:main from jimmy-evo:fix/stream_raise_exception
Nov 6, 2025

@jimmy-evo jimmy-evo commented Oct 21, 2025

Motivation

Once a stream has started, raising an HTTPException is no longer possible.

Modifications

If the request is a stream, just break the loop.

When tokenizer_manager gets an exception from a streaming request, it just breaks the loop, and the client still receives the final chunk and [DONE] (see "after").
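The idea can be illustrated with a minimal sketch. This is not the actual tokenizer_manager code: `wait_one_response`, `StreamAbortedError`, `collect`, and the shape of the `finish_reason` dict are hypothetical stand-ins chosen for illustration, and `StreamAbortedError` takes the place of an HTTP exception so the example has no web-framework dependency.

```python
import asyncio
from http import HTTPStatus


class StreamAbortedError(Exception):
    """Hypothetical stand-in for an HTTP exception (not the real class)."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


async def wait_one_response(outputs, stream):
    """Simplified, hypothetical analogue of a per-request response loop."""
    for out in outputs:
        finish_reason = out.get("finish_reason")
        aborted = (
            isinstance(finish_reason, dict)
            and finish_reason.get("type") == "abort"
        )
        if aborted and not stream:
            # Non-streaming: nothing has been sent yet, so raising is still fine.
            raise StreamAbortedError(
                finish_reason["status_code"], finish_reason["message"]
            )
        # Streaming (or a normal chunk): hand the chunk to the caller.
        yield out
        if finish_reason is not None:
            # Finished, normally or by abort: stop instead of raising.
            break


async def collect(outputs, stream):
    """Drain the generator into a list (helper for the example)."""
    return [out async for out in wait_one_response(outputs, stream)]


# Assumed sample data: one normal chunk followed by an abort.
SAMPLE_OUTPUTS = [
    {"text": "Hel", "finish_reason": None},
    {
        "text": "Hel",
        "finish_reason": {
            "type": "abort",
            "status_code": HTTPStatus.INTERNAL_SERVER_ERROR,
            "message": "KV transfer error",
        },
    },
]
```

With `stream=True`, `asyncio.run(collect(SAMPLE_OUTPUTS, stream=True))` returns both chunks and ends cleanly; with `stream=False` the same input raises `StreamAbortedError`, so non-streaming behavior is unchanged in this sketch.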

before:

[2025-10-22 13:15:38] ERROR:    Exception in ASGI application                                                                                                
Traceback (most recent call last):                                                                                                                           
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run                                                                                            
    return runner.run(main)                                                                                                                                  
           ^^^^^^^^^^^^^^^^                                                                                                                                  
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run                                                                                            
    return self._loop.run_until_complete(task)                                                                                                               
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                               
  File "uvloop/loop.pyx", line 1512, in uvloop.loop.Loop.run_until_complete                                                                                  
  File "uvloop/loop.pyx", line 1505, in uvloop.loop.Loop.run_until_complete                                                                                  
  File "uvloop/loop.pyx", line 1379, in uvloop.loop.Loop.run_forever                                                                                         
  File "uvloop/loop.pyx", line 557, in uvloop.loop.Loop._run                                                                                                 
  File "uvloop/loop.pyx", line 476, in uvloop.loop.Loop._on_idle                                                                                             
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run                                                                                           
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run                                                                                           
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 2128, in running_phase_sigquit_handler                                  
    kill_process_tree(os.getpid())                                                                                                                           
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 995, in kill_process_tree                                                             
    sys.exit(0)                                                                                                                                              
SystemExit: 0                                                                                                                                                
                                                                                                                                                             
During handling of the above exception, another exception occurred:                                                                                          
                                                                                                                                                             
Traceback (most recent call last):                                                                                                                           
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi                                                   
    result = await app(  # type: ignore[func-returns-value]                                                                                                  
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                  
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__                                                   
    return await self.app(scope, receive, send)                                                                                                              
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                              
  File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1133, in __call__                                                             
    await super().__call__(scope, receive, send)                                                                                                             
  File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 113, in __call__                                                            
    await self.middleware_stack(scope, receive, send)                                                                                                        
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 164, in __call__                                                       
    await self.app(scope, receive, _send)                                                                                                                    
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/cors.py", line 85, in __call__                                                          
    await self.app(scope, receive, send)                                                                                                                     
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/exceptions.py", line 63, in __call__                                                    
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)                                                                                 
  File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app                                                    
    await app(scope, receive, sender)                                                                                                                        
  File "/usr/local/lib/python3.12/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__                                                  
    await self.app(scope, receive, send)                                                                                                                     
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 716, in __call__                                                                 
    await self.middleware_stack(scope, receive, send)                                                                                                        
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 736, in app                                                                      
    await route.handle(scope, receive, send)                                                                                                                 
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 290, in handle                                                                   
    await self.app(scope, receive, send)                                                                                                                     
  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 123, in app                                                                        
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)                                                                                   
  File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app                                                    
    await app(scope, receive, sender)                                                                                                                        
  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 110, in app                                                                        
    await response(scope, receive, send)                                                                                                                     
  File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 271, in __call__                                                               
    async with anyio.create_task_group() as task_group:                                                                                                      
               ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                     
  File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 785, in __aexit__                                                         
    raise exc_val                                                                                                                                            
  File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 278, in __call__                                                               
    await wrap(partial(self.listen_for_disconnect, receive))                                                                                                 
  File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 274, in wrap                                                                   
    await func()                                                                                                                                             
  File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 242, in listen_for_disconnect                                                  
    message = await receive()                                                                                                                                
              ^^^^^^^^^^^^^^^                                                                                                                                
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/h11_impl.py", line 531, in receive                                                    
    await self.message_event.wait()
  File "/usr/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
asyncio.exceptions.CancelledError

after:

data: {"id":"2f1ec7557c9f4cc0b9befa8405154834","object":"chat.completion.chunk","created":1761033725,"model":"glm46","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}

data: {"id":"2f1ec7557c9f4cc0b9befa8405154834","object":"chat.completion.chunk","created":1761033725,"model":"glm46","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"abort","matched_stop":null}],"usage":null}

data: [DONE]
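On the client side, an aborted stream can be recognized from the chunks themselves. The following is a rough sketch, assuming the server-sent-event lines look like the output above; `parse_sse_lines`, `was_aborted`, and the shortened `SAMPLE_LINES` are hypothetical names and data made up for this example.

```python
import json


def parse_sse_lines(lines):
    """Parse 'data: ...' lines into JSON chunks; report whether [DONE] was seen."""
    chunks = []
    saw_done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            break
        chunks.append(json.loads(payload))
    return chunks, saw_done


def was_aborted(chunks):
    """True if any choice in any chunk finished with reason 'abort'."""
    return any(
        choice.get("finish_reason") == "abort"
        for chunk in chunks
        for choice in chunk.get("choices", [])
    )


# Shortened sample modeled on the output above (most fields omitted).
SAMPLE_LINES = [
    'data: {"choices":[{"index":0,"delta":{"content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":null},"finish_reason":"abort"}]}',
    "",
    "data: [DONE]",
]
```

A client that checks `was_aborted` after the stream ends can tell an aborted request from a normal completion even though the HTTP status was already sent as 200.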

How I found and reproduced it

I ran with prefill-decode (P-D) disaggregation and set SGLANG_DISAGGREGATION_WAITING_TIMEOUT=1, which makes decode raise a KV transfer error.
tokenizer_manager then raises an HTTPException inside the stream and crashes.

@gemini-code-assist

Summary of Changes

Hello @jinmingyi1998, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where HTTP exceptions were prematurely raised during streaming operations, leading to service interruptions. It introduces a more robust exception handling mechanism for streaming requests, ensuring a graceful termination. Additionally, the PR significantly enhances the system's observability by adding detailed performance metrics across the request lifecycle and integrating a structured Logstash-compatible logging formatter. These changes aim to improve system stability, provide deeper insights into performance bottlenecks, and streamline log analysis.

Highlights

  • HTTP Exception Handling in Streaming: The primary fix addresses an issue where HTTPException or ValueError would be raised during streaming requests, causing the service to crash. Instead of raising, the system now yields the current output and breaks the loop, allowing clients to receive a [DONE] signal gracefully.
  • Enhanced Metrics and Logging: New time statistics have been added across various stages of request processing (prefill, decode, queueing, transfer, first token time) to improve performance monitoring. A new LogstashFormatter has been introduced for structured JSON logging, enabling better observability and analysis of logs, especially in disaggregated environments.
  • Grammar Backend Refinement: The _init_value_dispatch method in the reasoner grammar backend has been updated to correctly handle and propagate INVALID_GRAMMAR_OBJ, ensuring the scheduler can properly detect and manage invalid grammar states.
  • Model Configuration Adjustment: The allow_reduce_scatter parameter for the MoE layer in the glm4_moe model has been changed from True to False, which might impact how model parallelism is handled.

@jimmy-evo jimmy-evo force-pushed the fix/stream_raise_exception branch from 16e62e4 to 3845619 on October 21, 2025 08:15

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily addresses a crash that occurs when an exception is raised during an HTTP stream. The fix correctly handles this by breaking the loop and sending an abort message to the client for streaming requests, which is a solid improvement. A large portion of the changes also introduces extensive performance metric collection and structured logging capabilities, likely for an internal tool called "PPinfer". While these additions are valuable for observability, I've identified a few minor issues: a duplicated metric logging call, a repeated attribute in a newly added utility file, and an opportunity to improve exception re-raising for better debuggability. Overall, the pull request is in good shape, and the core fix is sound.

@jimmy-evo

/gemini review

@jimmy-evo

/gemini summary

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue where an HTTPException would crash the tokenizer_manager during a streaming request. The change introduces a check for obj.stream and avoids raising an exception for streaming requests, instead breaking the loop to allow a graceful finish. The logic is sound. I've added one suggestion to refactor the code for better maintainability by reducing duplication.

@gemini-code-assist

Summary of Changes

This pull request addresses a critical issue where HTTP exceptions were prematurely raised during streaming operations within the tokenizer_manager, leading to server crashes. The changes introduce a more robust error handling mechanism for streaming requests, ensuring that when an exception or abort condition is encountered, the stream gracefully terminates by yielding the current output and breaking the loop, rather than raising an unhandled exception. This allows clients to receive a proper [DONE] signal, improving the stability and reliability of streaming services.

Highlights

  • Graceful Stream Termination: Modified the tokenizer_manager to prevent server crashes during streaming requests when an exception (like ValueError or HTTPException) occurs. Instead of raising the exception, the stream now gracefully yields the current output and breaks, allowing the client to receive a [DONE] signal.
  • Improved Error Handling for Streaming: Specifically, for HTTPStatus.BAD_REQUEST and other abort conditions, the system now checks if the request is a stream. If it is, it avoids raising a critical error and instead ensures a clean stream termination.
Changelog
  • python/sglang/srt/managers/tokenizer_manager.py
    • Modified the _wait_one_response asynchronous generator to conditionally handle exceptions. When a finish_reason indicates an abort or HTTPStatus.BAD_REQUEST and the request obj.stream is true, the function now yields the current out and breaks the loop, preventing a ValueError or fastapi.HTTPException from being raised. This ensures graceful termination of streaming requests upon error.
Activity
  • gemini-code-assist[bot] provided an initial summary of changes.
  • jinmingyi1998 requested a code review from gemini-code-assist[bot].
  • jinmingyi1998 requested a summary from gemini-code-assist[bot].

@hnyls2002

@Simon-Li @CatherineSue could you please take a look?

@hnyls2002 hnyls2002 merged commit 88a2a34 into sgl-project:main Nov 6, 2025
40 of 43 checks passed

3 participants