Refactor / Unify event loop across PD-Disagg, Overlap, DP-Attn cases #12839
Conversation
Summary of Changes (Gemini Code Assist): This pull request focuses on enhancing the maintainability and efficiency of the SGLang runtime's scheduling mechanisms. By unifying event loop patterns and centralizing key operational checks, the changes aim to create a more robust and easier-to-understand system for managing different batch-processing scenarios, particularly in disaggregated and parallel environments. The introduction of a specialized forward mode for decode operations further refines how the system handles KV cache readiness.
Code Review
This pull request refactors the event loops for prefill-decode disaggregation, overlap scheduling, and data-parallel attention to unify their logic. This is a significant and positive change that improves code clarity and maintainability. The introduction of a new FAKE_EXTEND forward mode and centralizing the handling of different batch types in Scheduler.run_batch is a good approach. The cleanup of require_mlp_sync calls is also a welcome improvement. I have found one potential issue regarding missing output streaming for the new FAKE_EXTEND mode, which could lead to the first token being dropped in disaggregated mode.
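To make the "centralize batch handling in `Scheduler.run_batch`" idea concrete, here is a minimal, self-contained sketch of a single dispatch point over the different batch types. All names here (`Batch`, `ForwardMode`, `run_batch`) are illustrative assumptions for this sketch, not SGLang's actual API; the real `Scheduler.run_batch` is considerably more involved.

```python
# Illustrative sketch only: names and structure are assumptions, not SGLang code.
from dataclasses import dataclass
from enum import Enum, auto


class ForwardMode(Enum):
    EXTEND = auto()       # normal prefill
    DECODE = auto()       # autoregressive decode step
    IDLE = auto()         # padding batch so DP ranks stay in sync
    FAKE_EXTEND = auto()  # KV cache already filled elsewhere (PD-disaggregation)


@dataclass
class Batch:
    forward_mode: ForwardMode
    request_ids: list


def run_batch(batch: Batch) -> dict:
    """Single dispatch point instead of per-event-loop special cases."""
    if batch.forward_mode is ForwardMode.FAKE_EXTEND:
        # No model forward is needed: the prefill worker already produced the
        # KV cache and the first token, but that token must still be streamed
        # to the client (the issue flagged in the review above).
        return {"mode": "fake_extend", "stream_first_token": True}
    if batch.forward_mode is ForwardMode.IDLE:
        # Join collective ops (e.g. the DP-attention all-gather) without
        # doing any real work.
        return {"mode": "idle"}
    # EXTEND and DECODE share the normal model-forward path.
    return {"mode": batch.forward_mode.name.lower()}


if __name__ == "__main__":
    for mode in ForwardMode:
        print(run_batch(Batch(forward_mode=mode, request_ids=["r0"])))
```

With a single dispatch point like this, the PD-disaggregation, overlap-scheduling, and DP-attention event loops can share one loop body instead of each special-casing the batch types.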
Branch force-pushed from 8da56bb to 869bb47.
All comments are resolved in #12948.
This PR replaces #9618.
Modifications
- Add a `prebuilt_extend` forward mode, which is used in the decode event loop in PD-disaggregation mode.
- Centralize the `prepare_mlp_sync` logic: place this function inside `get_next_disagg_decode_batch_to_run`.

The all-gather logic for decode batches, empty batches, and prebuilt batches is as follows (see the sketch after the list):
- Decode (dp0) + Empty (dp1) -> Decode (dp0) + Idle (dp1)
- Decode (dp0) + Prebuilt (dp1) -> Decode (dp0) + Idle (dp1, but returns the prebuilt batch)
- Prebuilt (dp0) + Empty (dp1) -> No forward (but the prebuilt batch returns)
- Decode (dp0) + Empty (dp1) + Prebuilt (dp2) -> Decode (dp0) + Idle (dp1) + Idle (dp2, but returns the prebuilt batch)
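As a rough illustration of the cases above, here is a toy sketch of how the per-rank decision might be resolved after the all-gather: a rank forwards a real decode batch if it has one, otherwise it runs an idle forward whenever some other rank is decoding, and a prebuilt batch is returned to the caller without being forwarded. The names (`Local`, `resolve_dp_batches`) are hypothetical and not taken from the SGLang code base.

```python
# Toy sketch of the all-gather resolution listed above; names are hypothetical.
from enum import Enum, auto


class Local(Enum):
    EMPTY = auto()     # nothing to run on this DP rank
    DECODE = auto()    # a real decode batch
    PREBUILT = auto()  # prebuilt (fake-extend) batch: returned, never forwarded


def resolve_dp_batches(local_states: list) -> list:
    """Decide what each DP rank actually forwards after the all-gather."""
    # A forward pass only happens if at least one rank holds a real decode batch.
    any_decode = any(s is Local.DECODE for s in local_states)
    resolved = []
    for s in local_states:
        if s is Local.DECODE:
            resolved.append("Decode")
        elif any_decode:
            # Empty and prebuilt ranks run an idle forward so the collectives
            # stay aligned; a prebuilt rank still returns its batch.
            resolved.append("Idle (returns prebuilt)" if s is Local.PREBUILT else "Idle")
        else:
            # No rank has a decode batch: the forward is skipped entirely.
            resolved.append("No forward (returns prebuilt)" if s is Local.PREBUILT else "No forward")
    return resolved


print(resolve_dp_batches([Local.DECODE, Local.EMPTY]))      # ['Decode', 'Idle']
print(resolve_dp_batches([Local.DECODE, Local.PREBUILT]))   # ['Decode', 'Idle (returns prebuilt)']
print(resolve_dp_batches([Local.PREBUILT, Local.EMPTY]))    # ['No forward (returns prebuilt)', 'No forward']
print(resolve_dp_batches([Local.DECODE, Local.EMPTY, Local.PREBUILT]))
# ['Decode', 'Idle', 'Idle (returns prebuilt)']
```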