Remove sampling info events and overlap thread file#11300

Merged
hnyls2002 merged 3 commits into main from lsyin/cleanup-thread on Oct 7, 2025

@hnyls2002
Collaborator

@hnyls2002 hnyls2002 commented Oct 7, 2025

Depends on #11210


  • Removed the unused file tp_worker_overlap_thread.py
  • Removed the unused events related to sampling info synchronization.

@gemini-code-assist
Contributor

Summary of Changes

Hello @hnyls2002, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant cleanup of the scheduling and sampling infrastructure. It removes obsolete threading mechanisms, specifically the tp_worker_overlap_thread, and streamlines how sampling information is managed and synchronized. By eliminating unused events, dummy batch concepts, and associated code, the changes aim to reduce complexity, improve code maintainability, and remove unnecessary overhead within the system's core operations.

Highlights

  • File Removal: The tp_worker_overlap_thread.py file, which contained a tensor parallel worker client, has been completely removed as it is no longer in use.
  • Sampling Info Synchronization Cleanup: All unused events and related logic for sampling information synchronization, including the sampling_info_done event and the next_batch_sampling_info attribute, have been eliminated from various scheduler and batch management components.
  • Removal of Dummy Batch Concept: The concept of a DUMMY_FIRST batch, previously used to initiate the overlap schedule pipeline and trigger sampling info events, has been removed from ForwardMode and associated code.
  • Code Simplification: The _preprocess_logits method in model_runner.py has been simplified by removing conditional logic related to sampling_info_done, streamlining the logit bias application process.
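
To make the last highlight concrete, here is a minimal before/after sketch of the kind of simplification described. It is an illustration under stated assumptions, not the actual SGLang code: the SamplingInfo class, update_vocab_mask, and the list-based logits are hypothetical stand-ins, and only the names sampling_info_done and _preprocess_logits come from the summary above.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SamplingInfo:
    """Hypothetical stand-in for a batch's sampling metadata."""

    logit_bias: List[float]
    vocab_mask: Optional[List[bool]] = None
    # Only consulted by the "before" variant below.
    sampling_info_done: Optional[threading.Event] = None

    def update_vocab_mask(self) -> None:
        # Placeholder for CPU-side grammar work: ban the last token.
        self.vocab_mask = [False] * (len(self.logit_bias) - 1) + [True]

    def apply_logits_bias(self, logits: List[float]) -> List[float]:
        out = [x + b for x, b in zip(logits, self.logit_bias)]
        if self.vocab_mask is not None:
            out = [float("-inf") if m else x for x, m in zip(out, self.vocab_mask)]
        return out


def preprocess_logits_before(logits: List[float], info: SamplingInfo) -> List[float]:
    """Assumed shape of the old logic: branch on whether an event exists."""
    if info.sampling_info_done is not None:
        # Overlap path: some other thread prepares the mask and sets the event.
        info.sampling_info_done.wait()
    else:
        info.update_vocab_mask()
    return info.apply_logits_bias(logits)


def preprocess_logits_after(logits: List[float], info: SamplingInfo) -> List[float]:
    """Assumed shape of the simplified logic: one unconditional path."""
    info.update_vocab_mask()
    return info.apply_logits_bias(logits)
```

In this sketch the simplified variant needs no cross-thread coordination: the mask is always prepared inline before the bias is applied, so the event field becomes dead weight and can be dropped.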

@hnyls2002 hnyls2002 changed the title from "Lsyin/cleanup thread" to "Remove sampling info events and overlap thread file" Oct 7, 2025

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a nice cleanup by removing the unused tp_worker_overlap_thread.py file and the associated logic for sampling info synchronization, including the sampling_info_done event. The changes are consistent across multiple files, simplifying the codebase.

I have one comment regarding a potential performance regression in model_runner.py due to the removal of an overlap optimization. Please take a look. Otherwise, the changes look good.

Base automatically changed from lsyin/remove-overlap-thread to main October 7, 2025 12:12
@hnyls2002 hnyls2002 requested a review from kssteven418 as a code owner October 7, 2025 12:12
@hnyls2002 hnyls2002 force-pushed the lsyin/cleanup-thread branch from bcff306 to 9ae2ef7 Compare October 7, 2025 12:19
@hnyls2002 hnyls2002 mentioned this pull request Oct 7, 2025
4 tasks
@hnyls2002 hnyls2002 merged commit 501dfa6 into main Oct 7, 2025
97 of 100 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/cleanup-thread branch October 7, 2025 13:34
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025
lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request Oct 29, 2025