Revert "lang: support direct video inference" #12038

Merged
merrymercy merged 1 commit into main from revert-9936-lang-video on Oct 24, 2025

@merrymercy (Contributor)

Reverts #9936 because the lint failed.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @merrymercy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the existing video inference support by transitioning from a dedicated video processing pipeline to one that encodes videos into base64 and treats them as sequences of images. This change involves introducing a new video encoding utility, removing specific video handling logic from the backend and chat templates, and integrating video processing into the existing image handling framework within the interpreter. The video API signature has also been updated to enforce the num_frames argument. These modifications aim to streamline multimodal input processing, potentially addressing previous issues like linting failures.

Highlights

  • Refactored Video Inference: The approach to video inference has been refactored. Instead of dedicated video handling, videos are now encoded into base64 and processed as sequences of images.
  • Introduction of encode_video_base64: A new utility function encode_video_base64 has been added to sglang.utils to convert video files into base64-encoded frames.
  • Removal of Dedicated Video Processing: Dedicated video processing logic, including _add_videos methods and video_token definitions, has been removed from the runtime backend and chat template.
  • Integration with Image Processing: Video handling in the interpreter now leverages existing image processing mechanisms, appending encoded video frames to the images_ list and using the image_token.
  • Updated Video API Signature: The video function in sglang.lang.api and sglang.lang.ir now requires num_frames as a mandatory argument, removing its default value.
  • Documentation and Benchmark Updates: The frontend tutorial notebook has been updated to remove explicit video QA examples, and the hicache benchmark now uses the new video encoding approach.
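
The frames-as-images approach described in the highlights can be illustrated with a minimal sketch. This is not the sglang implementation of encode_video_base64: the helper names sample_frame_indices and encode_frames_base64 are hypothetical, the frames are assumed to be already-decoded byte strings (for example JPEG bytes), and evenly spaced sampling is only an assumption about how num_frames might be used.

```python
import base64
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to num_frames evenly spaced indices in [0, total_frames - 1]."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Never ask for more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    if num_frames == 1:
        return [0]
    # Integer arithmetic keeps the first and last frame in the sample.
    return [i * (total_frames - 1) // (num_frames - 1) for i in range(num_frames)]


def encode_frames_base64(frames: List[bytes], num_frames: int) -> List[str]:
    """Hypothetical helper: base64-encode a sampled subset of raw frame bytes."""
    indices = sample_frame_indices(len(frames), num_frames)
    return [base64.b64encode(frames[i]).decode("ascii") for i in indices]


# Assumed usage: treat the encoded frames like any other list of images.
frames = [bytes([i]) for i in range(10)]  # stand-in for decoded video frames
images: List[str] = []
images.extend(encode_frames_base64(frames, num_frames=4))
```

Note that num_frames has no default in this sketch, mirroring the summary's statement that the argument is mandatory.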

@merrymercy merged commit ffc722a into main on Oct 24, 2025
7 of 46 checks passed
@merrymercy deleted the revert-9936-lang-video branch on October 24, 2025 02:21
@gemini-code-assist bot left a comment

Code Review

This pull request reverts the changes from a previous commit. The revert appears to be incomplete, leading to a broken state. Specifically, the _execute_video method in python/sglang/lang/interpreter.py is restored to a version that depends on self.videos_ and self.chat_template.video_token. However, these attributes are removed in other parts of this same PR, which will cause runtime AttributeErrors. This critical inconsistency needs to be addressed to ensure the codebase is functional after this revert.

Comment on lines +512 to +513

base64_data = encode_video_base64(path, num_frames)

Severity: critical

This restored implementation of _execute_video will cause a runtime error. It relies on self.videos_ and self.chat_template.video_token, but the definitions for these have been removed in this same pull request:

  1. The videos_ attribute is removed from the StreamExecutor class in this same file (in __init__ and fork).
  2. The video_token attribute is removed from the ChatTemplate class in python/sglang/lang/chat_template.py.

As a result, executing this code will raise an AttributeError. Please correct the revert to ensure the code is consistent. It seems this function should either be removed entirely if the video feature is being fully reverted, or the dependencies (videos_ and video_token) should be restored as well.
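
The failure mode the review describes can be reproduced in isolation. The sketch below uses toy stand-in classes (ToyChatTemplate and ToyStreamExecutor are hypothetical names, not the sglang classes) and assumes only what the comment states: the method still reads videos_ and video_token, while their definitions are gone.

```python
class ToyChatTemplate:
    """Hypothetical stand-in: image_token is kept, video_token is not defined."""

    def __init__(self) -> None:
        self.image_token = "<image>"


class ToyStreamExecutor:
    """Hypothetical stand-in for an executor after an incomplete revert."""

    def __init__(self) -> None:
        self.chat_template = ToyChatTemplate()
        self.images_ = []
        # videos_ is intentionally NOT initialized here, as the review describes.

    def _execute_video(self, data: str) -> str:
        # Still depends on the removed attributes, so the first access fails.
        self.videos_.append(data)
        return self.chat_template.video_token


def run_video(executor: ToyStreamExecutor) -> str:
    """Call the video path and report whether it raised AttributeError."""
    try:
        executor._execute_video("frame-data")
    except AttributeError as exc:
        return f"AttributeError: {exc}"
    return "ok"


result = run_video(ToyStreamExecutor())
print(result)
```

Either remedy named in the review removes the error in this toy version: delete _execute_video, or restore both videos_ and video_token.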
