
[model-gateway] add llava model image processor and tests #14371

Merged
slin1237 merged 8 commits into main from smg-mm-3/n on Dec 3, 2025
Conversation


@slin1237 slin1237 commented Dec 3, 2025

  1. Add the LLaVA model image processor.
  2. Register the new processor in the image processor registry.
  3. Add integration tests that compare against golden results generated with the Python transformers library (see the sketch below).
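
The golden files mentioned in item 3 can be reproduced with a short script along these lines. This is a minimal sketch that assumes the public llava-hf checkpoints on the HuggingFace Hub and illustrative file names; the actual generation script in this PR may differ.

```python
# Sketch: generate golden pixel_values with HuggingFace transformers.
# Model ids, the input image path, and output file names are illustrative,
# not necessarily those used by the PR's generation script.
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor

MODELS = {
    "llava-1.5": "llava-hf/llava-1.5-7b-hf",          # CLIP-style preprocessing
    "llava-next": "llava-hf/llava-v1.6-mistral-7b-hf",  # anyres multi-crop
}

image = Image.open("test_image.png").convert("RGB")

for name, repo in MODELS.items():
    processor = AutoImageProcessor.from_pretrained(repo)
    out = processor(images=image, return_tensors="np")
    # pixel_values is (batch, channels, H, W) for LLaVA 1.5 and
    # (batch, num_patches, channels, H, W) for LLaVA-NeXT anyres.
    np.save(f"golden_{name}_pixel_values.npy", out["pixel_values"])
```
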

Checklist

@github-actions github-actions bot added the dependencies (Pull requests that update a dependency file), Multi-modal (multi-modal language model), and model-gateway labels Dec 3, 2025
@gemini-code-assist (Contributor)

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates robust support for LLaVA model image preprocessing within the model-gateway. It provides dedicated processors for different LLaVA versions and modes, integrates them into a central registry for easy access, and establishes a rigorous testing framework using golden outputs from HuggingFace to guarantee accurate and consistent image transformations. This enhancement significantly expands the multimodal capabilities of the gateway.

Highlights

  • LLaVA Image Processor: Implemented a new image processor for LLaVA models, supporting both LLaVA 1.5 (standard CLIP and pad modes) and LLaVA-NeXT (multi-crop anyres; a resolution-selection sketch follows this list).
  • Processor Registry Integration: The new LLaVA processors are integrated into the ImageProcessorRegistry with default configurations, allowing for dynamic selection based on model names.
  • Golden Integration Tests: Added comprehensive golden tests that compare the Rust image processing outputs against reference outputs generated by HuggingFace transformers, ensuring pixel-perfect compatibility and correctness.
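
For context on the anyres mode in the first highlight: LLaVA-NeXT picks a target grid resolution before splitting the image into crops, choosing the candidate that preserves the most of the scaled image while wasting the least padding. The sketch below is illustrative Python mirroring the HuggingFace select_best_resolution behavior, not the Rust code added in this PR.

```python
# Illustrative sketch of LLaVA-NeXT "anyres" resolution selection:
# pick the candidate grid (height, width) that maximizes the effective
# resolution of the scaled image and, on ties, minimizes wasted area.
def select_best_resolution(orig_size, candidates):
    orig_h, orig_w = orig_size
    best, best_effective, best_waste = None, 0, float("inf")
    for cand_h, cand_w in candidates:
        scale = min(cand_w / orig_w, cand_h / orig_h)
        scaled_w, scaled_h = int(orig_w * scale), int(orig_h * scale)
        effective = min(scaled_w * scaled_h, orig_w * orig_h)
        waste = cand_w * cand_h - effective
        if effective > best_effective or (
            effective == best_effective and waste < best_waste
        ):
            best, best_effective, best_waste = (cand_h, cand_w), effective, waste
    return best

# Example: a 1024x683 landscape photo against common 336-based grids.
grids = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]
print(select_best_resolution((683, 1024), grids))  # -> (672, 672)
```
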
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@slin1237 slin1237 added the run-ci label Dec 3, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds image processors for LLaVA models, including a Python script for generating golden test files and corresponding integration tests. The implementation is well-structured and the testing approach is robust. My review focuses on a few key areas for improvement. I've found a high-severity issue in the LlavaNextProcessor batch processing logic that could lead to performance degradation and stack overflows, and a correctness bug in an image centering function. Additionally, I've suggested a refactoring in the new test file to reduce code duplication and improve maintainability. After addressing these points, this will be an excellent addition.
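
For readers following along without the diff: the centering issue mentioned above concerns placing a resized image on a padded square canvas, as done by LLaVA 1.5's pad mode. The sketch below shows the intended offset computation; it is an illustrative Python version of the classic expand-to-square step, not the Rust code under review.

```python
# Illustrative sketch of "pad to square" centering for LLaVA 1.5 pad mode:
# the image is pasted so it is centered on a canvas filled with the
# processor's background (image mean) color.
from PIL import Image

def expand_to_square(img: Image.Image, background: tuple) -> Image.Image:
    w, h = img.size
    if w == h:
        return img
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), background)
    # Center offset: half of the leftover space along each axis.
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas

# Example: a 640x480 image becomes 640x640 with 80px bands above and below.
squared = expand_to_square(Image.new("RGB", (640, 480)), (122, 116, 104))
print(squared.size)  # (640, 640)
```
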

@slin1237 slin1237 merged commit abf6272 into main Dec 3, 2025
56 checks passed
@slin1237 slin1237 deleted the smg-mm-3/n branch December 3, 2025 19:18
tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 12, 2025
