[MultiModal][Feat] multimodal develop - support Wan2.1#593

Merged
pengchengneo merged 2 commits into sgl-project:main from primatrix:epic/multimodal-support
Jan 19, 2026

Conversation

@pengchengneo
Collaborator

@pengchengneo pengchengneo commented Dec 23, 2025

Basic MultiModal Features Roadmap

  • User Interface Refactor
    • HTTP Requests/Tokenizer/Detokenizer (contract a common abstract class for multi-schema requests)
  • Launch Server
    • Abstract class definition, basic component development
    • WeightLoader util refactor, making it compatible with various multimodal models @andy1126

Wan2.1 Support Work Break Down

  • Diffusion Engine development (under multimodal folder) @SiqiLi-Fighting
    • Support a naive diffusion engine without any optimized features
  • VAE stage development (under multimodal folder) @pathfinder-pf
  • T5 stage development (under autoregressive/text folder) @SII-limingliu
    • Refactor some AR-stage interfaces to fit multimodal
  • GlobalScheduler and communication within stages
  • Unit tests / e2e tests / add to CI
  • Model evaluation
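The stage-based breakdown above (T5 encoder, diffusion engine, VAE) suggests a common stage interface. The sketch below is purely illustrative: `Stage`, `StageBatch`, and `T5Stage` are hypothetical names, not the actual sgl-jax classes introduced by this PR.

```python
# Hedged sketch of a stage abstraction, assuming each device stage
# (T5 encoder, DiT denoiser, VAE) exposes a uniform forward() hook.
# All names here are illustrative, not the real sgl-jax API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class StageBatch:
    """Hypothetical container handed between stages."""
    request_ids: list[str]
    payload: Any = None


class Stage(ABC):
    """Base class a device stage could implement."""

    @abstractmethod
    def forward(self, batch: StageBatch) -> StageBatch:
        ...


class T5Stage(Stage):
    def forward(self, batch: StageBatch) -> StageBatch:
        # Placeholder: a real stage would run the text encoder here.
        batch.payload = {"text_embeds": f"embeds for {len(batch.request_ids)} prompts"}
        return batch
```

With such an interface, the GlobalScheduler only needs to route `StageBatch` objects between stages rather than knowing each model component's internals.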

Currently, we support two models:
a. Wan-AI/Wan2.1-T2V-1.3B-Diffusers
b. Wan-AI/Wan2.1-T2V-14B-Diffusers

Model Evaluation Results

Test Command

Environment: tpu-v6e-4

uv run python3 -u -m sgl_jax.launch_server --multimodal --model-path=Wan-AI/Wan2.1-T2V-14B-Diffusers --log-requests
uv run python3 -u -m sgl_jax.launch_server --multimodal --model-path=Wan-AI/Wan2.1-T2V-1.3B-Diffusers --log-requests

1.3B/14B Image

curl http://localhost:30000/api/v1/images/generation -H "Content-Type: application/json" -d '{"prompt": "A curious raccoon", "size": "480*832"}'

1.3B Video

curl http://localhost:30000/api/v1/videos/generation -H "Content-Type: application/json" -d '{"prompt": "A curious raccoon", "size": "480*832", "num_frames": 41}'

14B Video (this model still needs optimization to support videos with a large num_frames)

curl http://localhost:30000/api/v1/videos/generation -H "Content-Type: application/json" -d '{"prompt": "A curious raccoon", "size": "480*832", "num_frames": 5}'
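The curl calls above can also be issued from Python. This sketch only builds the request (the endpoint path and JSON fields are taken from the commands above); actually sending it is left commented out since it requires a launched server, and the `build_video_request` helper is our own, not part of sgl-jax.

```python
# Build the same POST request as the curl example, using only the stdlib.
import json
from urllib import request


def build_video_request(prompt, size="480*832", num_frames=41,
                        url="http://localhost:30000/api/v1/videos/generation"):
    body = json.dumps({"prompt": prompt, "size": size,
                       "num_frames": num_frames}).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})


req = build_video_request("A curious raccoon")
# resp = request.urlopen(req)  # requires a running server started as above
```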

Test Result

| Model Name | Generated Image (Preview) | Generated Video |
| --- | --- | --- |
| Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 1.3B Image | ▶️ Click to Watch Video |
| Wan-AI/Wan2.1-T2V-14B-Diffusers | 14B Image | ▶️ Click to Watch Video |

@gemini-code-assist

Summary of Changes

Hello @SiqiLi-Fighting, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for a robust multimodal inference framework within sgl-jax, enabling efficient processing of complex models like Wan2.1. The core idea is to create a flexible, high-performance system that can seamlessly integrate various computational stages, such as autoregressive decoding and diffusion denoising, by adopting a thread-based, single-process architecture. This change introduces new server configurations, API endpoints for image and video generation, and a modular stage-based execution pipeline, significantly expanding the platform's capabilities beyond text-only models.

Highlights

  • Multimodal Framework Introduction: Implemented a new, unified, high-performance inference framework for next-generation multimodal models (e.g., Wan2.1, Qwen2.5VL, MiMo-Audio, Qwen-Omni, Ling-Omni).
  • Modular Architecture: Designed with an "Operating System" philosophy, separating the control plane (Global Scheduler) from the computation plane (Device Stages) to support heterogeneous compute patterns like AR decoding and Diffusion denoising.
  • Thread-Based SPMD Concurrency: Utilizes a Single Process, Multiple Data (SPMD) logic with multi-threading to minimize inter-process communication overhead and maximize parallel throughput.
  • Multimodal Server Arguments: Introduced MultimodalServerArgs to configure multimodal-specific parameters, including precision settings for DiT, VAE, and various encoders.
  • New API Endpoints: Added HTTP API routes for image and video generation (/api/v1/images/generation, /api/v1/videos/generation).
  • Stage-Based Execution: Implemented a Stage abstraction, DeviceManager, GlobalScheduler, and specialized schedulers (DiffusionScheduler, VaeScheduler) to manage and execute different model components.
  • Multimodal Tokenization/Detokenization: Introduced dedicated MultimodalTokenizer and MultimodalDetokenizer components to handle complex multimodal input/output schemas.
  • Design Documentation: Included a detailed design document ([RFC]multimodal_architechure.md) outlining the framework's principles and components.
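The thread-based, single-process SPMD design in the highlights can be illustrated with a toy pipeline: stages run as threads inside one process and hand work off through in-memory queues rather than inter-process communication. Everything below (stage names, the sentinel protocol, the string "work") is an illustrative assumption, not the PR's actual scheduler code.

```python
# Toy two-stage pipeline: an "encode" stage feeds a "denoise" stage via
# queues, all within a single process, mirroring the thread-based design.
import queue
import threading

results = []


def run_stage(inbox, outbox, work):
    while True:
        item = inbox.get()
        if item is None:            # sentinel: shut down and propagate
            if outbox is not None:
                outbox.put(None)
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)      # hand off to the next stage
        else:
            results.append(result)  # final stage collects output


q_encode, q_denoise = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=run_stage,
                     args=(q_encode, q_denoise, lambda p: p + " [encoded]")),
    threading.Thread(target=run_stage,
                     args=(q_denoise, None, lambda p: p + " [denoised]")),
]
for t in threads:
    t.start()
q_encode.put("A curious raccoon")
q_encode.put(None)                  # no more requests
for t in threads:
    t.join()
# results now holds "A curious raccoon [encoded] [denoised]"
```

Because the queues live in shared memory, handing a batch between stages costs a pointer exchange instead of serialization, which is the overhead the SPMD thread design avoids.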



@pengchengneo pengchengneo marked this pull request as draft December 23, 2025 10:18
@pengchengneo pengchengneo changed the title from [Feat] multimodal develop - support Wan2.1 to [WIP][Feat] multimodal develop - support Wan2.1 Dec 23, 2025
@pengchengneo pengchengneo linked an issue Dec 30, 2025 that may be closed by this pull request
13 tasks
@pengchengneo pengchengneo force-pushed the epic/multimodal-support branch from a775359 to 14aa623 Compare January 14, 2026 11:54
@pengchengneo pengchengneo marked this pull request as ready for review January 14, 2026 11:54
@pengchengneo pengchengneo force-pushed the epic/multimodal-support branch from 14aa623 to 4d1b1c0 Compare January 14, 2026 11:55
@pengchengneo pengchengneo changed the title from [WIP][Feat] multimodal develop - support Wan2.1 to [MultiModal][Feat] multimodal develop - support Wan2.1 Jan 14, 2026
Co-authored-by: pathfinder-fp <slackexplorer@gmail.com>
@pengchengneo pengchengneo force-pushed the epic/multimodal-support branch from 177674f to a13da64 Compare January 19, 2026 04:21
Collaborator

@sii-xinglong sii-xinglong left a comment


LGTM

@pengchengneo pengchengneo merged commit 456a113 into sgl-project:main Jan 19, 2026
17 checks passed


Development

Successfully merging this pull request may close these issues.

  • [RFC]: Support multimodal model in SGL_JAX
  • [Feature] Multi Modal Models Support

5 participants