Commit 991c63d

fix: Increase FlashInfer workspace size for Qwen3VL models (sgl-project#14173)
1 parent f607c17 commit 991c63d

File tree

1 file changed: +4 -0 lines changed

python/sglang/srt/layers/attention/flashinfer_backend.py

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ def __init__(
             "Qwen2ForCausalLM" in model_runner.model_config.hf_config.architectures
             or "Qwen3ForCausalLM" in model_runner.model_config.hf_config.architectures
             or "MiMoForCausalLM" in model_runner.model_config.hf_config.architectures
+            or "Qwen3VLForConditionalGeneration"
+            in model_runner.model_config.hf_config.architectures
+            or "Qwen3VLMoeForConditionalGeneration"
+            in model_runner.model_config.hf_config.architectures
         ):
             envs.SGLANG_FLASHINFER_WORKSPACE_SIZE.set(512 * 1024 * 1024)
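
For context, the hunk above gates a larger FlashInfer workspace on the model's HF architecture list: when one of the listed Qwen-family architectures is present, the workspace is raised to 512 MB, and this commit adds the two Qwen3-VL architecture names to that check. Below is a minimal, self-contained sketch of the same gating pattern, not the actual SGLang module; the set name, helper function, and the default workspace value are illustrative assumptions, while the architecture strings and the 512 MB figure come from the diff above.

    # Minimal sketch (assumption, not the real SGLang backend) of the
    # architecture-based workspace-size selection shown in the diff.

    # Architectures that need the larger workspace (strings taken from the diff).
    _LARGE_WORKSPACE_ARCHITECTURES = {
        "Qwen2ForCausalLM",
        "Qwen3ForCausalLM",
        "MiMoForCausalLM",
        # Added by this commit:
        "Qwen3VLForConditionalGeneration",
        "Qwen3VLMoeForConditionalGeneration",
    }

    # Hypothetical default; the real default lives in sglang's environment config.
    _DEFAULT_WORKSPACE_BYTES = 256 * 1024 * 1024
    _LARGE_WORKSPACE_BYTES = 512 * 1024 * 1024  # value set by the code in the diff

    def pick_flashinfer_workspace_size(architectures: list[str]) -> int:
        """Return the FlashInfer workspace size in bytes for the given HF architectures."""
        if any(arch in _LARGE_WORKSPACE_ARCHITECTURES for arch in architectures):
            return _LARGE_WORKSPACE_BYTES
        return _DEFAULT_WORKSPACE_BYTES

    if __name__ == "__main__":
        # Qwen3-VL now falls into the large-workspace branch.
        print(pick_flashinfer_workspace_size(["Qwen3VLForConditionalGeneration"]))  # 536870912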

0 commit comments