Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Even though sgl_kernel is installed, I am getting the error ValueError: No processor registered for architecture: ['Qwen2_5_VLForConditionalGeneration']. The startup log also shows [2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.qwen_vl: Can not import sgl_kernel. Please check your installation.
Following is the full error:
/opt/venv/lib/python3.12/site-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
server_args=ServerArgs(model_path='Qwen/Qwen2.5-VL-3B-Instruct', tokenizer_path='Qwen/Qwen2.5-VL-3B-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, enable_tokenizer_batch_encode=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='Qwen/Qwen2.5-VL-3B-Instruct', chat_template=None, completion_template=None, is_embedding=False, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.5, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=1, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=1009269236, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=False, decode_log_interval=40, enable_request_time_stats_logging=False, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, 
disable_radix_cache=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_nccl_nvls=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_multimodal=None, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_ep_moe=False, enable_deepep_moe=False, deepep_mode='auto', enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=None, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', flashinfer_mla_disable_ragged=False, warmups=None, moe_dense_tp_size=None, n_share_experts_fusion=0, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, mm_attention_backend=None, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_bootstrap_port=8998, disaggregation_transfer_backend='mooncake', disaggregation_ib_device=None, pdlb_url=None)
[2025-06-03 12:29:17] Ignore import error when loading sglang.srt.managers.multimodal_processors.clip: Can not import sgl_kernel. Please check your installation.
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.internvl: No module named 'numpy.distutils'
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.janus_pro: Can not import sgl_kernel. Please check your installation.
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.minicpm: Can not import sgl_kernel. Please check your installation.
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.mlama: Can not import sgl_kernel. Please check your installation.
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.pixtral: Can not import sgl_kernel. Please check your installation.
[2025-06-03 12:29:18] Ignore import error when loading sglang.srt.managers.multimodal_processors.qwen_vl: Can not import sgl_kernel. Please check your installation.
/opt/venv/lib/python3.12/site-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/opt/venv/lib/python3.12/site-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/venv/lib/python3.12/site-packages/sglang/bench_offline_throughput.py", line 423, in <module>
throughput_test(server_args, bench_args)
File "/opt/venv/lib/python3.12/site-packages/sglang/bench_offline_throughput.py", line 309, in throughput_test
backend = Engine(**dataclasses.asdict(server_args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 125, in __init__
tokenizer_manager, scheduler_info = _launch_subprocesses(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 629, in _launch_subprocesses
tokenizer_manager = TokenizerManager(server_args, port_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/sglang/srt/managers/tokenizer_manager.py", line 205, in __init__
self.mm_processor = get_mm_processor(
^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/sglang/srt/managers/multimodal_processor.py", line 61, in get_mm_processor
raise ValueError(
ValueError: No processor registered for architecture: ['Qwen2_5_VLForConditionalGeneration'].
Registered architectures: ['DeepseekVL2ForCausalLM', 'Gemma3ForConditionalGeneration', 'KimiVLForConditionalGeneration', 'LlavaLlamaForCausalLM', 'LlavaVidForCausalLM', 'LlavaQwenForCausalLM', 'LlavaMistralForCausalLM', 'LlavaForConditionalGeneration', 'Llama4ForConditionalGeneration']
Reproduction
I built SGLang Docker myself with the following command:
LSB_RELEASE=24.04 CUDA_VERSION=12.8 PYTHON_VERSION=3.12 PYTORCH_VERSION=2.6 jetson-containers build sglang
I also built with PYTORCH_VERSION=2.8 and got the same error.
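Since sglang only logs "Ignore import error" and suppresses the underlying traceback, a small probe run inside the same venv can surface the real failure behind "Can not import sgl_kernel". This is a minimal diagnostic sketch (the probe helper is my own, not part of sglang); the module names are taken from the startup log above:

```python
import importlib
import traceback

def probe(module_name):
    """Try to import a module; return the full traceback text on failure, None on success."""
    try:
        importlib.import_module(module_name)
        return None
    except Exception:
        return traceback.format_exc()

# Module names taken from the "Ignore import error" lines in the startup log.
for name in ["sgl_kernel", "sglang.srt.managers.multimodal_processors.qwen_vl"]:
    err = probe(name)
    print(f"{name}: {'OK' if err is None else 'FAILED'}")
    if err:
        print(err)
```

If sgl_kernel fails here, the printed traceback should show the actual root cause (e.g. a missing shared library or an ABI mismatch with the PyTorch build) instead of the generic message.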
Environment
I only saved the PyTorch 2.7 environment. Here it is:
testing SGLang...
Python: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]
CUDA available: True
GPU 0: Orin
GPU 0 Compute Capability: 8.7
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.8, V12.8.93
CUDA Driver Version: 540.4.0
PyTorch: 2.7.0
sglang: 0.4.6.post4
sgl_kernel: 0.1.2.post1
flashinfer_python: 0.2.5
triton: 3.3.0
transformers: 4.52.4
torchao: 0.11.0+gitbda5305
numpy: 2.2.6
aiohttp: 3.12.7
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.32.4
interegular: 0.3.3
modelscope: 1.26.0
orjson: 3.10.18
outlines: Module Not Found
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.5
python-multipart: 0.0.20
pyzmq: 26.4.0
uvicorn: 0.34.3
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.19
openai: Module Not Found
tiktoken: 0.9.0
anthropic: Module Not Found
litellm: Module Not Found
decord: 0.7.0
ulimit soft: 1048576
SGLang OK
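Since sgl_kernel wheels are built against a specific PyTorch ABI, it may also help to confirm programmatically which distributions are visible to the venv that launches the server. A sketch (the dist_version helper is my own; the distribution names are assumed from the environment dump above, and sgl_kernel's wheel may be published as "sgl-kernel"):

```python
from importlib.metadata import PackageNotFoundError, version

def dist_version(name):
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"

# Distribution names assumed from the environment dump; adjust if the
# wheel was installed under a different distribution name.
for pkg in ["sglang", "sgl-kernel", "torch", "flashinfer-python", "transformers"]:
    print(f"{pkg}: {dist_version(pkg)}")
```

If any of these report "not installed" despite the dump above, the server process is likely running against a different venv than the one that was inspected.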