[SUPPORT] sglang POD starts with error #546

Description

@ilfur

Question

The "llm engine" POD keeps failing, wont start.

What did you try?

Used the samples to set up a ClusterBaseModel, ClusterServingRuntime, and InferenceService for gemma-3-12b with sglang/srt (I also tried other LLMs).
The pod that should host the LLM always complains that there is no architectures array and won't start.
The LLM itself downloads fine after creating a hf-token secret.
I tried with sglang 0.55 and 0.59, but the sglang runtime pod just logs this message over and over again:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 32, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5592, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5078, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 331, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 733, in __post_init__
    self._handle_gpu_memory_settings(gpu_mem)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1010, in _handle_gpu_memory_settings
    if not self.use_mla_backend():
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5111, in use_mla_backend
    model_config = self.get_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5092, in get_model_config
    self.model_config = ModelConfig.from_server_args(self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 250, in from_server_args
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 149, in __init__
    if self.hf_config.architectures[0] in mm_disabled_models:
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

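For what it's worth, the crash itself is easy to reproduce outside the pod: ModelConfig reads hf_config.architectures from the model's config.json, and transformers leaves that attribute as None when the file has no "architectures" key. A minimal sketch of the failure mode (the mm_disabled_models contents here are illustrative, not sglang's actual list):

    from transformers import PretrainedConfig

    # A config without an "architectures" key leaves the attribute as
    # None, which is exactly what model_config.py line 149 then indexes.
    hf_config = PretrainedConfig()
    print(hf_config.architectures)  # None

    mm_disabled_models = {"Gemma3ForConditionalGeneration"}  # illustrative
    hf_config.architectures[0] in mm_disabled_models
    # TypeError: 'NoneType' object is not subscriptable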
I presume the OME operator somehow does not hand the LLM architecture over to the pod? Judging by the error log it is empty, but it should be "Gemma3ForConditionalGeneration".
Or is it a permission problem on the worker node storage?
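One way to narrow this down is to exec into the failing pod (or an ephemeral debug container) and check whether the config.json that sglang is pointed at actually contains the architectures array. The mount path below is an assumption; substitute whatever --model-path the runtime passes to launch_server:

    import json

    # Hypothetical mount point; substitute the --model-path from the
    # pod's container args.
    config_path = "/mnt/models/config.json"

    with open(config_path) as f:
        cfg = json.load(f)

    # For gemma-3-12b this should print ["Gemma3ForConditionalGeneration"].
    # None here means the downloaded config is incomplete; a missing or
    # unreadable file would point at the storage/permission theory instead.
    print(cfg.get("architectures"))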

Environment

  • OME version:
    0.1.4
  • Kubernetes version:
    1.33.1
  • Runtime being used (SGLang etc.):
    SGLang
  • Model being served (if applicable):
    gemma-3-12b, but others fail too
