[SUPPORT] sglang POD starts with error #546

Description

@ilfur

Question

The "llm engine" POD keeps failing, wont start.

What did you try?

Used the samples to set up a ClusterBaseModel, ClusterServingRuntime, and InferenceService for gemma-3-12b with sglang/srt (I also tried other LLMs).
The pod that should host the LLM always complains that there is no architectures array and won't start.
The LLM itself downloads fine after creating a hf-token secret.
I tried with sglang 0.55 and 0.59, but the sglang runtime pod just logs this message over and over again:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 32, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5592, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5078, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 331, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 733, in __post_init__
    self._handle_gpu_memory_settings(gpu_mem)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1010, in _handle_gpu_memory_settings
    if not self.use_mla_backend():
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5111, in use_mla_backend
    model_config = self.get_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5092, in get_model_config
    self.model_config = ModelConfig.from_server_args(self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 250, in from_server_args
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 149, in __init__
    if self.hf_config.architectures[0] in mm_disabled_models:
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

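For what it's worth, the crash itself is easy to reproduce outside the pod: ModelConfig reads hf_config.architectures from the model's config.json, and transformers leaves that attribute as None when the file has no "architectures" key. A minimal sketch of the failure mode (the mm_disabled_models contents here are illustrative, not sglang's actual list):

    from transformers import PretrainedConfig

    # A config without an "architectures" key leaves the attribute as
    # None, which is exactly what model_config.py line 149 then indexes.
    hf_config = PretrainedConfig()
    print(hf_config.architectures)  # None

    mm_disabled_models = {"Gemma3ForConditionalGeneration"}  # illustrative
    hf_config.architectures[0] in mm_disabled_models
    # TypeError: 'NoneType' object is not subscriptable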
I presume the OME operator somehow does not hand the LLM architecture over to the pod? Judging by the error log it is empty, but it should be "Gemma3ForConditionalGeneration".
Or is it a permission problem on the worker node storage?
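One way to narrow this down is to exec into the failing pod (or an ephemeral debug container) and check whether the config.json that sglang is pointed at actually contains the architectures array. The mount path below is an assumption; substitute whatever --model-path the runtime passes to launch_server:

    import json

    # Hypothetical mount point; substitute the --model-path from the
    # pod's container args.
    config_path = "/mnt/models/config.json"

    with open(config_path) as f:
        cfg = json.load(f)

    # For gemma-3-12b this should print ["Gemma3ForConditionalGeneration"].
    # None here means the downloaded config is incomplete; a missing or
    # unreadable file would point at the storage/permission theory instead.
    print(cfg.get("architectures"))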

Environment

  • OME version:
    0.1.4
  • Kubernetes version:
    1.33.1
  • Runtime being used (SGLang etc.):
    SGLang
  • Model being served (if applicable):
    gemma-3-12b, but others fail too
