
Hybrid models with more than one KV cache type are not supported yet #215

@alecngo

Description


I received this error when trying to spin up two engines per GPU for google/gemma-3-4b-it on an A100. Is there any solution or workaround for this? @ivanium @cui36 @jiarong0907

(EngineCore_DP0 pid=6302) WARNING 11-13 11:03:06 [kv_cache_utils.py:982] Add 1 padding layers, may waste at most 3.45% KV cache memory
(EngineCore_DP0 pid=6302) INFO 11-13 11:03:06 [kv_cache_utils.py:1087] GPU KV cache size: 317,136 tokens
(EngineCore_DP0 pid=6302) INFO 11-13 11:03:06 [kv_cache_utils.py:1091] Maximum concurrency for 3,072 tokens per request: 102.78x
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 167, in _patched_engine_init
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     return original_init(self, vllm_config, *args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 73, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     self.collective_rpc("initialize_from_config",
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 254, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 318, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     self.model_runner.initialize_kv_cache(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3987, in initialize_kv_cache
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     kv_caches = self.initialize_kv_cache_tensors(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3926, in initialize_kv_cache_tensors
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     kv_cache_raw_tensors = self._allocate_kv_cache_tensors(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 627, in _patched_alloc_kv
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     return self._allocate_kv_cache_from_kvcached(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]   File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 556, in _allocate_kv_cache_from_kvcached
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708]     raise NotImplementedError(
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] NotImplementedError: Hybrid models with more than one KV cache type are not supported yet.
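The padding warning at the top of the log and the final NotImplementedError plausibly share one cause: Gemma 3 mixes two attention types, so its layers need two KV cache types. The sketch below makes that concrete under loudly stated assumptions. It assumes gemma-3-4b-it has 34 decoder layers, with every sixth layer using full (global) attention and the rest using sliding-window attention. It also uses a simplified grouping rule (group size equal to the smallest per-type layer count, last group padded) that only approximates vLLM's hybrid KV cache grouping. `layer_kv_types`, `padding_report`, and `check_single_kv_cache_type` are hypothetical names for illustration, not kvcached or vLLM code.

```python
from collections import Counter
from math import ceil

# ASSUMPTION (not stated in the issue): gemma-3-4b-it has 34 decoder layers,
# and every 6th layer uses full (global) attention; the rest use sliding-window.
NUM_LAYERS = 34
GLOBAL_EVERY = 6


def layer_kv_types(num_layers: int, global_every: int) -> list[str]:
    """Return the (assumed) KV cache type of each layer."""
    return [
        "full" if (i + 1) % global_every == 0 else "sliding_window"
        for i in range(num_layers)
    ]


def padding_report(types: list[str]) -> dict[str, tuple[int, float]]:
    """Simplified, hypothetical grouping rule, not vLLM's exact code.

    Group size is the smallest per-type layer count. Each type is split into
    groups of that size and its last group is padded with dummy layers.
    Returns {kv_type: (padding_layers, padding_as_percent_of_that_type)}.
    """
    counts = Counter(types)
    group_size = min(counts.values())
    report = {}
    for kv_type, n in counts.items():
        padded = ceil(n / group_size) * group_size
        pad = padded - n
        report[kv_type] = (pad, 100.0 * pad / n)
    return report


def check_single_kv_cache_type(types: list[str]) -> None:
    """Hypothetical guard that raises the same message as the traceback."""
    if len(set(types)) > 1:
        raise NotImplementedError(
            "Hybrid models with more than one KV cache type are not supported yet."
        )


if __name__ == "__main__":
    types = layer_kv_types(NUM_LAYERS, GLOBAL_EVERY)
    print(Counter(types))
    for kv_type, (pad, pct) in padding_report(types).items():
        print(f"{kv_type}: {pad} padding layer(s), ~{pct:.2f}% of that type")
    try:
        check_single_kv_cache_type(types)
    except NotImplementedError as exc:
        print("NotImplementedError:", exc)
```

Under these assumptions the sketch counts 29 sliding-window and 5 full-attention layers. With a group size of 5, the sliding-window type needs one padding layer, and 1/29 rounds to the same 3.45% as the log line. Because two distinct types remain, the guard rejects the model, which is consistent with the kvcached patch refusing to allocate the KV cache for this model.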
