Hybrid models with more than one KV cache type are not supported yet #215
Open
Description
I received this error when trying to run two engines per GPU for google/gemma-3-4b-it on an A100. Is there a solution or workaround for this? @ivanium @cui36 @jiarong0907
(EngineCore_DP0 pid=6302) WARNING 11-13 11:03:06 [kv_cache_utils.py:982] Add 1 padding layers, may waste at most 3.45% KV cache memory
(EngineCore_DP0 pid=6302) INFO 11-13 11:03:06 [kv_cache_utils.py:1087] GPU KV cache size: 317,136 tokens
(EngineCore_DP0 pid=6302) INFO 11-13 11:03:06 [kv_cache_utils.py:1091] Maximum concurrency for 3,072 tokens per request: 102.78x
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 167, in _patched_engine_init
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] return original_init(self, vllm_config, *args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 73, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] self.collective_rpc("initialize_from_config",
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 254, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 318, in initialize_from_config
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] self.model_runner.initialize_kv_cache(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3987, in initialize_kv_cache
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] kv_caches = self.initialize_kv_cache_tensors(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3926, in initialize_kv_cache_tensors
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] kv_cache_raw_tensors = self._allocate_kv_cache_tensors(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 627, in _patched_alloc_kv
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] return self._allocate_kv_cache_from_kvcached(kv_cache_config)
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] File "/tmp/kvcached/kvcached/integration/vllm/patches.py", line 556, in _allocate_kv_cache_from_kvcached
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] raise NotImplementedError(
(EngineCore_DP0 pid=6302) ERROR 11-13 11:03:06 [core.py:708] NotImplementedError: Hybrid models with more than one KV cache type are not supported yet.
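For context, Gemma 3 interleaves sliding-window (local) attention layers with global (full) attention layers. As a result, its layers most likely end up with two different KV cache specs, and the `kv_cache_utils.py` padding warning above also points to layers being grouped by type. The sketch below is a hypothetical illustration of that situation and of a single-type check that fails the way the traceback does. `KVCacheSpec`, `gemma3_like_layer_specs`, `group_layers_by_spec`, and `allocate_single_type_cache` are made-up names and are not kvcached's or vLLM's code. The "5 local layers, then 1 global layer" pattern and the 1024-token window are assumptions modeled on the published Gemma 3 design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KVCacheSpec:
    """Hypothetical per-layer KV cache description (not vLLM's real class)."""
    kind: str                          # "full_attention" or "sliding_window"
    sliding_window: Optional[int] = None


def gemma3_like_layer_specs(num_layers: int, window: int = 1024) -> list[KVCacheSpec]:
    """Assumed Gemma-3-style pattern: every 6th layer is global, the rest are local."""
    specs = []
    for i in range(num_layers):
        if (i + 1) % 6 == 0:
            specs.append(KVCacheSpec("full_attention"))
        else:
            specs.append(KVCacheSpec("sliding_window", window))
    return specs


def group_layers_by_spec(specs: list[KVCacheSpec]) -> dict[KVCacheSpec, list[int]]:
    """Group layer indices by identical KV cache spec (one group per cache type)."""
    groups: dict[KVCacheSpec, list[int]] = {}
    for idx, spec in enumerate(specs):
        groups.setdefault(spec, []).append(idx)
    return groups


def allocate_single_type_cache(specs: list[KVCacheSpec]) -> KVCacheSpec:
    """Hypothetical allocator that, like the check in the traceback, handles one type only."""
    groups = group_layers_by_spec(specs)
    if len(groups) > 1:
        raise NotImplementedError(
            "Hybrid models with more than one KV cache type are not supported yet."
        )
    (spec,) = groups  # exactly one spec type: safe to unpack the single key
    return spec
```

With 12 layers, this pattern produces two groups: global layers `[5, 11]` and ten sliding-window layers. An allocator that assumes a single spec type therefore raises `NotImplementedError`. A model whose layers all share one spec would pass the check.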