I am testing Qwen3-8B using vLLM with kvcached on a 2-GPU node. When pipeline parallelism (PP) is enabled (e.g., PP=2, TP=1), engine initialization fails with a TimeoutError: the KVCacheManager does not detect KV tensor creation within its 10-second threshold.
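
To make the failure mode concrete, here is a minimal sketch of a poll-with-deadline check of the kind described above. It is not kvcached's actual code: `wait_until` and `kv_tensors_created` are hypothetical names, and the 10-second default only mirrors the threshold reported in the error.

```python
import time


def wait_until(predicate, timeout_s=10.0, poll_s=0.01):
    """Poll `predicate` until it returns True; raise TimeoutError at the deadline.

    Hypothetical helper for illustration only, not part of vLLM or kvcached.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(poll_s)
    raise TimeoutError(f"condition not met within {timeout_s} s")


if __name__ == "__main__":
    # Hypothetical stand-in for "have the KV tensors been created yet?".
    # It never becomes True here, so the wait times out.
    def kv_tensors_created():
        return False

    try:
        wait_until(kv_tensors_created, timeout_s=0.2)
    except TimeoutError as exc:
        print(f"TimeoutError: {exc}")
```

If the awaited condition never becomes true on some rank, a check shaped like this raises at the deadline no matter how long it waits, so raising the threshold alone would not be expected to help in that case.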
