Subject: Deployment Summary: Attempting Auralis on RTX 5070 Ti (Blackwell) with vLLM 0.15.1
Context:
I am currently working on deploying Auralis with a fine-tuned XTTSv2 model on an NVIDIA RTX 5070 Ti (Blackwell architecture). Since Blackwell requires very recent CUDA/vLLM versions, I’ve been navigating several compatibility hurdles.
What has been done so far:
vLLM 0.15.1 Upgrade: Moved to vLLM 0.15.1 to leverage native Blackwell (sm_120) support and modern CUDA kernels.
Runtime-Bridge Attempts: Tried monkeypatching legacy vLLM paths (e.g., vllm.inputs.registry) at runtime to satisfy Auralis’s dependencies. This failed because of the deep structural changes that came with vLLM's V1 engine.
Source-Level Patching (Current Phase): Implementing a dedicated Python patch script to hard-fix imports directly in the Auralis source code during the Docker build process.
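For readers unfamiliar with the runtime-bridge idea described above, here is a minimal sketch of the general technique: registering a shim module in sys.modules under a legacy name so that old import statements resolve. The helper name alias_module and the demo names (legacy_jsonlib, encode) are hypothetical, and the demo uses the standard library rather than vLLM, since the correct 0.15.1 targets are exactly what is still unresolved.

```python
import importlib
import sys
import types
from typing import Dict, Optional


def alias_module(
    legacy_name: str,
    target_name: str,
    names: Optional[Dict[str, str]] = None,
) -> types.ModuleType:
    """Register a shim under `legacy_name` that re-exports attributes
    from `target_name`.

    `names` maps legacy attribute names to attribute names in the target.
    This is a hypothetical helper, not part of Auralis or vLLM.
    """
    target = importlib.import_module(target_name)
    shim = types.ModuleType(legacy_name)
    for legacy_attr, new_attr in (names or {}).items():
        # getattr raises AttributeError early if the target lacks the name.
        setattr(shim, legacy_attr, getattr(target, new_attr))
    # Must happen before the dependent package is imported.
    sys.modules[legacy_name] = shim
    return shim


if __name__ == "__main__":
    # Demo with stdlib names only: expose json.dumps as legacy_jsonlib.encode.
    alias_module("legacy_jsonlib", "json", {"encode": "dumps"})
    from legacy_jsonlib import encode

    print(encode({"ok": True}))
```

A shim like this only helps when the old symbol still exists somewhere with compatible behavior. It cannot paper over classes whose interfaces changed, which is consistent with the failure described above.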
Files Modified:
Dockerfile.auralis: Updated for vLLM 0.15.1 + added the custom patch script.
oai_server.py: Added Blackwell-specific hardware flags.
XTTSv2.py: Modified to enforce Eager Mode and disable Async Output.
docker_compose.yml: Integrated runtime flags (--enforce-eager, --disable-async-output-proc).
patch_auralis_for_vllm_0_15_1.py: NEW – Orchestrates the remapping of vLLM imports (InputContext, SamplingMetadata, etc.).
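As a rough illustration of what a source-level patch script of this kind can look like, the sketch below rewrites "from X import Y" lines according to a mapping table. It is not the actual patch_auralis_for_vllm_0_15_1.py. The function names and the mapping entries (old_pkg..., new_pkg...) are placeholders, and the real old-to-new vLLM paths would have to be filled in from the observed tracebacks.

```python
import re
from pathlib import Path
from typing import Dict, List

# Placeholder mapping (old module path -> new module path).
# These are NOT real vLLM paths; replace them with verified ones.
IMPORT_MAP: Dict[str, str] = {
    "old_pkg.inputs.registry": "new_pkg.inputs",
}


def rewrite_imports(source: str, mapping: Dict[str, str]) -> str:
    """Rewrite `from <old> import ...` lines to `from <new> import ...`."""
    for old, new in mapping.items():
        pattern = re.compile(
            rf"^(\s*from\s+){re.escape(old)}(\s+import\s+)", re.MULTILINE
        )
        # A function replacement avoids any escaping issues in `new`.
        source = pattern.sub(
            lambda m, new=new: m.group(1) + new + m.group(2), source
        )
    return source


def patch_tree(root: str, mapping: Dict[str, str]) -> List[Path]:
    """Apply rewrite_imports to every .py file under `root`.

    Returns the list of files that were actually changed.
    """
    changed: List[Path] = []
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        patched = rewrite_imports(original, mapping)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    sample = "from old_pkg.inputs.registry import InputContext\n"
    print(rewrite_imports(sample, IMPORT_MAP), end="")
```

This only handles the "from X import Y" form and module moves. Plain "import X" statements, renamed symbols, or removed classes would need extra rules or hand edits.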
Current Status & Challenges:
The server still hits ImportErrors despite the patching. Specifically, the reorganization of vllm.model_executor and vllm.inputs in 0.15.1 clashes with the import paths that Auralis's current architecture depends on.
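To narrow down which imports are still broken, a small probe run inside the container can help. The sketch below checks a list of (module, attribute) pairs and reports which ones resolve. The function name probe_imports is hypothetical, and any vLLM pair fed to it (for example vllm.inputs.registry with InputContext) is an assumption to be verified, not a statement about where a symbol lives in 0.15.1.

```python
import importlib
from typing import Iterable, List, Tuple


def probe_imports(
    targets: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Try each (module, attribute) pair and report a status string.

    Status is "ok", "attribute missing", or "module missing: <error>".
    """
    results: List[Tuple[str, str, str]] = []
    for module_name, attr in targets:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            results.append((module_name, attr, f"module missing: {exc}"))
            continue
        status = "ok" if hasattr(module, attr) else "attribute missing"
        results.append((module_name, attr, status))
    return results


if __name__ == "__main__":
    # Stdlib pairs for demonstration; swap in the paths Auralis imports.
    demo_targets = [
        ("json", "dumps"),
        ("json", "not_a_real_name"),
        ("no_such_module_xyz", "anything"),
    ]
    for module_name, attr, status in probe_imports(demo_targets):
        print(f"{module_name}.{attr}: {status}")
```

Running the same target list under both vLLM versions gives a compact diff of what moved, which can then feed the import-mapping table.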
Proposed Paths Forward:
Option A (Pragmatic): Revert to vLLM 0.7.2 and bypass the sm_120 NotImplementedError via architecture spoofing (treating Blackwell as Ada/Hopper). This might be the fastest path to a stable server.
Option B (Enthusiast): Commit to vLLM 0.15.1 and finish the comprehensive import-mapping. This would yield the best Blackwell-native performance (FP4 support, etc.) but requires more "surgical" code changes in Auralis.
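For Option A, the architecture-spoofing idea can be sketched as a wrapper that caps the reported compute capability. This is only an illustration under assumptions: the helper name make_spoofed_getter and the (8, 9) ceiling (Ada) are my own choices, and whether vLLM 0.7.2 reads the capability through torch.cuda.get_device_capability, or whether its prebuilt kernels would then actually run on sm_120, is untested here.

```python
from typing import Callable, Tuple

Capability = Tuple[int, int]


def make_spoofed_getter(
    real_getter: Callable[..., Capability],
    ceiling: Capability = (8, 9),
) -> Callable[..., Capability]:
    """Wrap a capability getter so it never reports more than `ceiling`.

    Tuples compare element by element, so min((12, 0), (8, 9)) is (8, 9),
    while an older GPU such as (7, 5) is reported unchanged.
    """

    def spoofed(*args, **kwargs) -> Capability:
        real = tuple(real_getter(*args, **kwargs))
        return min(real, ceiling)

    return spoofed


if __name__ == "__main__":
    # Stand-in getter so the sketch runs without a GPU or torch installed.
    fake_blackwell = lambda device=None: (12, 0)
    print(make_spoofed_getter(fake_blackwell)())

    # Possible application (assumption, must run before vLLM is imported):
    #   import torch
    #   torch.cuda.get_device_capability = make_spoofed_getter(
    #       torch.cuda.get_device_capability
    #   )
```

Spoofing only gets past capability checks in Python. If the installed binaries contain no code the GPU can execute, the failure will simply move to kernel launch time.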
I've also documented the specific ModuleNotFoundError and ImportError traces when moving from vLLM 0.7.2 to 0.15.1, which highlight the structural shifts in model_executor and multimodal.inputs.
I am more than happy to contribute my patch scripts and findings as a Pull Request if there is interest in bringing official Blackwell support to Auralis. Let me know if you want to collaborate on making the framework 50-series ready!