Deployment Summary: Attempting Auralis on RTX 5070 Ti (Blackwell) with vLLM 0.15.1 #84

@widlers

Subject: Deployment Summary: Attempting Auralis on RTX 5070 Ti (Blackwell) with vLLM 0.15.1
Context:
I am currently working on deploying Auralis with a fine-tuned XTTSv2 model on an NVIDIA RTX 5070 Ti (Blackwell architecture). Since Blackwell requires very recent CUDA/vLLM versions, I’ve been navigating several compatibility hurdles.

What has been done so far:

vLLM 0.15.1 Upgrade: Moved to vLLM 0.15.1 to leverage native Blackwell (sm_120) support and modern CUDA kernels.

Runtime-Bridge Attempts: Tried monkeypatching legacy vLLM paths (e.g., vllm.inputs.registry) at runtime to satisfy Auralis’s dependencies. This failed due to deep structural changes in vLLM's V1 engine preparation.
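For readers unfamiliar with the runtime-bridge idea, here is a minimal sketch of how a legacy import path can be aliased to a module that still exists. It is not the script used in this deployment. The helper name `alias_module` is hypothetical, and the demo aliases a standard-library module so it runs without vLLM installed.

```python
import importlib
import sys


def alias_module(legacy_name: str, current_name: str) -> None:
    """Make `import legacy_name` resolve to the module found at `current_name`."""
    module = importlib.import_module(current_name)
    # The import system consults sys.modules first, so this entry short-circuits
    # the normal search for `legacy_name`.
    sys.modules[legacy_name] = module
    # For dotted names, also expose the alias on the parent package so that
    # `from parent import child` keeps working.
    parent_name, _, child = legacy_name.rpartition(".")
    if parent_name and parent_name in sys.modules:
        setattr(sys.modules[parent_name], child, module)


# Stand-in demonstration: "legacy_json" is a made-up name mapped onto `json`.
alias_module("legacy_json", "json")
import legacy_json  # resolved from sys.modules, no file lookup happens

print(legacy_json.dumps({"a": 1}))
```

An alias like this only redirects the module lookup. If the classes or functions that the caller expects were renamed, removed, or changed signature in the new module, the import succeeds but later attribute access or calls still fail, which is consistent with the runtime approach breaking down here.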

Source-Level Patching (Current Phase): Implementing a dedicated Python patch script to hard-fix imports directly in the Auralis source code during the Docker build process.

Files Modified:

Dockerfile.auralis: Updated for vLLM 0.15.1 + added the custom patch script.

oai_server.py: Added Blackwell-specific hardware flags.

XTTSv2.py: Modified to enforce Eager Mode and disable Async Output.

docker_compose.yml: Integrated runtime flags (--enforce-eager, --disable-async-output-proc).

patch_auralis_for_vllm_0_15_1.py: NEW – Orchestrates the remapping of vLLM imports (InputContext, SamplingMetadata, etc.).
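A build-time patch script of this kind could be sketched roughly as follows. This is an illustration under assumptions, not the contents of patch_auralis_for_vllm_0_15_1.py: the function names are hypothetical, and the `old_pkg`/`new_pkg` paths in `IMPORT_REMAP` are placeholders rather than real vLLM module paths.

```python
import re
from pathlib import Path

# Placeholder mapping for illustration only; the real entries would come from
# comparing the old and new package layouts.
IMPORT_REMAP = {
    "old_pkg.inputs.registry": "new_pkg.inputs",
}


def rewrite_imports(source: str, remap: dict[str, str]) -> str:
    """Rewrite `import X` / `from X import ...` lines whose module X is in `remap`."""
    for old, new in remap.items():
        # Anchor at the start of a line and require whitespace or end of line
        # after the module path, so longer names (e.g. `X_extra` or `X.sub`)
        # are left untouched.
        pattern = re.compile(
            r"^(\s*(?:from|import)\s+)" + re.escape(old) + r"(?=\s|$)",
            re.MULTILINE,
        )
        source = pattern.sub(lambda m, new=new: m.group(1) + new, source)
    return source


def patch_file(path: Path, remap: dict[str, str]) -> bool:
    """Rewrite one file in place; return True if its contents changed."""
    original = path.read_text(encoding="utf-8")
    patched = rewrite_imports(original, remap)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
        return True
    return False
```

In a Docker build, a script like this would typically be run once over the installed package directory after `pip install`, so the image ships with the rewritten sources. A plain text rewrite only fixes the location of a symbol; it does not help when the symbol itself no longer exists or behaves differently.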

Current Status & Challenges:
The server still hits ImportErrors despite the patching. Specifically, the reorganization of vllm.model_executor and vllm.inputs in 0.15.1 breaks the import paths that Auralis's current code depends on.
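To narrow down which imports are still unresolved, a small probe can be run inside the container. The sketch below is one possible way to do that. `probe_imports` is a hypothetical helper, and the demo targets are standard-library names so the snippet runs anywhere. The real (module, attribute) pairs would be taken from the actual tracebacks.

```python
import importlib


def probe_imports(targets):
    """Check each (module_name, attribute) pair and report its status.

    Returns a dict keyed by "module:attribute" with one of:
    "ok", "attribute missing", or "module missing (<exception class>)".
    """
    results = {}
    for module_name, attr in targets:
        key = f"{module_name}:{attr}"
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            results[key] = f"module missing ({exc.__class__.__name__})"
            continue
        results[key] = "ok" if hasattr(module, attr) else "attribute missing"
    return results


# Stand-in targets; replace with the pairs from the failing import statements.
report = probe_imports([
    ("json", "dumps"),
    ("json", "no_such_name"),
    ("no_such_module_xyz", "anything"),
])
for key, status in report.items():
    print(f"{key}: {status}")
```

Separating "module missing" from "attribute missing" shows whether a whole module path moved or only a single symbol inside it, which determines what kind of remapping entry is needed.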

Proposed Paths Forward:

Option A (Pragmatic): Revert to vLLM 0.7.2 and bypass the sm_120 NotImplementedError via architecture spoofing (treating Blackwell as Ada/Hopper). This might be the fastest path to a stable server.

Option B (Enthusiast): Commit to vLLM 0.15.1 and finish the comprehensive import-mapping. This would yield the best Blackwell-native performance (FP4 support, etc.) but requires more "surgical" code changes in Auralis.
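For Option B, one common pattern is to replace hard-coded imports with a small compatibility lookup that tries several candidate locations in order. The following is a sketch under assumptions: `resolve_symbol` is a hypothetical helper, and the demo uses standard-library modules rather than real vLLM paths.

```python
import importlib


def resolve_symbol(name, candidate_modules):
    """Return the first attribute called `name` found in `candidate_modules`.

    Modules that cannot be imported are skipped. An ImportError is raised
    if no candidate provides the symbol.
    """
    for module_name in candidate_modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this layout does not exist in the installed version
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        f"{name!r} not found in any of: {', '.join(candidate_modules)}"
    )


# Stand-in demonstration: the first two candidates fail, the third succeeds.
OrderedDict = resolve_symbol(
    "OrderedDict", ["no_such_module_xyz", "math", "collections"]
)
print(OrderedDict)
```

Listing the newest location first and the legacy location last would let the same Auralis code run against more than one vLLM release, at the cost of hiding API differences that go beyond a moved import.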

I've also documented the specific ModuleNotFoundError and ImportError traces when moving from vLLM 0.7.2 to 0.15.1, which highlight the structural shifts in model_executor and multimodal.inputs.

I am more than happy to contribute my patch scripts and findings as a Pull Request if there is interest in bringing official Blackwell support to Auralis. Let me know if you want to collaborate on making the framework 50-series ready!

Labels: bug