Fork of resemble-ai/chatterbox with PyTorch nightly compatibility fixes for Blackwell GPUs (RTX 5080/5090).
Chatterbox was developed against PyTorch 2.6. PyTorch nightly (2.11+), required for CUDA 13.0 support on Blackwell, tightened dtype enforcement — mixed float32/float64 tensor operations that previously passed silently now raise:
```
RuntimeError: expected scalar type Float but found Double
```
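A minimal repro of the failure mode (shapes and names here are illustrative, not the actual chatterbox internals): a float64 waveform keeps its dtype through `torch.stft`, and the matmul against a float32 mel filter bank raises.

```python
import torch

waveform = torch.randn(16000, dtype=torch.float64)       # e.g. loaded via librosa
mel_filters = torch.randn(80, 201, dtype=torch.float32)  # float32 registered buffer

# torch.stft follows its input dtype: float64 in, float64 magnitudes out
window = torch.hann_window(400, dtype=waveform.dtype)
spec = torch.stft(waveform, n_fft=400, hop_length=160,
                  window=window, return_complex=True).abs()
print(spec.dtype)  # torch.float64

try:
    mel_filters @ spec  # mixed float32 @ float64 matmul
except RuntimeError as e:
    print(e)
```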
Two fixes applied directly to the source:
`torch.stft` output dtype follows its input. The audio waveform was arriving as float64 (from numpy/librosa), while the mel filter bank buffer `_mel_filters` is float32, so the matmul between them failed.

Fix: cast the audio to float32 immediately after moving it to the device, before the STFT.
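A sketch of this fix under assumed names (the real code lives in chatterbox's mel spectrogram path; `mel_spectrogram`, the STFT parameters, and the filter bank shape are placeholders):

```python
import torch

def mel_spectrogram(audio: torch.Tensor, mel_filters: torch.Tensor) -> torch.Tensor:
    # The fix: force float32 right after the device move, before the STFT.
    # audio may arrive as float64 from the numpy/librosa loading path.
    audio = audio.to(device=mel_filters.device, dtype=torch.float32)
    window = torch.hann_window(400, device=audio.device)
    spec = torch.stft(audio, n_fft=400, hop_length=160,
                      window=window, return_complex=True).abs() ** 2
    return mel_filters @ spec  # both operands are float32 now
```

With the cast in place, a float64 waveform no longer raises and the output is float32 end to end.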
scipy/librosa operations produce float64 numpy arrays. The original code cast to float32 only when `normalized_mels=True`, leaving float64 arrays flowing into `torch.as_tensor()` and on to the LSTM when `normalized_mels=False`.

Fix: always return float32 from `melspectrogram()`.
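A sketch of this fix under assumed names (the mel computation here is a stand-in for the real scipy/librosa pipeline; only the unconditional cast mirrors the actual change):

```python
import numpy as np

def melspectrogram(wav: np.ndarray, normalized_mels: bool = False) -> np.ndarray:
    # Stand-in for the real scipy/librosa mel pipeline, which yields float64
    mels = np.abs(np.fft.rfft(wav.astype(np.float64)))
    if normalized_mels:
        mels = (mels - mels.min()) / (mels.max() - mels.min() + 1e-8)
    # The fix: cast on every path, not only when normalized_mels=True
    return mels.astype(np.float32)
```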
Who is affected:

- RTX 5080 / RTX 5090 (Blackwell) users — CUDA 13.0 requires a torch nightly, which triggers this
- Anyone running PyTorch 2.11+, regardless of GPU
Blackwell requires the cu130 torch nightly index. In `pyproject.toml`:
```toml
[[tool.uv.index]]
name = "pytorch-nightly-cu130"
url = "https://download.pytorch.org/whl/nightly/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-nightly-cu130" }
torchaudio = { index = "pytorch-nightly-cu130" }

[tool.uv]
override-dependencies = [
    "torch>=2.11.0",
    "torchaudio>=2.11.0",
]
```

Note: `torch-backend = "cu130"` alone does not work — it resolves but installs CPU wheels on sync. Use the explicit index approach above.
Install this fork:
```toml
[tool.uv.sources]
chatterbox-tts = { git = "https://github.com/psarno/chatterbox", branch = "main" }
```

Or editable from a local clone:
```toml
[tool.uv.sources]
chatterbox-tts = { path = "../chatterbox", editable = true }
```

All original model code, weights, and the MIT license are from Resemble AI. See the upstream repo for full documentation, usage examples, and model details.