psarno/chatterbox

Fork of resemble-ai/chatterbox with PyTorch nightly compatibility fixes for Blackwell GPUs (RTX 5080/5090).

What's changed

Chatterbox was developed against PyTorch 2.6. PyTorch nightly (2.11+), required for CUDA 13.0 support on Blackwell, tightened dtype enforcement — mixed float32/float64 tensor operations that previously passed silently now raise:

RuntimeError: expected scalar type Float but found Double

Two fixes are applied directly to the source:

1. s3tokenizer/s3tokenizer.py — audio cast before torch.stft

torch.stft output dtype follows its input. The audio waveform was arriving as float64 (from numpy/librosa), while the mel filter bank buffer _mel_filters is float32. The matmul between the float32 filter bank and the float64 spectrogram magnitudes then failed with the error above.

Fix: cast audio to float32 immediately after moving to device, before the stft.
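
As a rough illustration of that cast (not the fork's actual code), the sketch below feeds float64 audio through an STFT and a float32 filter bank. The helper name log_mel, the n_fft and hop values, and the random stand-in filter bank are all assumptions made for the example.

```python
import numpy as np
import torch


def log_mel(audio, mel_filters: torch.Tensor, n_fft: int = 400, hop: int = 160) -> torch.Tensor:
    """Hypothetical helper: power spectrogram projected through a float32 filter bank."""
    audio = torch.as_tensor(audio).to(mel_filters.device)
    # The cast described above: force float32 before the STFT so the
    # spectrogram dtype matches the float32 filter bank.
    audio = audio.to(torch.float32)
    window = torch.hann_window(n_fft, device=audio.device)
    stft = torch.stft(audio, n_fft, hop, window=window, return_complex=True)
    magnitudes = stft[..., :-1].abs() ** 2  # float32 because the input was float32
    mel = mel_filters @ magnitudes
    return torch.clamp(mel, min=1e-10).log10()


# Stand-in inputs: one second of float64 noise and a random float32 "filter bank".
audio64 = np.random.default_rng(0).standard_normal(16000)  # numpy defaults to float64
filters = torch.rand(128, 201, dtype=torch.float32)
mel = log_mel(audio64, filters)
```

Casting once at the entry point keeps every downstream tensor in float32, so no later operation needs its own dtype guard.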

2. voice_encoder/melspec.py — unconditional float32 output

scipy/librosa operations produce float64 numpy arrays. The original code only cast to float32 when normalized_mels=True — leaving float64 arrays flowing into torch.as_tensor() and on to the LSTM when normalized_mels=False.

Fix: always return float32 from melspectrogram().
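
A minimal sketch of the same idea, assuming a simplified stand-in for melspectrogram(): the function name melspectrogram_like and its placeholder arithmetic are hypothetical, and only the placement of the cast is the point.

```python
import numpy as np


def melspectrogram_like(wav: np.ndarray, normalized_mels: bool = False) -> np.ndarray:
    """Hypothetical stand-in for melspectrogram(); only the dtype handling matters."""
    # Placeholder for the scipy/librosa pipeline, which yields float64 arrays.
    mel = np.abs(np.asarray(wav, dtype=np.float64)).reshape(-1, 4)
    if normalized_mels:
        mel = (mel - mel.min()) / (mel.max() - mel.min() + 1e-8)
    # Cast on every path, not only inside the normalized branch.
    return mel.astype(np.float32)


wav = np.linspace(-1.0, 1.0, 16)
plain = melspectrogram_like(wav, normalized_mels=False)
normed = melspectrogram_like(wav, normalized_mels=True)
```

With the cast at the return statement, torch.as_tensor() receives float32 regardless of the normalized_mels flag.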

Who needs this

  • RTX 5080 / RTX 5090 (Blackwell) users: CUDA 13.0 requires torch nightly, which triggers the dtype error above
  • Anyone running PyTorch 2.11+ regardless of GPU

PyTorch nightly setup (uv)

Blackwell requires the cu130 torch nightly index. In pyproject.toml:

[[tool.uv.index]]
name = "pytorch-nightly-cu130"
url = "https://download.pytorch.org/whl/nightly/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-nightly-cu130" }
torchaudio = { index = "pytorch-nightly-cu130" }

[tool.uv]
override-dependencies = ["torch>=2.11.0", "torchaudio>=2.11.0"]

Note: torch-backend = "cu130" alone does not work — it resolves but installs CPU wheels on sync. Use the explicit index approach above.
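
One way to check which wheel a sync actually installed is to look at the local build tag in the torch version string. The sketch below assumes the common convention that wheels from the PyTorch index carry a suffix such as +cu130 or +cpu. The helper names wheel_flavor and is_cuda_wheel are made up for this example.

```python
def wheel_flavor(version: str) -> str:
    """Return the local build tag of a version string, or '' if there is none."""
    _, _, local = version.partition("+")
    return local


def is_cuda_wheel(version: str) -> bool:
    """True when the local build tag looks like a CUDA build (assumed 'cu…' prefix)."""
    return wheel_flavor(version).startswith("cu")


# Example version strings (illustrative, not taken from a real install).
nightly = "2.11.0.dev20250101+cu130"
cpu_only = "2.11.0.dev20250101+cpu"
```

In a real environment, pass torch.__version__ to is_cuda_wheel. torch.version.cuda is a useful cross-check, since it is None on CPU-only builds.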

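Install this fork (if pyproject.toml already has a [tool.uv.sources] table, add the entry to that table instead of declaring it a second time):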

[tool.uv.sources]
chatterbox-tts = { git = "https://github.com/psarno/chatterbox", branch = "main" }

Or editable from a local clone:

[tool.uv.sources]
chatterbox-tts = { path = "../chatterbox", editable = true }

Upstream

All original model code, weights, and MIT license are from Resemble AI. See the upstream repo for full documentation, usage examples, and model details.
