Fork of resemble-ai/chatterbox with PyTorch nightly compatibility fixes for Blackwell GPUs (RTX 5080/5090).
Chatterbox was developed against PyTorch 2.6. PyTorch nightly (2.11+), required for CUDA 13.0 support on Blackwell, tightened dtype enforcement — mixed float32/float64 tensor operations that previously passed silently now raise:
```
RuntimeError: expected scalar type Float but found Double
```
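A minimal repro of the failure mode (shapes and names here are illustrative, not the actual chatterbox internals): a float64 waveform keeps its dtype through `torch.stft`, and the matmul against a float32 mel filter bank raises.

```python
import torch

waveform = torch.randn(16000, dtype=torch.float64)       # e.g. loaded via librosa
mel_filters = torch.randn(80, 201, dtype=torch.float32)  # float32 registered buffer

# torch.stft follows its input dtype: float64 in, float64 magnitudes out
window = torch.hann_window(400, dtype=waveform.dtype)
spec = torch.stft(waveform, n_fft=400, hop_length=160,
                  window=window, return_complex=True).abs()
print(spec.dtype)  # torch.float64

try:
    mel_filters @ spec  # mixed float32 @ float64 matmul
except RuntimeError as e:
    print(e)
```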
Two fixes applied directly to the source:
`torch.stft` output dtype follows its input. The audio waveform was arriving as float64 (from numpy/librosa), while the mel filter bank buffer `_mel_filters` is float32, so the matmul between them failed.

Fix: cast the audio to float32 immediately after moving it to the device, before the STFT.
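A sketch of this fix under assumed names (the real code lives in chatterbox's mel spectrogram path; `mel_spectrogram`, the STFT parameters, and the filter bank shape are placeholders):

```python
import torch

def mel_spectrogram(audio: torch.Tensor, mel_filters: torch.Tensor) -> torch.Tensor:
    # The fix: force float32 right after the device move, before the STFT.
    # audio may arrive as float64 from the numpy/librosa loading path.
    audio = audio.to(device=mel_filters.device, dtype=torch.float32)
    window = torch.hann_window(400, device=audio.device)
    spec = torch.stft(audio, n_fft=400, hop_length=160,
                      window=window, return_complex=True).abs() ** 2
    return mel_filters @ spec  # both operands are float32 now
```

With the cast in place, a float64 waveform no longer raises and the output is float32 end to end.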
scipy/librosa operations produce float64 numpy arrays. The original code cast to float32 only when `normalized_mels=True`, leaving float64 arrays flowing into `torch.as_tensor()` and on to the LSTM when `normalized_mels=False`.

Fix: always return float32 from `melspectrogram()`.
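A sketch of this fix under assumed names (the mel computation here is a stand-in for the real scipy/librosa pipeline; only the unconditional cast mirrors the actual change):

```python
import numpy as np

def melspectrogram(wav: np.ndarray, normalized_mels: bool = False) -> np.ndarray:
    # Stand-in for the real scipy/librosa mel pipeline, which yields float64
    mels = np.abs(np.fft.rfft(wav.astype(np.float64)))
    if normalized_mels:
        mels = (mels - mels.min()) / (mels.max() - mels.min() + 1e-8)
    # The fix: cast on every path, not only when normalized_mels=True
    return mels.astype(np.float32)
```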
Who is affected:

- RTX 5080 / RTX 5090 (Blackwell) users — CUDA 13.0 requires a torch nightly, which triggers this
- Anyone running PyTorch 2.11+, regardless of GPU
Blackwell requires the cu130 torch nightly index. In `pyproject.toml`:
```toml
[[tool.uv.index]]
name = "pytorch-nightly-cu130"
url = "https://download.pytorch.org/whl/nightly/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-nightly-cu130" }
torchaudio = { index = "pytorch-nightly-cu130" }

[tool.uv]
override-dependencies = [
    "torch>=2.11.0",
    "torchaudio>=2.11.0",
]
```

Note: `torch-backend = "cu130"` alone does not work — it resolves but installs CPU wheels on sync. Use the explicit index approach above.
Install this fork:
```toml
[tool.uv.sources]
chatterbox-tts = { git = "https://github.com/psarno/chatterbox", branch = "main" }
```

Or editable from a local clone:
```toml
[tool.uv.sources]
chatterbox-tts = { path = "../chatterbox", editable = true }
```

All original model code, weights, and the MIT license are from Resemble AI. See the upstream repo for full documentation, usage examples, and model details.