@o-alexandre-felipe
feat: Add Supertonic TTS speech service

This PR adds a new speech service for Supertonic TTS, a fast, high-quality offline text-to-speech engine using ONNX models.

Features:

  • Zero configuration: Models auto-download from Hugging Face Hub on first use (about 250 MB)
  • Fully offline: After the initial download, no internet connection is required
  • Fast inference: Up to 167x faster than real time on modern hardware
  • Multiple voices: Four voice styles available: M1 and M2 (male), F1 and F2 (female)
  • Configurable quality: Adjustable denoising steps (speed vs quality tradeoff)
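
To make the "faster than real time" figure concrete, here is a minimal sketch of how such a speed-up factor can be computed: seconds of audio produced divided by seconds spent synthesizing. The helper names `speedup_over_realtime` and `time_synthesis` are hypothetical and not part of this PR, and the numbers in the usage note are illustrative rather than measured.

```python
import time
from typing import Callable


def speedup_over_realtime(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return seconds of audio produced per second of synthesis time."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def time_synthesis(synthesize: Callable[[], float]) -> float:
    """Time one call to `synthesize` and return its speed-up factor.

    `synthesize` is a hypothetical callable that runs the TTS engine once
    and returns the duration, in seconds, of the audio it generated.
    """
    start = time.perf_counter()
    audio_seconds = synthesize()
    elapsed = time.perf_counter() - start
    return speedup_over_realtime(audio_seconds, elapsed)
```

For example, `speedup_over_realtime(10.0, 0.06)` corresponds to 10 seconds of audio generated in 60 ms, which is roughly the 167x figure quoted above.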

Installation:

    pip install "manim-voiceover[supertonic]"

Usage:

    from manim_voiceover.services.supertonic import SupertonicService

    # Default settings
    service = SupertonicService()

    # Custom voice and synthesis settings
    service = SupertonicService(
        voice_style="F1",
        total_step=5,
        speed=1.0,
    )
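
If the settings come from user input or a config file, it can help to check them before constructing the service. The following is a sketch only: `supertonic_kwargs` is a hypothetical helper, not part of this PR, and the validation rules (positive integer `total_step`, positive `speed`) are assumptions. Only the four voice style names and the three parameter names are taken from the description above.

```python
# The four voice styles listed in this PR description.
VOICE_STYLES = ("M1", "M2", "F1", "F2")


def supertonic_kwargs(voice_style: str = "F1", total_step: int = 5, speed: float = 1.0) -> dict:
    """Validate settings and return keyword arguments for SupertonicService.

    Hypothetical helper: the limits checked here are assumptions, and the
    service itself may accept or reject a different range of values.
    """
    if voice_style not in VOICE_STYLES:
        raise ValueError(f"voice_style must be one of {VOICE_STYLES}, got {voice_style!r}")
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(total_step, bool) or not isinstance(total_step, int) or total_step < 1:
        raise ValueError("total_step must be a positive integer")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {"voice_style": voice_style, "total_step": total_step, "speed": float(speed)}
```

With the service installed, the result can be passed straight through: `SupertonicService(**supertonic_kwargs(voice_style="M2", total_step=10))`.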

Dependencies added:

  • onnxruntime - ONNX model inference
  • soundfile - Audio file I/O
  • huggingface-hub - Model downloading

Testing:
Verified in a clean Docker container. All dependencies install correctly, and TTS generates audio successfully.
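
A check like "generates audio successfully" could be automated with a small smoke test. The sketch below is not the test used for this PR. It assumes the generated file is an uncompressed WAV, which the description does not state, and uses only the standard-library `wave` module. The function names and the minimum-duration threshold are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_generated_audio(path: str, min_seconds: float = 0.1) -> float:
    """Raise if the file is too short to be plausible speech, else return its duration.

    Assumes WAV output; a different format would need a different reader.
    """
    duration = wav_duration_seconds(path)
    if duration < min_seconds:
        raise ValueError(f"audio is only {duration:.3f}s long, expected at least {min_seconds}s")
    return duration
```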
