
🎙️ microWakeWord Nvidia Trainer & Recorder


Train microWakeWord detection models using a simple web-based recorder + trainer UI, packaged in a Docker container.

No Jupyter notebooks required. No manual cell execution. Just record your voice (optional) and train.



microWakeWord_Trainer-Nvidia is available in the Unraid Community Apps store and can be installed with a one-click template.



Pull the Docker Image

docker pull ghcr.io/tatertotterson/microwakeword:latest

Run the Container

docker run -d \
  --gpus all \
  -p 8888:8888 \
  -v $(pwd):/data \
  ghcr.io/tatertotterson/microwakeword:latest

What these flags do:

  • --gpus all → Enables GPU acceleration
  • -p 8888:8888 → Exposes the Recorder + Trainer WebUI
  • -v $(pwd):/data → Persists all models, datasets, and cache
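If you prefer Docker Compose, the same container can be expressed as a compose file. This is a sketch based on the `docker run` command above (image, port, and volume mapping are taken from it; the service name is arbitrary):

```yaml
# docker-compose.yml — equivalent of the docker run command above
services:
  microwakeword:
    image: ghcr.io/tatertotterson/microwakeword:latest
    ports:
      - "8888:8888"        # Recorder + Trainer WebUI
    volumes:
      - ./:/data           # persist models, datasets, and cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d`.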

Open the Recorder WebUI

Open your browser and go to:

👉 http://localhost:8888

You’ll see the microWakeWord Recorder & Trainer UI.
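On a headless machine you can confirm the WebUI is reachable before opening a browser. A quick check (the exact response body depends on the app, but a 200 status indicates the server is up):

```shell
# Print only the HTTP status code returned by the WebUI
curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:8888
```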


🎤 Recording Voice Samples (Optional)

Personal voice recordings are optional.

  • You may record your own voice for better accuracy
  • Or simply click “Train” without recording anything

If no recordings are present, training will proceed using synthetic TTS samples only.

Remote systems (important)

If you are running this on a remote PC / server, browser-based recording will not work out of the box, because browsers only allow microphone access from secure origins. You will need one of:

  • A reverse proxy that terminates HTTPS (so the browser grants mic permissions), or
  • Access to the UI via localhost on the same machine

Training itself works fine remotely — only recording requires local microphone access.
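One common way to satisfy the localhost requirement on a remote server is an SSH tunnel from your desktop. Your browser then talks to http://localhost:8888 locally, which the tunnel forwards to the server, so the browser treats it as a secure origin and allows microphone access. A sketch (the `user@server` login is a placeholder):

```shell
# Forward local port 8888 to port 8888 on the remote server.
# Replace user@server with your own SSH login.
ssh -N -L 8888:localhost:8888 user@server
```

While the tunnel is open, open http://localhost:8888 in your local browser as usual.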


🎙️ Recording Flow

  1. Enter your wake word
  2. Test pronunciation with Test TTS
  3. Choose:
    • Number of speakers (e.g. family members)
    • Takes per speaker (default: 10)
  4. Click Begin recording
  5. Speak naturally — recording:
    • Starts when you talk
    • Stops automatically after silence
  6. Repeat for each speaker

Files are saved automatically to:

personal_samples/
  speaker01_take01.wav
  speaker01_take02.wav
  speaker02_take01.wav
  ...
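The `speakerNN_takeMM.wav` naming scheme above can be reproduced with a small shell loop, which is handy if you want to sanity-check which files the recorder should have produced (the two-speaker / two-take counts here are just an example):

```shell
# Print the filenames expected for 2 speakers x 2 takes each
for s in 1 2; do
  for t in 1 2; do
    printf 'speaker%02d_take%02d.wav\n' "$s" "$t"
  done
done
# → speaker01_take01.wav, speaker01_take02.wav, speaker02_take01.wav, speaker02_take02.wav
```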

🧠 Training Behavior (Important Notes)

⏬ First training run

The first time you click Train, the system will download large training datasets (background noise, speech corpora, etc.).

  • This can take several minutes
  • This happens only once
  • Data is cached inside /data

You will NOT need to download these again unless you delete /data.


🔁 Re-training is safe and incremental

  • You can train multiple wake words back-to-back
  • You do NOT need to clear any folders between runs
  • Old models are preserved in timestamped output directories
  • All required cleanup and reuse logic is handled automatically

📦 Output Files

When training completes, you’ll get:

  • <wake_word>.tflite – quantized streaming model
  • <wake_word>.json – ESPHome-compatible metadata

Both are saved under:

/data/output/

Each run is placed in its own timestamped folder.
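The JSON manifest is what ESPHome consumes; it points at the quantized `.tflite` model. As a hedged sketch of how the output might be referenced from an ESPHome device configuration (ESPHome's `micro_wake_word` component accepts a model reference such as a URL or path to the manifest; the URL below is a placeholder, and you should check the ESPHome docs for the exact syntax in your version):

```yaml
# ESPHome device YAML (fragment) — assumes the manifest is served
# over HTTP from a server you control (placeholder URL)
micro_wake_word:
  models:
    - model: http://<your-server>/output/<wake_word>.json
```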


🎤 Optional: Personal Voice Samples (Advanced)

If you record personal samples:

  • They are automatically augmented
  • They are up-weighted during training
  • This significantly improves real-world accuracy

No configuration required — detection is automatic.


🔄 Resetting Everything (Optional)

If you want a completely clean slate:

Delete the contents of the host folder mounted at /data

Then restart the container.

⚠️ This will:

  • Remove cached datasets
  • Require re-downloading training data
  • Delete trained models
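The reset steps above can be done from the host shell. This sketch assumes you started the container with `-v $(pwd):/data`; the container ID and host path are placeholders (double-check the path before running `rm -rf`):

```shell
# Stop the container (find its ID with: docker ps)
docker stop <container_id>

# Wipe the host directory you mounted at /data — verify the path first!
rm -rf /path/to/mounted/data

# Start the container again; datasets will be re-downloaded on next training
docker start <container_id>
```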

🙌 Credits

Built on top of the excellent microWakeWord project by kahrendt:
https://github.com/kahrendt/microWakeWord

Huge thanks to the original authors ❤️

About

Train microWakeWord for use with HomeAssistant Voice
