takigrapher

Automatically transcribes any file containing audio, searching folders recursively, using OpenAI Whisper.

Supported input file types (audio and video): .mp3, .wav, .m4a, .flac, .aac, .ogg, .wma, .mp4, .mkv, .webm, .opus, .mov, .avi

Output files are saved in the same directory as the media file, with the same base name.
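The discovery and naming rules above can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: the names `SUPPORTED_EXTENSIONS`, `find_media`, and `target_path` are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in this README. Case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".wma",
    ".mp4", ".mkv", ".webm", ".opus", ".mov", ".avi",
}


def find_media(root: Path) -> list[Path]:
    """Return supported media files under root, or root itself if it is a file."""
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED_EXTENSIONS else []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def target_path(media: Path, target_type: str = "lrc") -> Path:
    """Keep the directory and base name, swap the extension for the output type."""
    return media.with_suffix("." + target_type)
```

With this sketch, `target_path(Path("media/sample.mp3"))` yields `media/sample.lrc`, next to the source file.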

Supported output file types: .lrc, .vtt, .srt, .txt, .json
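For readers unfamiliar with LRC, the default output type, here is a minimal sketch of how timed segments could be turned into LRC lines. It assumes segments shaped like those returned by the openai-whisper package (dictionaries with `start`, `end`, and `text` keys). The helper names `lrc_timestamp` and `segments_to_lrc` are hypothetical, and the exact layout takigrapher writes may differ.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format a time in seconds as an LRC time tag, [mm:ss.xx]."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def segments_to_lrc(segments) -> str:
    """Emit one LRC line per segment, tagged with the segment start time."""
    return "\n".join(
        f"{lrc_timestamp(seg['start'])}{seg['text'].strip()}" for seg in segments
    )
```

A segment starting at 75.5 seconds with the text "world" becomes the line `[01:15.50]world`.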

Usage

Help

python3 src/main.py --help

usage: python3 main.py [options]

Transcribe media files to LRC using Whisper

options:
  -h, --help            show this help message and exit
  -m [PATH], --media [PATH]
                        Path to a file or directory where media files will be searched recursively
  -n [MODEL], --modelname [MODEL]
                        available whisper models: (Default: tiny)
                        tiny: Smallest, fastest model with lower accuracy.
                        tiny.en: English-only tiny, slightly better for English tasks.
                        base: Balanced size, speed, and accuracy.
                        base.en: English-only base, improved English performance.
                        small: More accurate than base, but larger and slower.
                        small.en: English-only small, enhanced for English tasks.
                        medium: High accuracy, resource-intensive.
                        medium.en: English-only medium, optimized for English.
                        large: Original large model, high accuracy, heavy and slow.
                        large-v1: First large variant, improved accuracy and stability.
                        large-v2: Upgraded large-v1, better reasoning and alignment.
                        large-v3: Most advanced, best performance overall.
                        large-v3-turbo: Optimized large-v3, faster with similar accuracy.
                        turbo: Fastest variant, high accuracy, resource-efficient.
  -v, --verbose         activate verbose mode
  -im, --inmemory       load model entirely into RAM
  -d [DEVICE], --device [DEVICE]
                        available devices: cpu or cuda
  -st [TYPE], --sourcetype [TYPE]
                        available types: mp3, wav, m4a, flac, aac, ogg, wma, mp4, mkv, webm, opus, mov, avi. (Default: all)
  -sl [LANGUAGE], --sourcelanguage [LANGUAGE]
  -tl [LANGUAGE], --targetlanguage [LANGUAGE]
                        ISO 639-1 available languages:
                        af: afrikaans|am: amharic|ar: arabic|as: assamese|az: azerbaijani|ba: bashkir|be: belarusian|bg: bulgarian|bn: bengali|bo: tibetan|br: breton|bs: bosnian|ca: catalan|cs: czech|cy: welsh|da: danish|de: german|el: greek|en: english|es: spanish|et: estonian|eu: basque|fa: persian|fi: finnish|fo: faroese|fr: french|gl: galician|gu: gujarati|ha: hausa|haw: hawaiian|he: hebrew|hi: hindi|hr: croatian|ht: haitian creole|hu: hungarian|hy: armenian|id: indonesian|is: icelandic|it: italian|ja: japanese|jw: javanese|ka: georgian|kk: kazakh|km: khmer|kn: kannada|ko: korean|la: latin|lb: luxembourgish|ln: lingala|lo: lao|lt: lithuanian|lv: latvian|mg: malagasy|mi: maori|mk: macedonian|ml: malayalam|mn: mongolian|mr: marathi|ms: malay|mt: maltese|my: myanmar|ne: nepali|nl: dutch|nn: nynorsk|no: norwegian|oc: occitan|pa: punjabi|pl: polish|ps: pashto|pt: portuguese|ro: romanian|ru: russian|sa: sanskrit|sd: sindhi|si: sinhala|sk: slovak|sl: slovenian|sn: shona|so: somali|sq: albanian|sr: serbian|su: sundanese|sv: swedish|sw: swahili|ta: tamil|te: telugu|tg: tajik|th: thai|tk: turkmen|tl: tagalog|tr: turkish|tt: tatar|uk: ukrainian|ur: urdu|uz: uzbek|vi: vietnamese|yi: yiddish|yo: yoruba|yue: cantonese|zh: chinese. (Default: auto)
  -tt [TYPE], --targettype [TYPE]
                        available types: lrc, txt, srt, json, vtt. (Default: lrc)
  -te [ACTION], --targetexists [ACTION]
                        available actions: overwrite, skip, rename. (Default: skip)
  -ts, --targetsuffix   add suffix to target file name. (Default: false)
  -ea, --exportall      export original and translated text together as target files.
                        (Default: false)
  -t TRACK, --track TRACK
                        extract audio track (1=first, 2=second, 3=third, etc). (Default: 1)
  --temperature [TEMP]  Temperature for transcription sampling (0.0 to 1.0).
                        Lower values increase determinism, higher values increase variability. (Default: 0.0)
  --beam-size [SIZE]    Number of hypotheses considered during decoding (1 to 20).
                        Higher values increase accuracy but slow down processing. (Default: 5)
  --best-of [N]         Number of transcription samples to compare (1 to 10).
                        Higher values improve accuracy but increase processing time. (Default: 5)
  --prompt [TEXT]       Initial text to guide transcription (e.g., context or
                        keywords). (Default: None)

Example usage:
python3 src/main.py --media ./media/sample.mp3 --modelname tiny --device cuda --verbose --sourcetype mp3 --sourcelanguage en --targetlanguage en
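If you want to drive the CLI from another Python script, the options listed in the help output can be assembled into an argument list and passed to `subprocess`. The sketch below uses only flags shown above; the helper names `build_command` and `run_takigrapher` are hypothetical, and it assumes the script is started from the repository root so that `src/main.py` resolves.

```python
import subprocess
import sys


def build_command(media, model="tiny", target_type="lrc", target_exists="skip",
                  source_language=None, verbose=False):
    """Build the argument list for src/main.py from documented options."""
    cmd = [
        sys.executable, "src/main.py",
        "--media", str(media),
        "--modelname", model,
        "--targettype", target_type,
        "--targetexists", target_exists,
    ]
    if source_language:
        cmd += ["--sourcelanguage", source_language]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_takigrapher(media, **options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(media, **options), check=True)
```

For example, `run_takigrapher("./media/sample.mp3", model="base.en", verbose=True)` would start a transcription with the base.en model.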

CLI

Setup

python3 -m venv .venv

source .venv/bin/activate # Linux/macOS
# .\.venv\Scripts\activate # Windows

pip3 install -r requirements.txt

Command line

# base model, not loaded into memory
python3 src/main.py -v -m ./media/sample.mp3 -n base.en -tt lrc -te overwrite

# larger model, loaded entirely into RAM
python3 src/main.py -v -m ./media/sample.mp3 -n large -im -sl en -tt lrc -te overwrite -ts

# transcribe a specific audio track
python3 src/main.py -v -m ./media/sample3trk.mp4 -n medium.en -sl en -tt lrc -te overwrite -t 2

# transcribe a specific audio track with custom decoding settings
python3 src/main.py -v -m ./media/sample3trk.mp4 -n base.en -sl en -tt lrc -te overwrite -t 2 --temperature 0.2 --beam-size 7 --best-of 5 --prompt "transcribe the voice"
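The decoding flags in the last command have counterparts in the openai-whisper Python package, whose `transcribe` method accepts `temperature` and `initial_prompt` and forwards `beam_size` and `best_of` as decoding options. The sketch below validates values against the ranges given in the help output and maps them to those keyword names. Whether takigrapher passes them through in exactly this way is an assumption, and `decoding_options` and `transcribe_file` are hypothetical helpers.

```python
def decoding_options(temperature=0.0, beam_size=5, best_of=5, prompt=None):
    """Validate CLI-style values and return keyword arguments for transcribe()."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 1 <= beam_size <= 20:
        raise ValueError("beam size must be between 1 and 20")
    if not 1 <= best_of <= 10:
        raise ValueError("best-of must be between 1 and 10")
    return {
        "temperature": temperature,
        "beam_size": beam_size,
        "best_of": best_of,
        "initial_prompt": prompt,
    }


def transcribe_file(path, model_name="base.en", **options):
    """Transcribe one file with openai-whisper (must be installed separately)."""
    import whisper  # imported lazily so the validation helper works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, **decoding_options(**options))
```

In the upstream openai-whisper package, beam search (`beam_size`) applies when decoding at temperature 0, while `best_of` applies when sampling at a temperature above 0, so only one of the two takes effect for a given temperature.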

Docker

Device

CPU

docker run -d --name takigrapher \
-v "./data/whisper/cache:/root/.cache/whisper" \
-v "/mnt/nas/music/:/app/media/music/" \
luizhp/takigrapher:latest

GPU

docker run -d --gpus all \
--name takigrapher \
-v "./data/whisper/cache:/root/.cache/whisper" \
-v "/mnt/nas/music/:/app/media/music/" \
luizhp/takigrapher:latest

Volume Mounts

Media folder: -v "/mnt/nas/music/:/app/media/music/"
Model cache (keeps downloaded models in /root/.cache/whisper): -v "./data/whisper/cache:/root/.cache/whisper"
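Because the container only sees the mounted folders, the path given to -m must be the container-side path, not the host path. The following sketch translates one into the other for the media mount shown above. The helper name `container_path` is hypothetical, and the code assumes POSIX-style host paths.

```python
from pathlib import PurePosixPath

# Host folder -> container folder, taken from the -v option above.
MOUNTS = {
    PurePosixPath("/mnt/nas/music"): PurePosixPath("/app/media/music"),
}


def container_path(host_path: str, mounts=MOUNTS) -> PurePosixPath:
    """Translate a host path into the path visible inside the container."""
    host = PurePosixPath(host_path)
    for host_root, container_root in mounts.items():
        try:
            return container_root / host.relative_to(host_root)
        except ValueError:
            continue  # not under this mount, try the next one
    raise ValueError(f"{host_path} is not under any mounted folder")
```

Here `/mnt/nas/music/bandname/song.mp3` on the host maps to `/app/media/music/bandname/song.mp3` in the container. The relative form `./media/music/bandname/song.mp3` used in the examples suggests the container's working directory is `/app`, which is an assumption.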

Examples of how to execute the application

The examples below transcribe files or folders inside the running container. Either open a shell in the container and run takigrapher there, or pass the command directly to docker exec:

docker exec -it takigrapher bash

takigrapher -v -m ./media/sample.mp3 -n tiny.en -tt srt -te overwrite -sl en -ts

docker exec -it takigrapher takigrapher -v -m ./media/music/bandname/ -n medium -tt lrc -te overwrite -ts

docker exec -it takigrapher takigrapher -v -m ./media/music/bandname/song.mp3 -n medium -tt lrc -te overwrite -ts

docker exec -it takigrapher takigrapher -m ./media/tv/mytvshow/ -n medium.en -sl en -tt srt -te rename
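The -te option used in these commands decides what happens when the output file already exists. The sketch below shows one plausible reading of the three actions. It is not taken from the project's source: the function `resolve_target` is hypothetical, and the numbered naming scheme for `rename` in particular is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_target(target: Path, action: str = "skip") -> Optional[Path]:
    """Return the path to write to, or None when the file should be skipped."""
    if action not in ("overwrite", "skip", "rename"):
        raise ValueError(f"unknown action: {action}")
    if not target.exists() or action == "overwrite":
        return target
    if action == "skip":
        return None
    # Assumption: "rename" keeps the existing file and picks the first
    # free numbered name for the new one (song_1.lrc, song_2.lrc, ...).
    n = 1
    while True:
        candidate = target.with_name(f"{target.stem}_{n}{target.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

With the default action, skip, an existing transcription is left untouched, which makes repeated runs over a large library cheap.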

Docker Compose

Devices

CPU

See the docker-compose-cpu.yaml file in the repository.

GPU

See the docker-compose-gpu.yaml file in the repository.

License

This project is licensed under the GNU GPLv3. See the LICENSE file for details.

