takigrapher

Automatically transcribes every file containing audio in a folder and its subfolders, using OpenAI Whisper.

Supported media files (audio and video containers): .mp3, .wav, .m4a, .flac, .aac, .ogg, .wma, .mp4, .mkv, .webm, .opus, .mov, .avi
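As a rough illustration of the recursive search, the sketch below walks a file or directory and keeps only the supported extensions. The names `SUPPORTED_EXTENSIONS` and `find_media_files` are hypothetical and are not taken from the takigrapher source; the project's own implementation may differ.

```python
from pathlib import Path

# Extensions copied from the list of supported media files above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".wma",
    ".mp4", ".mkv", ".webm", ".opus", ".mov", ".avi",
}


def find_media_files(root, source_type=None):
    """Return the supported media files found under root.

    root may be a single file or a directory that is walked recursively.
    source_type optionally narrows the search to one extension, e.g. "mp3".
    (Hypothetical helper, not the tool's real code.)
    """
    wanted = {"." + source_type.lower()} if source_type else SUPPORTED_EXTENSIONS
    root = Path(root)
    candidates = [root] if root.is_file() else root.rglob("*")
    # Compare suffixes case-insensitively so "SONG.MP3" is also picked up.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```

Calling `find_media_files("./media", source_type="mp3")` would mirror the effect of `-m ./media -st mp3` under these assumptions.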

Transcription files are saved in the same directory as the media file, with the same base name.
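The naming rule can be sketched as follows, together with one possible reading of the `--targetexists` actions (overwrite, skip, rename) listed in the help text. `resolve_target` is a hypothetical helper, and the "name (N)" numbering used for `rename` is an assumption; the tool may rename differently.

```python
from pathlib import Path


def resolve_target(media_path, target_type="lrc", action="skip"):
    """Return the path to write, or None when the file should be skipped.

    Hypothetical sketch: the target sits next to the media file and shares
    its base name, e.g. song.mp3 -> song.lrc.
    """
    if action not in ("overwrite", "skip", "rename"):
        raise ValueError(f"unknown action: {action}")
    target = Path(media_path).with_suffix("." + target_type)
    if not target.exists() or action == "overwrite":
        return target
    if action == "skip":
        return None
    # Assumption: "rename" picks the first free "name (N).ext" for the new file.
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem} ({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```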

Supported output file types: .lrc, .vtt, .srt, .txt, .json
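The timed formats differ mainly in how they write timestamps. The sketch below shows the usual LRC, SRT and VTT timestamp layouts and renders LRC lines from Whisper-style segments (dicts with `start` and `text` keys). The function names are hypothetical and this is not the project's own writer.

```python
def lrc_timestamp(seconds):
    """Format seconds as an LRC tag: [mm:ss.xx]."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def vtt_timestamp(seconds):
    """WebVTT uses the same layout as SRT with a dot before the milliseconds."""
    return srt_timestamp(seconds).replace(",", ".")


def to_lrc(segments):
    """Render segments (dicts with 'start' and 'text') as LRC lines."""
    return "\n".join(
        f"{lrc_timestamp(seg['start'])}{seg['text'].strip()}" for seg in segments
    )
```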

Usage

Help

python3 src/main.py --help
usage: python3 main.py [options]

Transcribe media files to LRC using Whisper

options:
  -h, --help            show this help message and exit
  -m [PATH], --media [PATH]
                        Path to a file or directory where media files will be searched recursively
  -n [MODEL], --modelname [MODEL]
                        available whisper models: (Default: tiny)
                        tiny: Smallest, fastest model with lower accuracy.
                        tiny.en: English-only tiny, slightly better for English tasks.
                        base: Balanced size, speed, and accuracy.
                        base.en: English-only base, improved English performance.
                        small: More accurate than base, but larger and slower.
                        small.en: English-only small, enhanced for English tasks.
                        medium: High accuracy, resource-intensive.
                        medium.en: English-only medium, optimized for English.
                        large: Original large model, high accuracy, heavy and slow.
                        large-v1: First large variant, improved accuracy and stability.
                        large-v2: Upgraded large-v1, better reasoning and alignment.
                        large-v3: Most advanced, best performance overall.
                        large-v3-turbo: Optimized large-v3, faster with similar accuracy.
                        turbo: Fastest variant, high accuracy, resource-efficient.
  -v, --verbose         activate verbose mode
  -im, --inmemory       load model entirely into RAM
  -d [DEVICE], --device [DEVICE]
                        available devices: cpu or cuda
  -st [TYPE], --sourcetype [TYPE]
                        available types: mp3, wav, m4a, flac, aac, ogg, wma, mp4, mkv, webm, opus, mov, avi. (Default: all)
  -sl [LANGUAGE], --sourcelanguage [LANGUAGE]
  -tl [LANGUAGE], --targetlanguage [LANGUAGE]
                        ISO 639-1 available languages:
                        af: afrikaans|am: amharic|ar: arabic|as: assamese|az: azerbaijani|ba: bashkir|be: belarusian|bg: bulgarian|bn: bengali|bo: tibetan|br: breton|bs: bosnian|ca: catalan|cs: czech|cy: welsh|da: danish|de: german|el: greek|en: english|es: spanish|et: estonian|eu: basque|fa: persian|fi: finnish|fo: faroese|fr: french|gl: galician|gu: gujarati|ha: hausa|haw: hawaiian|he: hebrew|hi: hindi|hr: croatian|ht: haitian creole|hu: hungarian|hy: armenian|id: indonesian|is: icelandic|it: italian|ja: japanese|jw: javanese|ka: georgian|kk: kazakh|km: khmer|kn: kannada|ko: korean|la: latin|lb: luxembourgish|ln: lingala|lo: lao|lt: lithuanian|lv: latvian|mg: malagasy|mi: maori|mk: macedonian|ml: malayalam|mn: mongolian|mr: marathi|ms: malay|mt: maltese|my: myanmar|ne: nepali|nl: dutch|nn: nynorsk|no: norwegian|oc: occitan|pa: punjabi|pl: polish|ps: pashto|pt: portuguese|ro: romanian|ru: russian|sa: sanskrit|sd: sindhi|si: sinhala|sk: slovak|sl: slovenian|sn: shona|so: somali|sq: albanian|sr: serbian|su: sundanese|sv: swedish|sw: swahili|ta: tamil|te: telugu|tg: tajik|th: thai|tk: turkmen|tl: tagalog|tr: turkish|tt: tatar|uk: ukrainian|ur: urdu|uz: uzbek|vi: vietnamese|yi: yiddish|yo: yoruba|yue: cantonese|zh: chinese. (Default: auto)
  -tt [TYPE], --targettype [TYPE]
                        available types: lrc, txt, srt, json, vtt. (Default: lrc)
  -te [ACTION], --targetexists [ACTION]
                        available actions: overwrite, skip, rename. (Default: skip)
  -ts, --targetsuffix   add suffix to target file name. (Default: false)
  -ea, --exportall      export original and translated text together as target files.
                        (Default: false)
  -t TRACK, --track TRACK
                        extract audio track (1=first, 2=second, 3=third, etc). (Default: 1)
  --temperature [TEMP]  Temperature for transcription sampling (0.0 to 1.0).
                        Lower values increase determinism, higher values increase variability. (Default: 0.0)
  --beam-size [SIZE]    Number of hypotheses considered during decoding (1 to 20).
                        Higher values increase accuracy but slow down processing. (Default: 5)
  --best-of [N]         Number of transcription samples to compare (1 to 10).
                        Higher values improve accuracy but increase processing time. (Default: 5)
  --prompt [TEXT]       Initial text to guide transcription (e.g., context or
                        keywords). (Default: None)

Example usage:
python3 src/main.py --media ./media/sample.mp3 --modelname tiny --device cuda --verbose --sourcetype mp3 --sourcelanguage en --targetlanguage en
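
The decoding options above correspond closely to parameters of the openai-whisper Python API. The sketch below shows one way the flags could be passed to `whisper.load_model` and `model.transcribe`. The helper names `build_transcribe_options` and `transcribe` are hypothetical, and the mapping is an assumption rather than the tool's actual code. In Whisper, beam search applies when the temperature is 0 and `best_of` applies when sampling with a temperature above 0, so the sketch passes only the relevant one.

```python
def build_transcribe_options(source_language=None, temperature=0.0,
                             beam_size=5, best_of=5, prompt=None):
    """Map CLI-style values to keyword arguments for model.transcribe().

    Hypothetical helper: the real tool may wire its options differently.
    """
    options = {
        # None lets Whisper detect the language ("auto" in the help text).
        "language": None if source_language in (None, "auto") else source_language,
        "temperature": temperature,
        "initial_prompt": prompt,
    }
    if temperature == 0:
        options["beam_size"] = beam_size   # beam search is used at temperature 0
    else:
        options["best_of"] = best_of       # number of samples when temperature > 0
    return options


def transcribe(media_path, model_name="tiny", device="cpu", in_memory=False, **cli_options):
    """Load a Whisper model and transcribe one file (needs openai-whisper)."""
    import whisper  # imported lazily so the option helper works without it

    model = whisper.load_model(model_name, device=device, in_memory=in_memory)
    return model.transcribe(str(media_path), **build_transcribe_options(**cli_options))
```

The returned dictionary contains the full `text` and a list of timed `segments`.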

CLI

Setup

python3 -m venv .venv

source .venv/bin/activate # Linux/macOS
# .\.venv\Scripts\activate # Windows

pip3 install -r requirements.txt

Command line

# base model without in-memory
python3 src/main.py -v -m ./media/sample.mp3 -n base.en -tt lrc -te overwrite

# larger model with in-memory model
python3 src/main.py -v -m ./media/sample.mp3 -n large -im -sl en -tt lrc -te overwrite -ts

# transcribe a specific audio track
python3 src/main.py -v -m ./media/sample3trk.mp4 -n medium.en -sl en -tt lrc -te overwrite -t 2

# transcribe a specific audio track with different settings
python3 src/main.py -v -m ./media/sample3trk.mp4 -n base.en -sl en -tt lrc -te overwrite -t 2 --temperature 0.2 --beam-size 7 --best-of 5 --prompt "transcribe the voice"
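
Selecting an audio track usually means extracting that stream before transcription. The sketch below builds an ffmpeg command for it, assuming ffmpeg is installed and used for extraction; the README does not say how takigrapher does this internally. `build_extract_command` is a hypothetical name. The 1-based `--track` value becomes ffmpeg's 0-based audio stream index, and the output is 16 kHz mono, the sample format Whisper works with.

```python
def build_extract_command(media_path, output_path, track=1):
    """Build an ffmpeg command that extracts one audio track as 16 kHz mono.

    Assumption: ffmpeg is on PATH. track is 1-based, as in the --track option.
    """
    if track < 1:
        raise ValueError("track numbers start at 1")
    return [
        "ffmpeg", "-y",
        "-i", str(media_path),
        "-map", f"0:a:{track - 1}",  # ffmpeg counts audio streams from 0
        "-vn",                        # drop any video stream
        "-ac", "1",                   # mono
        "-ar", "16000",               # 16 kHz sample rate
        str(output_path),
    ]
```

The command can then be run with `subprocess.run(cmd, check=True)`; for `-t 2` the mapping is `0:a:1`.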

Docker

Device

CPU

docker run -d --name takigrapher \
-v "./data/whisper/cache:/root/.cache/whisper" \
-v "/mnt/nas/music/:/app/media/music/" \
luizhp/takigrapher:latest

GPU

docker run -d --gpus all \
--name takigrapher \
-v "./data/whisper/cache:/root/.cache/whisper" \
-v "/mnt/nas/music/:/app/media/music/" \
luizhp/takigrapher:latest

Volume Mounts

  • Mount the media folder: -v "/mnt/nas/music/:/app/media/music/"

  • Keep downloaded models in /root/.cache/whisper so they persist between runs: -v "./data/whisper/cache:/root/.cache/whisper"

Examples of how to execute the application

The following examples transcribe files or folders inside the running container, either from an interactive shell or directly through docker exec:

# open a shell inside the container, then run takigrapher from it
docker exec -it takigrapher bash
takigrapher -v -m ./media/sample.mp3 -n tiny.en -tt srt -te overwrite -sl en -ts

# or run takigrapher directly through docker exec
docker exec -it takigrapher takigrapher -v -m ./media/music/bandname/ -n medium -tt lrc -te overwrite -ts
docker exec -it takigrapher takigrapher -v -m ./media/music/bandname/song.mp3 -n medium -tt lrc -te overwrite -ts
docker exec -it takigrapher takigrapher -m ./media/tv/mytvshow/ -n medium.en -sl en -tt srt -te rename

Docker Compose

Devices

CPU

See the docker-compose-cpu.yaml file in the repository.

GPU

See the docker-compose-gpu.yaml file in the repository.

License

This project is licensed under the GNU GPLv3; see the LICENSE file for details.
