This project provides a Python-based GUI application for transcribing audio files with speaker identification, utilizing libraries such as Whisper, PyAnnote, and Librosa. The application allows users to select an audio file, process it for transcription, and identify speakers automatically.
- Audio Transcription: Uses OpenAI's Whisper model to transcribe audio files in various formats.
- Speaker Diarization: Identifies different speakers in the audio using the PyAnnote library.
- Progress Tracking: A visual progress bar displays the current stage of processing.
- GUI Interface: An easy-to-use graphical user interface built with Tkinter.
Before running the application, make sure you have the following Python packages installed:

- librosa
- numpy
- openai-whisper (note: the `whisper` package on PyPI is an unrelated project)
- pyannote.audio
- soundfile
- ffmpeg-python

The other modules the script uses (`tkinter`, `queue`, `logging`, `threading`) ship with the Python standard library and need no separate installation, although on some Linux distributions `tkinter` must be installed through the system package manager.
- FFmpeg: Make sure FFmpeg is installed and available on your system's PATH. You may need to adjust the path to the FFmpeg executable in the script.
On Windows:

1. Clone the repository: Download or clone this repository using:

   ```
   git clone https://github.com/your-username/your-repo-name.git
   ```

2. Install Python and dependencies:
   - Make sure you have Python 3.8+ installed. You can download Python from [python.org](https://www.python.org/).
   - Open a command prompt and install the required dependencies:

     ```
     pip install librosa numpy openai-whisper pyannote.audio soundfile ffmpeg-python
     ```

3. Install FFmpeg:
   - Download FFmpeg for Windows.
   - Extract the FFmpeg files to `C:\ffmpeg` (or any directory of your choice).
   - Add the FFmpeg `bin` folder to your PATH:
     - Right-click on "This PC" > "Properties" > "Advanced system settings" > "Environment Variables".
     - Under "System Variables", find the `Path` variable, click "Edit", and add the path to `C:\ffmpeg\bin`.

4. Run the application:

   ```
   python transcription_app.py
   ```
On Linux:

1. Clone the repository: Open a terminal and run:

   ```
   git clone https://github.com/your-username/your-repo-name.git
   ```

2. Install Python and dependencies:
   - Install Python 3.8+ and pip:

     ```
     sudo apt-get install python3 python3-pip
     ```

   - Install the required Python libraries:

     ```
     pip3 install librosa numpy openai-whisper pyannote.audio soundfile ffmpeg-python
     ```

3. Install FFmpeg: On Linux, FFmpeg can be installed through package managers. Run the following:

   ```
   sudo apt-get install ffmpeg
   ```

4. Run the application:

   ```
   python3 transcription_app.py
   ```
Launch the application by running the appropriate command for your operating system:

- On Windows:

  ```
  python transcription_app.py
  ```

- On Linux:

  ```
  python3 transcription_app.py
  ```

Use the graphical interface to:

- Select an audio file (supports `.wav`, `.mp3`, `.flac`, `.m4a`, and more).
- Specify the output file for saving the transcription.
- Choose the model size and language for Whisper-based transcription.
- Use the diarization model to identify speakers in the audio.
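Under the hood, combining Whisper's transcription with PyAnnote's diarization comes down to aligning two sets of time segments. The sketch below shows one common way to do that alignment; the data shapes are illustrative assumptions, not the script's actual structures:

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose diarization
    turn overlaps it the most in time.

    segments: [{"start": float, "end": float, "text": str}, ...]
    turns:    [(start, end, speaker_label), ...]
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for start, end, speaker in turns:
            # Overlap between the transcript segment and this speaker turn.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

A segment that overlaps no turn at all keeps the `"unknown"` label, which is a reasonable default for silence or cross-talk regions.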
To make the application executable on both Windows and Linux without requiring Python, you can package it using PyInstaller.
On Windows:

1. Install PyInstaller:

   ```
   pip install pyinstaller
   ```

2. Build the executable:

   ```
   pyinstaller --onefile --windowed transcription_app.py
   ```

   This will create a single `.exe` file in the `dist` folder that you can run without needing Python installed.
On Linux:

1. Install PyInstaller:

   ```
   pip3 install pyinstaller
   ```

2. Build the executable:

   ```
   pyinstaller --onefile --windowed transcription_app.py
   ```

   This will create a standalone binary in the `dist` folder that can be run directly on Linux.
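One PyInstaller caveat worth noting: a `--onefile` build unpacks any bundled data files to a temporary directory exposed as `sys._MEIPASS` at runtime. If the app ships data files alongside the script, a small helper like this (a sketch, not code from this project) resolves paths in both development and frozen builds:

```python
import os
import sys

def resource_path(relative):
    """Resolve a data file both in development and inside a PyInstaller
    one-file bundle (which unpacks to sys._MEIPASS at runtime)."""
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative)
```

Any file referenced this way must also be added to the build with PyInstaller's `--add-data` option.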
If FFmpeg is not in your system's PATH, you can customize the script to point to your FFmpeg installation by modifying this line:

```python
ffmpeg_executable = r'C:\ffmpeg\ffmpeg.exe'  # Adjust this path if necessary
```

On Linux, this can be left as is if FFmpeg is installed via the package manager.
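One way to make that line portable is to prefer whatever `ffmpeg` is already on the PATH and only fall back to a hardcoded location on Windows. A minimal sketch (the fallback path mirrors the default above and is an assumption about your install location):

```python
import platform
import shutil

def find_ffmpeg(windows_fallback=r"C:\ffmpeg\ffmpeg.exe"):
    """Return the ffmpeg executable found on PATH, falling back to a
    hardcoded Windows path when nothing is found there."""
    found = shutil.which("ffmpeg")  # None if ffmpeg is not on PATH
    if found:
        return found
    return windows_fallback if platform.system() == "Windows" else "ffmpeg"
```

This keeps the script working unchanged on Linux, where the package-manager install puts `ffmpeg` on the PATH.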
You need to replace the placeholder "your_hugging_face_token_here" with your actual Hugging Face API token to use the PyAnnote speaker diarization model:

```python
diarization_pipeline = Pipeline.from_pretrained(
    diarization_model_name,
    use_auth_token="your_hugging_face_token_here"
)
```

If you'd like to contribute, feel free to submit a pull request or open an issue for any bugs or features you'd like to see.
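A note on the Hugging Face token above: rather than hardcoding it in the source (where it can leak into version control), the script could read it from an environment variable. A minimal sketch; the `HF_TOKEN` variable name is an assumed convention, not something the script defines:

```python
import os

def hugging_face_token():
    """Read the PyAnnote auth token from the HF_TOKEN environment
    variable instead of hardcoding it in the source."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN to your Hugging Face API token.")
    return token
```

The returned value would then be passed as `use_auth_token` when loading the diarization pipeline.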
This project is licensed under the MIT License - see the LICENSE file for details.