Skip to content

dotdon/Audio-Transcription-with-Speaker-Identification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Audio Transcription with Speaker Identification

This project provides a Python-based GUI application for transcribing audio files with speaker identification, utilizing libraries such as Whisper, PyAnnote, and Librosa. The application allows users to select an audio file, process it for transcription, and identify speakers automatically.

Features

  • Audio Transcription: Uses OpenAI's Whisper model to transcribe audio files in various formats.
  • Speaker Diarization: Identifies different speakers in the audio using the PyAnnote library.
  • Progress Tracking: A visual progress bar displays the current stage of processing.
  • GUI Interface: An easy-to-use graphical user interface built with Tkinter.

Requirements

Before running the application, make sure you have the following dependencies installed:

Python Libraries

  • librosa
  • numpy
  • whisper
  • pyannote.audio
  • tkinter
  • soundfile
  • queue
  • logging
  • ffmpeg-python
  • threading

External Tools

  • FFmpeg: Make sure FFmpeg is installed and configured in your system's environment. You may need to adjust the path to the FFmpeg executable in the script.

Installation

Windows

  1. Clone the repository: Download or clone this repository using:

    git clone https://github.com/your-username/your-repo-name.git
  2. Install Python and dependencies:

    • Make sure you have Python 3.8+ installed. You can download Python here.
    • Open a command prompt and install the required dependencies:
      pip install librosa numpy whisper pyannote.audio soundfile ffmpeg-python
  3. Install FFmpeg:

    • Download FFmpeg for Windows.
    • Extract the FFmpeg files to C:\ffmpeg (or any directory of your choice).
    • Add the FFmpeg bin folder to your PATH:
      • Right-click on "This PC" > "Properties" > "Advanced system settings" > "Environment Variables".
      • Under "System Variables", find the Path variable, click "Edit", and add the path to C:\ffmpeg\bin.
  4. Run the application:

    python transcription_app.py

Linux

  1. Clone the repository: Open a terminal and run:

    git clone https://github.com/your-username/your-repo-name.git
  2. Install Python and dependencies:

    • Install Python 3.8+ and pip:
      sudo apt-get install python3 python3-pip
    • Install the required Python libraries:
      pip3 install librosa numpy whisper pyannote.audio soundfile ffmpeg-python
  3. Install FFmpeg:

    • On Linux, FFmpeg can be installed through package managers. Run the following:
      sudo apt-get install ffmpeg
  4. Run the application:

    python3 transcription_app.py

Usage

  1. Launch the application by running the appropriate command for your operating system:

    • On Windows:
      python transcription_app.py
    • On Linux:
      python3 transcription_app.py
  2. Use the graphical interface to:

    • Select an audio file (supports .wav, .mp3, .flac, .m4a, and more).
    • Specify the output file for saving the transcription.
    • Choose the model size and language for Whisper-based transcription.
    • Use the diarization model to identify speakers in the audio.

Packaging the Application for Windows and Linux

To make the application executable on both Windows and Linux without requiring Python, you can package it using PyInstaller.

Packaging for Windows

  1. Install PyInstaller:

    pip install pyinstaller
  2. Build the executable:

    pyinstaller --onefile --windowed transcription_app.py

    This will create a single .exe file in the dist folder that you can run without needing Python installed.

Packaging for Linux

  1. Install PyInstaller:

    pip3 install pyinstaller
  2. Build the executable:

    pyinstaller --onefile --windowed transcription_app.py

    This will create a standalone binary in the dist folder that can be run directly on Linux.

Customization

FFmpeg Path

If FFmpeg is not in your system's PATH, you can customize the script to point to your FFmpeg installation by modifying this line:

ffmpeg_executable = r'C:\ffmpeg\ffmpeg.exe'  # Adjust this path if necessary

On Linux, this can be left as is if FFmpeg is installed via the package manager.

Hugging Face Token

You need to replace the placeholder "your_hugging_face_token_here" with your actual Hugging Face API token for using the PyAnnote speaker diarization model.

diarization_pipeline = Pipeline.from_pretrained(
    diarization_model_name,
    use_auth_token="your_hugging_face_token_here"
)

Contributing

If you'd like to contribute, feel free to submit a pull request or open an issue for any bugs or features you'd like to see.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

I made this to help my wife with her dissertation.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages