A comprehensive benchmarking tool for comparing Whisper speech-to-text implementations on Apple Silicon (MacBook M1).
- 🎯 4 Whisper Implementations: OpenAI Whisper, faster-whisper, mlx-whisper, lightning-whisper-mlx
- 🔊 VAD Support: Process only speech segments to avoid hallucinations on silent audio
- ⏱️ Performance Metrics: Timing, throughput, and realtime-factor analysis
- 📊 Fair Comparisons: Consistent configuration across all models (greedy decoding, temperature=0)
- 🔍 Quality Analysis: Save individual transcripts for manual comparison
- 🗂️ Simple Setup: Drop audio files in the `inputs/` folder and run
```bash
git clone <repo-url>
cd whisper-bench

# Basic installation
uv sync

# Or install with performance optimizations (includes GPU support)
uv sync --extra performance

# Or install everything
uv sync --extra all
```

Place your files in the inputs folder:
```
inputs/
├── my_audio.wav                   # Your audio file
└── my_audio_VAD_timestamps.json   # Optional: VAD timestamps for better quality
```

```bash
# Standard benchmark (apples-to-apples comparison)
uv run python main.py

# Or use performance optimizations for speed testing
uv run python main.py --performance-profile aggressive
```

Supported formats: `.wav`, `.mp3`, `.flac`
A JSON file with speech segments to avoid processing silent sections:

```json
[
  {
    "start_sec": 10.5,
    "end_sec": 15.8,
    "duration_sec": 5.3
  }
]
```

Benefits of VAD:
- ✅ Avoids Whisper hallucinations on long silent sections
- ✅ More accurate quality assessment
- ✅ Focuses transcription on actual speech content
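As a sketch of how such a timestamps file could be consumed (the helper and file names here are illustrative, not the tool's actual code), the segments can be loaded and used to slice the raw sample array so only speech is transcribed:

```python
import json
import numpy as np

def load_speech_samples(audio, sr, vad_json):
    """Concatenate only the speech segments listed in a VAD timestamps file."""
    with open(vad_json) as f:
        segments = json.load(f)
    pieces = [audio[int(seg["start_sec"] * sr): int(seg["end_sec"] * sr)]
              for seg in segments]
    return np.concatenate(pieces) if pieces else audio[:0]

# Illustrative use with synthetic audio and a segments file in the format above
sr = 16000
audio = np.zeros(20 * sr, dtype=np.float32)   # 20 s of "audio"
with open("demo_VAD_timestamps.json", "w") as f:
    json.dump([{"start_sec": 10.5, "end_sec": 15.8, "duration_sec": 5.3}], f)
speech = load_speech_samples(audio, sr, "demo_VAD_timestamps.json")
print(len(speech) / sr)  # 5.3 seconds of speech remain
```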
All models use identical settings for fair comparison:
- Greedy decoding (`beam_size=1`): the most restrictive common denominator
- `temperature=0`: deterministic output
- `word_timestamps=True`: detailed timing information
- No conditioning on previous text
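These shared settings could be captured in a single dict passed to every backend's transcribe call; this is a sketch, and the actual `config.py` may name things differently:

```python
# Hypothetical shared decoding config mirroring the settings above;
# the benchmark's real config.py may differ.
FAIR_DECODE_SETTINGS = {
    "beam_size": 1,                       # greedy decoding
    "temperature": 0,                     # deterministic output
    "word_timestamps": True,              # detailed timing information
    "condition_on_previous_text": False,  # no conditioning on previous text
}

# e.g. model.transcribe(audio_path, **FAIR_DECODE_SETTINGS)
```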
For speed testing, use performance profiles that maintain quality while optimizing computational efficiency:
```bash
# List available profiles
uv run python main.py --list-profiles

# Performance profiles (quality-preserving optimizations only):
uv run python main.py --performance-profile baseline      # Standard comparison
uv run python main.py --performance-profile conservative  # Minimal optimizations
uv run python main.py --performance-profile balanced      # Good balance
uv run python main.py --performance-profile aggressive    # Maximum performance
```

Performance optimizations include:
- GPU acceleration (when available)
- Optimized batch processing
- Memory management improvements
- Threading optimization (`cpu_threads`, `num_workers`)
- Model quantization (limited by Apple hardware compatibility)
Quality parameters remain unchanged - only computational efficiency is improved.
Apple Hardware Limitations:
- Lightning-whisper-mlx quantization is disabled due to `QuantizedLinear` compatibility issues
- MLX models automatically use `float16` precision without explicit dtype settings
Threading Optimization Details:
- faster-whisper: uses `cpu_threads` (OpenMP threads), `num_workers` (parallel workers), and `chunk_length` (audio segmentation)
- Environment variables: `OMP_NUM_THREADS` is set automatically for additional optimization
- Note: `batch_size` is not supported by faster-whisper's `transcribe()` method
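A minimal sketch of the thread-tuning idea (illustrative names; the benchmark's own code may differ) is to derive a thread count and export `OMP_NUM_THREADS` before constructing the model:

```python
import os

def apply_thread_settings(cpu_threads: int = 0) -> int:
    """Pick a worker-thread count and export OMP_NUM_THREADS accordingly.

    cpu_threads=0 means "use all available cores". This helper is
    illustrative, not the benchmark's actual code.
    """
    if cpu_threads <= 0:
        cpu_threads = os.cpu_count() or 1
    os.environ["OMP_NUM_THREADS"] = str(cpu_threads)
    return cpu_threads

threads = apply_thread_settings(4)
# The value would then be passed on, e.g.:
# model = WhisperModel("base", cpu_threads=threads, num_workers=2)
```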
```text
uv run python main.py [OPTIONS]

Options:
  -p, --performance-profile {baseline,conservative,balanced,aggressive}
                        Performance optimization profile to use (default: baseline)
  -l, --list-profiles   List available performance profiles and exit
  -m, --model-sizes {tiny,base,small,medium,large,large-v3,turbo} [...]
                        Model sizes to benchmark (turbo = large-v3-turbo for 8x speed) (default: base)
  -o, --output-dir DIR  Output directory for benchmark results (default: outputs)
  -h, --help            Show help message and exit
```

Examples:

```bash
uv run python main.py                              # Standard benchmark
uv run python main.py -p aggressive -m base small  # Performance test with multiple sizes
uv run python main.py -p balanced -m turbo         # Test the new turbo model (8x faster)
uv run python main.py -p balanced -o my_results    # Custom output directory
uv run python main.py --list-profiles              # Show available profiles
```

Results are saved to `outputs/`:
```
outputs/
├── benchmark_results_YYYYMMDD_HHMMSS.json   # Detailed metrics
├── benchmark_summary_YYYYMMDD_HHMMSS.txt    # Human-readable summary
└── transcripts/                             # Individual model outputs
    ├── whisper/
    ├── faster_whisper/
    ├── mlx_whisper/
    └── lightning_whisper_mlx/
```
| Model | Speed | Platform | Beam Search |
|---|---|---|---|
| OpenAI Whisper | Moderate | All | ✅ |
| faster-whisper | Fast | All | ✅ |
| mlx-whisper | Fast | Apple Silicon | ❌ (greedy only) |
| lightning-whisper-mlx | Very Fast | Apple Silicon | ❌ (greedy only) |

Note: All models now use greedy decoding (`beam_size=1`) to ensure a fair comparison.
```text
🎯 Found audio file in inputs folder: podcast.wav
📄 Found VAD timestamps file: podcast_VAD_timestamps.json
🔊 Enabling VAD timestamp processing to avoid hallucinations on silent sections

Starting benchmark with 4 total tests...
Models: ['whisper', 'faster-whisper', 'mlx-whisper', 'lightning-whisper-mlx']
--------------------------------------------------------------------------------

🎵 Processing: podcast.wav
📏 Model size: base

🚀 [1/4] whisper... ✅ 45.2s (12.3x realtime)
🚀 [2/4] faster-whisper... ✅ 32.1s (17.3x realtime)
🚀 [3/4] mlx-whisper... ✅ 18.7s (29.7x realtime)
🚀 [4/4] lightning-whisper-mlx... ✅ 15.3s (36.4x realtime)

================================================================================
BENCHMARK COMPLETE!
================================================================================
```
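The realtime factor reported above is simply audio duration divided by processing time; the whisper line, for example, implies roughly 556 s of audio (45.2 s × 12.3). A minimal sketch of the calculation (the audio length here is inferred from the sample run, not a real measurement):

```python
def realtime_factor(audio_duration_sec: float, elapsed_sec: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_duration_sec / elapsed_sec

# Audio length inferred from the whisper line of the sample run above
audio_len = 556.0
print(f"{realtime_factor(audio_len, 45.2):.1f}x realtime")  # 12.3x realtime
```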
- Python 3.11+
- Audio files in supported formats (.wav, .mp3, .flac)
- For MLX models: Apple Silicon Mac
- For VAD processing: `librosa` and `soundfile` (auto-installed)
- For GPU acceleration: PyTorch (install with `--extra performance`)
Core dependencies (always installed):
- `mlx-whisper` - MLX-optimized Whisper for Apple Silicon
- `openai-whisper` - Original OpenAI implementation
- `faster-whisper` - CTranslate2-based acceleration
- `lightning-whisper-mlx` - High-performance MLX implementation
- `librosa` - Audio processing
- `soundfile` - Audio I/O
- `numpy` - Numerical computing

Optional dependencies:
- `torch>=2.0.0` - GPU detection and acceleration (install with `--extra performance`)
Installation options:

```bash
uv sync                      # Basic installation (CPU only)
uv sync --extra performance  # Include GPU support
uv sync --extra all          # Everything
```

```
whisper-bench/
├── main.py                        # Entry point with inputs/ folder detection
├── config.py                      # Global settings for fair comparison
├── benchmarks/                    # Model implementations
│   ├── base.py                    # Abstract base class with VAD support
│   ├── openai_whisper.py
│   ├── faster_whisper.py
│   ├── mlx_whisper.py
│   └── lightning_whisper_mlx.py
├── inputs/                        # Place your audio + VAD files here
└── outputs/                       # Generated results and transcripts
```
```bash
# Standard comparison for quality assessment
uv run python main.py --performance-profile baseline -m base

# Test with performance optimizations
uv run python main.py --performance-profile aggressive -m base

# Compare multiple model sizes with optimizations
uv run python main.py --performance-profile balanced -m base small medium

# Install GPU support first
uv sync --extra performance

# Run with GPU optimizations
uv run python main.py --performance-profile aggressive
```

The performance profiles keep quality settings identical while optimizing computational efficiency, so you measure pure performance improvements without quality trade-offs.
- MLX models fail: requires an Apple Silicon Mac
- No audio found: place files in the `inputs/` folder
- VAD not working: ensure `librosa` is installed with `uv sync`
- Out of memory: use shorter audio files or smaller model sizes
- GPU not detected: install with `uv sync --extra performance` and ensure CUDA is available
- Performance profiles not working: check that `performance_config.py` imports correctly
- Command not found: use `uv run python main.py` instead of just `python main.py`