OpenAI-compatible speech-to-text transcription server using the google/medasr model.

Features:

- OpenAI API compatible endpoint (`POST /v1/audio/transcriptions`)
- Supports common audio formats: mp3, wav, flac, m4a, ogg, webm, mp4
- Model loaded once at startup and cached in memory (no reloading per request)
- GPU acceleration support (auto-detected)
- YAML configuration
- Audio processed in-memory (no temporary files on disk)
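
The in-memory processing mentioned above can be sketched roughly like this. `audio_bytes_to_buffer` is an illustrative helper, not a function from this codebase: the uploaded bytes are wrapped in a `BytesIO` buffer and handed straight to the decoder, so nothing is written to disk.

```python
import io

def audio_bytes_to_buffer(data: bytes) -> io.BytesIO:
    """Wrap uploaded audio bytes in an in-memory, file-like buffer.

    The buffer can be passed directly to an audio decoder that
    accepts file-like objects, avoiding temporary files entirely.
    """
    buf = io.BytesIO(data)
    buf.seek(0)  # rewind so the decoder reads from the start
    return buf

# Example: pretend these bytes came from a multipart upload.
payload = b"RIFF....WAVEfmt "  # truncated WAV header, illustration only
buf = audio_bytes_to_buffer(payload)
print(len(buf.read()))  # 16
```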
Installation:

```bash
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Edit config.yaml to customize:
```yaml
server:
  host: "0.0.0.0"
  port: 8000

model:
  name: "google/medasr"
  chunk_length_s: 20
  stride_length_s: 2
  device: "auto"  # "auto", "cuda", or "cpu"

huggingface:
  token: null  # Optional: HF token for private models or higher rate limits
```

Start the server:

```bash
python main.py
```

The server will start at http://localhost:8000.
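
The `device: "auto"` setting implies a resolution step along these lines. This is a sketch; the actual helper name and logic in main.py are not shown in this README:

```python
def resolve_device(setting: str) -> str:
    """Map the config's device setting to a concrete device string.

    "auto" picks CUDA when available, otherwise CPU; explicit
    values ("cuda", "cpu") are passed through unchanged.
    """
    if setting != "auto":
        return setting
    try:
        import torch  # listed in the requirements
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

print(resolve_device("cpu"))
```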
Transcribe an audio file:

```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav"
```

Response:
```json
{
  "text": "transcribed text here"
}
```

Health check:

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "model": "google/medasr"
}
```

You can use this server with OpenAI client libraries by setting the base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy",  # No actual API key needed
    base_url="http://localhost:8000/v1",
)

with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="google/medasr",
    )

print(transcript.text)
```

Requirements:

- Python 3.10+
- transformers >= 5.0.0
- torch >= 2.0.0
- CUDA-compatible GPU (optional, for faster inference)
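
The `chunk_length_s` and `stride_length_s` settings in the config refer to windowed long-form inference: audio is decoded in fixed-length chunks that overlap by the stride on each side, and the overlapping edges are discarded when stitching results. A minimal sketch of how such windows might be laid out (function name and exact stitching logic are assumptions, not taken from this codebase):

```python
def chunk_windows(total_s: float, chunk_s: float = 20.0, stride_s: float = 2.0):
    """Return (start, end) times, in seconds, for overlapping decode windows.

    Consecutive windows advance by chunk_s minus the overlap on both
    sides, mirroring the chunk_length_s / stride_length_s settings.
    """
    step = chunk_s - 2 * stride_s
    windows = []
    start = 0.0
    while start < total_s:
        windows.append((start, min(start + chunk_s, total_s)))
        start += step
    return windows

print(chunk_windows(50.0))
```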