Create a separate sherpa-voice benchmark page for recorder-like transcription experiments so multiple ASR models can be compared without changing the existing feature demos.
- New benchmark page exists under the `sherpa-voice` feature stack.
- The page supports repeatable `sample file` benchmarks across multiple ASR models.
- The page supports `live mic` benchmarks for streaming models with latency-oriented metrics.
- Benchmark results are kept in-session and can be copied/exported.
- The RN ASR config surface supports model-specific fields needed for fair comparisons instead of hardcoded Whisper/SenseVoice defaults.
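
As an illustration of the "kept in-session and can be copied/exported" criterion above, here is a minimal sketch of turning in-memory result rows into CSV text that could be placed on the clipboard or written to a file. The `ResultRow` shape and the `csvCell` and `toCsv` helpers are hypothetical names chosen for this example, not the page's actual implementation, and only a subset of the recorded fields is shown.

```typescript
// Hypothetical row shape for illustration only; not taken from the PR.
interface ResultRow {
  modelId: string;
  mode: "sample" | "live";
  initTimeMs: number;
  firstPartialLatencyMs: number | null; // null when no partial was observed
  transcript: string;
}

// Quote a cell only when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function csvCell(value: string | number | null): string {
  const s = value === null ? "" : String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize the in-session rows to CSV text with a header line.
function toCsv(rows: ResultRow[]): string {
  const header = [
    "modelId",
    "mode",
    "initTimeMs",
    "firstPartialLatencyMs",
    "transcript",
  ] as const;
  const lines = rows.map((r) => header.map((k) => csvCell(r[k])).join(","));
  return [header.join(","), ...lines].join("\n");
}
```

Keeping the rows as a plain array in component or module state is enough for an in-session store under this assumption; nothing is persisted between app launches.
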
- Android-first
- On-device-first
- English-first
- Transcription-first
- `streaming-zipformer-en-20m-mobile`
- `streaming-zipformer-en-general`
- `streaming-zipformer-ctc-small-2024-03-18`
- `streaming-zipformer-bilingual-zh-en-2023-02-20`
- `streaming-paraformer-bilingual-zh-en`
- `streaming-zipformer-en-kroko-2025-08-06`
- `whisper-tiny-en`
- `whisper-small-multilingual`
- `sense-voice-zh-en-ja-ko-yue-int8-2025-09-09`
- `nemo-canary-180m-flash-en-es-de-fr`
- model id
- benchmark mode (`sample` or `live`)
- init time
- run duration
- first partial latency
- first commit latency
- trailing commit latency after last audio chunk
- transcript output
- error state
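
The recorded fields listed above could be modeled roughly as follows. This is a sketch under stated assumptions, not the page's actual types: every name here (`BenchmarkResult`, `RunTimeline`, `AsrEvent`, `summarizeRun`) is hypothetical, events are assumed to arrive in time order, the first-partial and first-commit latencies are assumed to be measured from the first audio chunk, and the trailing commit latency is assumed to be the gap between the last audio chunk and the final commit.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
type BenchmarkMode = "sample" | "live";

interface AsrEvent {
  kind: "partial" | "commit";
  atMs: number; // timestamp in ms, same clock as the audio timestamps
  text: string;
}

interface RunTimeline {
  modelId: string;
  mode: BenchmarkMode;
  initStartMs: number;
  initEndMs: number;
  firstAudioMs: number; // when the first audio chunk was fed
  lastAudioMs: number; // when the last audio chunk was fed
  runEndMs: number;
  events: AsrEvent[]; // assumed sorted by atMs
  error?: string;
}

interface BenchmarkResult {
  modelId: string;
  mode: BenchmarkMode;
  initTimeMs: number;
  runDurationMs: number;
  firstPartialLatencyMs: number | null;
  firstCommitLatencyMs: number | null;
  trailingCommitLatencyMs: number | null;
  transcript: string;
  error: string | null;
}

function summarizeRun(t: RunTimeline): BenchmarkResult {
  const firstPartial = t.events.find((e) => e.kind === "partial");
  const commits = t.events.filter((e) => e.kind === "commit");
  const firstCommit = commits[0];
  const lastCommit = commits[commits.length - 1];
  return {
    modelId: t.modelId,
    mode: t.mode,
    initTimeMs: t.initEndMs - t.initStartMs,
    runDurationMs: t.runEndMs - t.firstAudioMs,
    // Latencies are null when the corresponding event never occurred.
    firstPartialLatencyMs: firstPartial ? firstPartial.atMs - t.firstAudioMs : null,
    firstCommitLatencyMs: firstCommit ? firstCommit.atMs - t.firstAudioMs : null,
    // Only a commit that lands at or after the last audio chunk counts as trailing.
    trailingCommitLatencyMs:
      lastCommit && lastCommit.atMs >= t.lastAudioMs
        ? lastCommit.atMs - t.lastAudioMs
        : null,
    transcript: commits.map((c) => c.text).join(" ").trim(),
    error: t.error ?? null,
  };
}
```

Using `null` rather than `0` for a missing latency keeps non-streaming models, which may never emit a partial, distinguishable from models that respond instantly.
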
- Translation stays experimental in this PR; the first benchmark focus is Recorder-like transcription responsiveness.
- Upstream Sherpa freshness should be tracked separately from any claim about rebuilt native binaries.
- The benchmark matrix intentionally includes non-winner baselines so recommendations are based on tradeoffs, not only on the strongest model.
- `packages/sherpa-onnx.rn` is part of the scope: package-level hardcoded ASR defaults were removed where needed for fair benchmarking, and remaining bridge/runtime gaps are documented in `docs/ASR_BENCHMARK.md`.
- Current benchmark decision: `streaming-zipformer-bilingual-zh-en-2023-02-20` is the best live model in the tested Sherpa set on the Pixel 6a, but it is still far from Google Recorder-like performance.