Conversation

@Mahmoud-ghareeb commented May 20, 2025

Hello everyone,

This PR supports batching multiple audio files together for inference, based on the existing batching mechanism used within a single audio file.

Use case: This enables more efficient GPU utilization and higher throughput when performing inference on multiple small audio files, which would otherwise be processed sequentially and underutilize hardware resources.

Supported input types: audio path (str), np.ndarray, and BinaryIO.

Example:

from faster_whisper import BatchedInferencePipeline, WhisperModel

model = WhisperModel("tiny")
batched_model = BatchedInferencePipeline(model=model)

# Transcribe three files in a single batched call.
result, info = batched_model.transcribe_batch_multiple_audios(
    [physcisworks_path, physcisworks_path, physcisworks_path],
    batch_size=3,
)

segments = [{"text": segment.text} for segment in result]

@MahmoudAshraf97 (Collaborator)

Hello Mahmoud, the idea itself sounds good and the current implementation works, but I have a few concerns. The new function is essentially a duplicated transcribe function, which makes maintenance even harder since we already have two transcribe functions. What I would suggest, to achieve the same functionality without any code changes, is to load all the files, concatenate them, and pass the result to the transcribe function with clip_timestamps set and vad_filter=False. For example:

import numpy as np

from faster_whisper import decode_audio

offset = 0
clip_timestamps = []
audio = np.array([])
for audio_path in audio_paths:
    clip = decode_audio(audio_path)  # 16 kHz mono float32
    # Remember where this file starts and ends (in seconds) within the
    # concatenated audio.
    clip_timestamps.append(
        {"start": offset / 16000, "end": (offset + len(clip)) / 16000}
    )
    audio = np.concatenate((audio, clip))
    offset += len(clip)

I'm also open to discussion if you have a better idea

@Mahmoud-ghareeb (Author)

Hi Mahmoud,

Thanks for your detailed response

Yes, I thought about that, but it was easier for me to duplicate the function because I had a task I wanted to finish as soon as possible, and I published the branch for anyone else to use.

But I will redo it to be cleaner, and I would be pleased to continue the discussion with you about it.

@Mahmoud-ghareeb (Author)

I also thought of doing a hybrid approach: multiple audios, multiple batches.

@MahmoudAshraf97 (Collaborator)

It can be generalized to allow for audios that are not necessarily shorter than 30s: segment each audio using VAD, concatenate all segments across the batch dimension, continue as if it were a single audio, and split the results after transcription. This is beneficial for very large batch sizes, where the last batch would otherwise not be fully occupied.
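A minimal sketch of that idea, reusing decode_audio and batch_size from the earlier snippet; segment_with_vad and transcribe_chunks are hypothetical helpers standing in for the VAD and model calls, not the library's API:

# Pool VAD chunks from every file into one flat list, keyed by source file.
all_chunks = []
for file_index, audio_path in enumerate(audio_paths):
    audio = decode_audio(audio_path)
    for chunk in segment_with_vad(audio):  # hypothetical VAD helper
        all_chunks.append((file_index, chunk))

# Batch across files, so the last batch is filled with chunks from other
# files instead of running half-empty.
texts_per_file = [[] for _ in audio_paths]
for start in range(0, len(all_chunks), batch_size):
    batch = all_chunks[start : start + batch_size]
    texts = transcribe_chunks([chunk for _, chunk in batch])  # hypothetical
    for (file_index, _), text in zip(batch, texts):
        texts_per_file[file_index].append(text)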

@Mahmoud-ghareeb (Author)

Yeah, that's what I meant by the hybrid approach.

I am working on it now

@Nixoals commented Jun 25, 2025

Hi, it would be awesome to support this feature.

@egrinstein

Hi everyone, thank you for the discussion.
@MahmoudAshraf97, wouldn't your concatenation approach's complexity grow with the sum of the durations of all files? Because the transformer context would grow and grow, right?

Also, why is using clip_timestamps necessary?

Thank you and best regards

@MahmoudAshraf97 (Collaborator)

> Hi everyone, thank you for the discussion. @MahmoudAshraf97, wouldn't your concatenation approach's complexity grow with the sum of the durations of all files? Because the transformer context would grow and grow, right?
>
> Also, why is using clip_timestamps necessary?
>
> Thank you and best regards

Not at all. This approach is valid only for files shorter than 30s, and by using clip timestamps we can separate the different files from each other and also process them in parallel, independently of each other.
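To illustrate how the clip boundaries keep the files separable, here is a sketch that routes transcribed segments back to their source files. It reuses audio, clip_timestamps, and batched_model from the snippets above, and assumes the batched pipeline accepts the seconds-based list-of-dicts form shown there:

# Transcribe the concatenated audio; VAD is off because the clip
# boundaries already delimit the regions belonging to each file.
segments, info = batched_model.transcribe(
    audio, clip_timestamps=clip_timestamps, vad_filter=False
)

# Segment timestamps are absolute within the concatenated audio, so a
# segment belongs to whichever clip window contains its start time.
per_file_texts = [[] for _ in clip_timestamps]
for segment in segments:
    for i, clip in enumerate(clip_timestamps):
        if clip["start"] <= segment.start < clip["end"]:
            per_file_texts[i].append(segment.text)
            break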

@j-silv commented Sep 5, 2025

Hi! It looks like this PR has stalled. I’d like to pick it up and continue implementing batching support for multiple audios. Would it be okay if I open a new PR referencing this one? I would continue where @Mahmoud-ghareeb left off.

For my use case, I have a bunch of audio files which are < 30 s. I want to batch transcribe them on the GPU, but I still want to run VAD on each individual audio file in parallel.

I know there are already Whisper implementations that support multi-audio batching, but I'd much rather use this repo for its speed, its built-in VAD, and because the source code is much simpler to understand and modify (compared to HF's many layers of abstraction).
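A sketch of that workflow, decoding and VAD-ing each short file concurrently on the CPU before the batched GPU pass; the get_speech_timestamps helper and its exact signature are assumptions about faster-whisper's Silero VAD module:

from concurrent.futures import ThreadPoolExecutor

from faster_whisper import decode_audio
from faster_whisper.vad import get_speech_timestamps  # signature assumed

def vad_one_file(audio_path):
    # Decode to 16 kHz mono float32, then find speech regions
    # (returned as sample indices).
    audio = decode_audio(audio_path)
    return audio, get_speech_timestamps(audio)

# Run VAD on each file in parallel worker threads.
with ThreadPoolExecutor() as pool:
    vad_results = list(pool.map(vad_one_file, audio_paths))

# Keep only the speech portions for the batched GPU transcription step.
speech_chunks = [
    audio[ts["start"] : ts["end"]]
    for audio, timestamps in vad_results
    for ts in timestamps
]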

@MahmoudAshraf97 (Collaborator)

Yes, feel free to open a PR that continues this one.

j-silv added a commit to j-silv/faster-whisper that referenced this pull request Sep 8, 2025

This work continues where SYSTRAN#1302 left off. The goal is to
transcribe multiple audio files truly in parallel and increase
GPU throughput.

For more information please refer to the pull request
@Mahmoud-ghareeb (Author) commented Dec 2, 2025

Extended BatchedInferencePipeline.transcribe() to accept a list of multiple audio inputs, enabling batch transcription of multiple audio files in a single call with GPU-parallel inference. It works with any audio duration, even > 30 sec.

Example

model = WhisperModel("tiny")
batched_model = BatchedInferencePipeline(model=model)

# Single audio (unchanged)
segments, info = batched_model.transcribe(audio)

# Multiple audios (new)
segments, info = batched_model.transcribe([audio1, audio2, audio3])

Deprecation

Added a deprecation warning to transcribe_batch_multiple_audios(); users should migrate to transcribe() with a list.
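For reference, a deprecation shim of the kind described might look like this (illustrative, not the PR's exact code):

import warnings

def transcribe_batch_multiple_audios(self, audios, **kwargs):
    # Warn callers of the old entry point, then delegate to the new one.
    warnings.warn(
        "transcribe_batch_multiple_audios() is deprecated; "
        "pass a list of audios to transcribe() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return self.transcribe(audios, **kwargs)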

Testing

  • Added test_transcribe_multiple_audios
  • Added test_transcribe_multiple_audios_with_word_timestamps

@MahmoudAshraf97 I also merged the latest changes, so if everything is OK, please merge it.
