Conversation

@yinpeiqi (Contributor) commented Dec 5, 2025


Purpose

When running multiple prompts, the system raises an error if the Qwen3Omni Thinker stage uses a batch size > 1.

ERROR:vllm_omni.entrypoints.omni_llm:[Orchestrator] Process engine inputs error for req 0 at stage 1: 'NoneType' object has no attribute 'float'                
Traceback (most recent call last):                                                                                                                              
  File "/workspace/test/vllm-omni/vllm_omni/entrypoints/omni_llm.py", line 380, in _run_generation                                                              
    next_inputs = next_stage.process_engine_inputs(self.stage_list, [request_id_to_prompt[req_id]])                                                             
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                             
  File "/workspace/test/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 319, in process_engine_inputs                                                      
    return self.custom_process_input_func(                                                                                                                      
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                      
  File "/workspace/test/vllm-omni/vllm_omni/model_executor/stage_input_processors/qwen3_omni.py", line 98, in thinker2talker                                    
    "tts_bos_embed": output.multimodal_output.get("tts_bos_embed").float().clone().detach().cuda(),                                                             
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                        
AttributeError: 'NoneType' object has no attribute 'float'                                                                                                      

The reason is that in the Qwen3 thinker, the program records tts_bos_embed as a bare tensor:

                    multimodal_outputs["tts_bos_embed"] = bos_eos_pad[0]
                    multimodal_outputs["tts_eos_embed"] = bos_eos_pad[1]
                    multimodal_outputs["tts_pad_embed"] = bos_eos_pad[2]

Meanwhile, the postprocessing of multimodal outputs tries to route the value through Case 1:

                        # Case 1: tensor aligned on token dimension
                        if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
                            mm_payload[k] = v.detach().to("cpu")[prev_logits_index : logits_index + 1].contiguous()

However, if the batch size is > 1, the Case 1 shape check fails, no other case matches a bare tensor, and the value is silently dropped, so the talker later receives None for tts_bos_embed.
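
To illustrate the fall-through, here is a toy version of the guard. The shapes are hypothetical, chosen only to show the mechanics; the point is that a bare per-request embedding stops matching the stacked hidden states once several requests are batched:

    import torch

    # Hypothetical shapes: one hidden-state row per request in the batch.
    hidden_states_cpu = torch.randn(3, 8)  # batch of 3 requests
    tts_bos_embed = torch.randn(1, 8)      # single BOS embedding recorded by the thinker

    mm_payload = {}
    v = tts_bos_embed
    # The Case 1 guard: with batch size 1 both leading dims can match and it passes;
    # with batch size 3 the check fails, and no other case handles a bare tensor.
    if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
        mm_payload["tts_bos_embed"] = v.detach().to("cpu").contiguous()

    print(mm_payload)  # {} -> the talker later gets None and crashes on .float()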

In my fix plan, I propose:

  1. Change the recording of tts_bos_embed to a list, so that the value takes the Case 3 path:

     multimodal_outputs["tts_bos_embed"] = [bos_eos_pad[0]]

  2. Add error detection to the multimodal-output extraction (after Case 1), so that an output can no longer be silently bypassed.
  3. In Case 3, detach the tensor and offload it to CPU.

A minimal sketch of the extraction loop with these changes follows.
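
This sketch illustrates the plan rather than the actual patch: extract_mm_outputs is a hypothetical name, and Case 2 of the real dispatch is not quoted in this PR, so it is elided here.

    import torch

    def extract_mm_outputs(multimodal_outputs, hidden_states_cpu,
                           prev_logits_index, logits_index):
        mm_payload = {}
        for k, v in multimodal_outputs.items():
            # Case 1: tensor aligned on the token dimension (unchanged).
            if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
                mm_payload[k] = v.detach().to("cpu")[prev_logits_index:logits_index + 1].contiguous()
            # (Case 2 elided -- not shown in this PR.)
            # Case 3: list of tensors. tts_bos_embed now takes this path because the
            # thinker records it as [bos_eos_pad[0]]; detach and offload each to CPU.
            elif isinstance(v, list) and all(isinstance(t, torch.Tensor) for t in v):
                mm_payload[k] = [t.detach().to("cpu").contiguous() for t in v]
            # Proposed error detection: fail loudly instead of silently dropping,
            # so a missing case surfaces here rather than as None in the talker.
            else:
                raise ValueError(f"Unhandled multimodal output {k!r} ({type(v).__name__})")
        return mm_payload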

@tzhouam @hsliuustc0106

Test Plan

Configuration: batch size set to 3 for both the thinker and the talker.

Query:

We asked over twenty different people, and they all said it was his.
I'm never more aware of a room's acoustics than when I'm trying to enjoy a snack I have no intention of sharing.
Sometimes I overthink things which leads me to postpone and ultimately never achieve the goal I had in mind.

Run:

python end2end.py --output-wav output_audio \
                  --query-type text \
                  --txt-prompts top3.txt

Test Result

INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] Stage-0 reported ready                                                                                      
INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] All stages initialized successfully                                                                         
[Info] Loaded 3 prompts from top3.txt                                                                                                                          
[Stage-0] Max batch size: 3                                                                                                                                    
--------------------------------                                                                                                                               
[Stage-0] Received batch size=3, request_ids=[0, 1, 2]                                                                                                         
--------------------------------                                                                                                                               
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
[Stage-0] Generate done: batch=3, req_ids=[0, 1, 2], gen_ms=19090.2                                                                                            
[Stage-1] Max batch size: 3                                                                                                                                    
--------------------------------                                                                                                                               
[Stage-1] Received batch size=3, request_ids=[0, 1, 2]                                                                                                         
--------------------------------                                                                                                                               
...
(Worker pid=1321452) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker pid=1321452) INFO 12-05 13:23:20 [multiproc_executor.py:599] WorkerProc shutting down.
(Worker pid=1320719) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1320029) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1320029) INFO 12-05 13:23:20 [multiproc_executor.py:599] WorkerProc shutting down.
(Worker_TP1 pid=1320610) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
Request ID: 0, Text saved to output_audio/00000.txt
Request ID: 1, Text saved to output_audio/00001.txt
Request ID: 2, Text saved to output_audio/00002.txt
Request ID: 0, Saved audio to output_audio/output_0.wav
Request ID: 1, Saved audio to output_audio/output_1.wav
Request ID: 2, Saved audio to output_audio/output_2.wav

Output result (attached as a zip):

output_audio.zip




@Gaohan123 (Collaborator) left a comment

Thanks for your contribution. Please supplement test plan and test result.

@yinpeiqi (Contributor, Author) commented Dec 5, 2025

> Thanks for your contribution. Please supplement test plan and test result.

Done.

@Gaohan123 (Collaborator) left a comment

LGTM. A good catch!

@hsliuustc0106 (Collaborator) left a comment

LGTM, please add test result

@yinpeiqi (Contributor, Author) commented Dec 8, 2025

> LGTM, please add test result

I included test results (a zip file containing the output .wav audio and the .txt text for the three queries) in the PR description. Do I need to add more results here?

@Gaohan123 Gaohan123 enabled auto-merge (squash) December 8, 2025 04:08
@Gaohan123 Gaohan123 merged commit bbb5a72 into vllm-project:main Dec 8, 2025
4 checks passed