Conversation

@yinpeiqi (Contributor) commented Dec 5, 2025


Purpose

When running multiple prompts, the system raises an error if the Qwen3Omni Thinker stage uses a batch size > 1.

ERROR:vllm_omni.entrypoints.omni_llm:[Orchestrator] Process engine inputs error for req 0 at stage 1: 'NoneType' object has no attribute 'float'                
Traceback (most recent call last):                                                                                                                              
  File "/workspace/test/vllm-omni/vllm_omni/entrypoints/omni_llm.py", line 380, in _run_generation                                                              
    next_inputs = next_stage.process_engine_inputs(self.stage_list, [request_id_to_prompt[req_id]])                                                             
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                             
  File "/workspace/test/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 319, in process_engine_inputs                                                      
    return self.custom_process_input_func(                                                                                                                      
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                      
  File "/workspace/test/vllm-omni/vllm_omni/model_executor/stage_input_processors/qwen3_omni.py", line 98, in thinker2talker                                    
    "tts_bos_embed": output.multimodal_output.get("tts_bos_embed").float().clone().detach().cuda(),                                                             
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                        
AttributeError: 'NoneType' object has no attribute 'float'                                                                                                      

The reason is that in the Qwen3 thinker, the program records tts_bos_embed as a bare tensor:

                    multimodal_outputs["tts_bos_embed"] = bos_eos_pad[0]
                    multimodal_outputs["tts_eos_embed"] = bos_eos_pad[1]
                    multimodal_outputs["tts_pad_embed"] = bos_eos_pad[2]

Meanwhile, the postprocessing of multimodal outputs tries to route the value through Case 1:

                        # Case 1: tensor aligned on token dimension
                        if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
                            mm_payload[k] = v.detach().to("cpu")[prev_logits_index : logits_index + 1].contiguous()

However, if the batch size is > 1, the Case 1 shape check fails, no other case matches a bare tensor, and the value is silently dropped, so the talker later receives None for tts_bos_embed.
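
To illustrate the fall-through, here is a toy version of the guard. The shapes are hypothetical, chosen only to show the mechanics; the point is that a bare per-request embedding stops matching the stacked hidden states once several requests are batched:

    import torch

    # Hypothetical shapes: one hidden-state row per request in the batch.
    hidden_states_cpu = torch.randn(3, 8)  # batch of 3 requests
    tts_bos_embed = torch.randn(1, 8)      # single BOS embedding recorded by the thinker

    mm_payload = {}
    v = tts_bos_embed
    # The Case 1 guard: with batch size 1 both leading dims can match and it passes;
    # with batch size 3 the check fails, and no other case handles a bare tensor.
    if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
        mm_payload["tts_bos_embed"] = v.detach().to("cpu").contiguous()

    print(mm_payload)  # {} -> the talker later gets None and crashes on .float()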

In my fix plan, I propose:

  1. Change the recording of tts_bos_embed to a list, so that the value takes the Case 3 path:

     multimodal_outputs["tts_bos_embed"] = [bos_eos_pad[0]]

  2. Add error detection to the multimodal-output extraction (after Case 1), so that an output can no longer be silently bypassed.
  3. In Case 3, detach the tensor and offload it to CPU.

A minimal sketch of the extraction loop with these changes follows.
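
This sketch illustrates the plan rather than the actual patch: extract_mm_outputs is a hypothetical name, and Case 2 of the real dispatch is not quoted in this PR, so it is elided here.

    import torch

    def extract_mm_outputs(multimodal_outputs, hidden_states_cpu,
                           prev_logits_index, logits_index):
        mm_payload = {}
        for k, v in multimodal_outputs.items():
            # Case 1: tensor aligned on the token dimension (unchanged).
            if isinstance(v, torch.Tensor) and v.shape[0] == hidden_states_cpu.shape[0]:
                mm_payload[k] = v.detach().to("cpu")[prev_logits_index:logits_index + 1].contiguous()
            # (Case 2 elided -- not shown in this PR.)
            # Case 3: list of tensors. tts_bos_embed now takes this path because the
            # thinker records it as [bos_eos_pad[0]]; detach and offload each to CPU.
            elif isinstance(v, list) and all(isinstance(t, torch.Tensor) for t in v):
                mm_payload[k] = [t.detach().to("cpu").contiguous() for t in v]
            # Proposed error detection: fail loudly instead of silently dropping,
            # so a missing case surfaces here rather than as None in the talker.
            else:
                raise ValueError(f"Unhandled multimodal output {k!r} ({type(v).__name__})")
        return mm_payload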

@tzhouam @hsliuustc0106

Test Plan

Configuration: batch size set to 3 for both the thinker and the talker.

Query:

We asked over twenty different people, and they all said it was his.
I'm never more aware of a room's acoustics than when I'm trying to enjoy a snack I have no intention of sharing.
Sometimes I overthink things which leads me to postpone and ultimately never achieve the goal I had in mind.

Run:

python end2end.py --output-wav output_audio \
                  --query-type text \
                  --txt-prompts top3.txt

Test Result

INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] Stage-0 reported ready                                                                                      
INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] All stages initialized successfully                                                                         
[Info] Loaded 3 prompts from top3.txt                                                                                                                          
[Stage-0] Max batch size: 3                                                                                                                                    
--------------------------------                                                                                                                               
[Stage-0] Received batch size=3, request_ids=[0, 1, 2]                                                                                                         
--------------------------------                                                                                                                               
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP1 pid=1320610) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
(Worker_TP0 pid=1320029) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!                                                              
[Stage-0] Generate done: batch=3, req_ids=[0, 1, 2], gen_ms=19090.2                                                                                            
[Stage-1] Max batch size: 3                                                                                                                                    
--------------------------------                                                                                                                               
[Stage-1] Received batch size=3, request_ids=[0, 1, 2]                                                                                                         
--------------------------------                                                                                                                               
...
(Worker pid=1321452) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker pid=1321452) INFO 12-05 13:23:20 [multiproc_executor.py:599] WorkerProc shutting down.
(Worker pid=1320719) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1320029) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1320029) INFO 12-05 13:23:20 [multiproc_executor.py:599] WorkerProc shutting down.
(Worker_TP1 pid=1320610) INFO 12-05 13:23:20 [multiproc_executor.py:558] Parent process exited, terminating worker
Request ID: 0, Text saved to output_audio/00000.txt
Request ID: 1, Text saved to output_audio/00001.txt
Request ID: 2, Text saved to output_audio/00002.txt
Request ID: 0, Saved audio to output_audio/output_0.wav
Request ID: 1, Saved audio to output_audio/output_1.wav
Request ID: 2, Saved audio to output_audio/output_2.wav

Output result (attached as a zip):

output_audio.zip




@Gaohan123 (Collaborator) left a comment

Thanks for your contribution. Please supplement test plan and test result.

@yinpeiqi (Contributor, Author) commented Dec 5, 2025

> Thanks for your contribution. Please supplement test plan and test result.

Done.

@Gaohan123 (Collaborator) left a comment

LGTM. A good catch!

@hsliuustc0106 (Collaborator) left a comment

LGTM, please add test result

@yinpeiqi (Contributor, Author) commented Dec 8, 2025

> LGTM, please add test result

I included test results (a zip file containing the output .wav audio and the .txt text for the three queries) in the PR description. Do I need to add more results here?

@Gaohan123 Gaohan123 enabled auto-merge (squash) December 8, 2025 04:08
@Gaohan123 Gaohan123 merged commit bbb5a72 into vllm-project:main Dec 8, 2025
4 checks passed