[SW-238029] Fix max_batch_size handling - Llama perf degradation fix by jiminha · Pull Request #1839 · HabanaAI/vllm-fork

jiminha · 2025-08-28T16:05:55Z

Llama perf degradation seen with Gemma3 support: #1616.

max_batch_size was initialized incorrectly for the profile_run because the check looked at mm_registry rather than at whether the model in use is actually multimodal. The fix initializes it to 1 only when a multimodal (mrope or mm_optimized) model is in use.

Llama v3.1 70B, 2048/128, BF16, 2 cards: perf dropped from 170 tps to 150 tps.
With this fix, it is back to 170 tps.
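The condition change described above can be sketched as follows. This is a minimal illustration, not the code from the diff: `ModelTraits`, `registry_supports_mm`, `default_max_batch_size`, and both function names are hypothetical, and the exact form of the original mm_registry check is an assumption based only on the description.

```python
from dataclasses import dataclass


@dataclass
class ModelTraits:
    """Hypothetical stand-in for the state consulted before profile_run."""
    uses_mrope: bool = False            # model uses mrope
    is_mm_optimized: bool = False       # model takes the mm_optimized path
    registry_supports_mm: bool = False  # assumed registry-level check, not model-specific


def max_batch_size_before(traits: ModelTraits, default_max_batch_size: int) -> int:
    # Assumed shape of the bug: the registry-level check can be true even for a
    # text-only model such as Llama, which forces the profiling batch size to 1.
    return 1 if traits.registry_supports_mm else default_max_batch_size


def max_batch_size_after(traits: ModelTraits, default_max_batch_size: int) -> int:
    # Shape of the fix as described: only a multimodal model in use
    # (mrope or mm_optimized) gets a batch size of 1.
    is_multimodal = traits.uses_mrope or traits.is_mm_optimized
    return 1 if is_multimodal else default_max_batch_size


if __name__ == "__main__":
    text_only = ModelTraits(registry_supports_mm=True)
    multimodal = ModelTraits(uses_mrope=True, registry_supports_mm=True)
    print(max_batch_size_before(text_only, 128), max_batch_size_after(text_only, 128))
    print(max_batch_size_before(multimodal, 128), max_batch_size_after(multimodal, 128))
```

In this sketch the text-only case is the one that changes: it keeps the default batch size after the fix, while the multimodal case is still set to 1.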

Co-authored-by: Iryna Boiko <iboiko@habana.ai>

xuechendi · 2025-08-28T16:13:26Z

/run-gaudi-tests

jiminha requested a review from xuechendi August 28, 2025 16:06

jiminha changed the base branch from main to habana_main August 28, 2025 16:06

jiminha requested review from PatrykWo, afierka-intel, deepvars, jikunshang, kzawora-intel, madamczyk-intel, mgawarkiewicz-intel, michalkuligowski, mswiniarsk, vivekgoe and wpyszka as code owners August 28, 2025 16:06

michalkuligowski approved these changes Aug 28, 2025

bartekkuncer merged commit f7f76de into habana_main Aug 29, 2025
43 checks passed

bartekkuncer deleted the jha/gemmaconfigfix_main branch August 29, 2025 08:06
