
[SW-238029] Fix max_batch_size handling - Llama perf degradation fix #1839

Merged
bartekkuncer merged 1 commit into habana_main from jha/gemmaconfigfix_main
Aug 29, 2025
Conversation

@jiminha

@jiminha jiminha commented Aug 28, 2025

Llama perf degradation was seen after the Gemma3 support change: #1616.

max_batch_size was initialized incorrectly for the profile_run because the code checked mm_registry instead of whether a multimodal model is actually in use. The fix only initializes it to 1 when a multimodal (mrope or mm_optimized) model is in use.

Llama v3.1 70B, 2048/128, BF16, 2x card: perf dropped from 170 tps to 150 tps.
With this fix, it's back to 170 tps.
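A minimal sketch of the gating logic described above. The function and parameter names (`profile_max_batch_size`, `model_is_multimodal`) are illustrative assumptions, not the actual vLLM/HPU identifiers; the point is that the clamp to 1 should key off the model actually being multimodal, not off an mm_registry lookup that also matches text-only models like Llama.

```python
def profile_max_batch_size(model_is_multimodal: bool,
                           default_max_batch_size: int) -> int:
    """Choose max_batch_size for the profiling run.

    Before the fix (per the PR description): any model matched by the
    mm_registry check got max_batch_size = 1, which also caught
    text-only models and throttled Llama throughput.
    After the fix: clamp to 1 only when a multimodal (mrope or
    mm_optimized) model is actually in use.
    """
    if model_is_multimodal:
        return 1
    return default_max_batch_size


# Text-only model keeps its configured batch size; multimodal is clamped.
print(profile_max_batch_size(False, 256))  # 256
print(profile_max_batch_size(True, 256))   # 1
```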

Co-authored-by: Iryna Boiko <iboiko@habana.ai>
@xuechendi

/run-gaudi-tests

@bartekkuncer bartekkuncer merged commit f7f76de into habana_main Aug 29, 2025
43 checks passed
@bartekkuncer bartekkuncer deleted the jha/gemmaconfigfix_main branch August 29, 2025 08:06
shepark pushed a commit that referenced this pull request Aug 31, 2025


4 participants