
Add mark steps to prevent OOM in static moe op #65

Merged
kzawora-intel merged 1 commit into HabanaAI:habana_main from jkaniecki:Fix_mixtral_oom_with_higher_bs on Jun 24, 2024

Conversation

@jkaniecki

Adding mark steps inside the static MoE op to prevent OOMs when using higher batch size (bs) values.
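A minimal sketch of the idea, not the actual habana_main implementation: in a static MoE op that loops over experts, inserting a mark step after each expert flushes the accumulated lazy-mode graph so intermediate tensors don't pile up at higher batch sizes. The `static_moe` structure and `mark_step` stub below are illustrative assumptions; on Gaudi the real call would be `htorch.core.mark_step()`.

```python
import torch


def mark_step():
    # On HPU this would be htorch.core.mark_step(); stubbed as a no-op here
    # so the sketch runs anywhere. Conceptually, it triggers execution of
    # the graph accumulated so far, bounding intermediate memory.
    pass


def static_moe(hidden_states, routing_weights, experts):
    # Accumulate each expert's weighted output. A mark step after every
    # expert keeps the per-iteration graph (and its workspace) small,
    # which is what prevents OOM when bs grows.
    out = torch.zeros_like(hidden_states)
    for i, expert in enumerate(experts):
        out = out + routing_weights[..., i : i + 1] * expert(hidden_states)
        mark_step()
    return out
```

The trade-off is graph granularity versus memory: without the mark steps all experts fuse into one large graph, which is faster to launch but holds every expert's intermediates alive at once.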

@szutenberg szutenberg requested a review from kzawora-intel June 21, 2024 08:37
@kzawora-intel kzawora-intel merged commit 11f047c into HabanaAI:habana_main Jun 24, 2024
michalkuligowski added a commit that referenced this pull request Jan 15, 2025
remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)

---------

Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
mfylcek added a commit that referenced this pull request Jan 21, 2025

3 participants