
Normalize router weights in MoE OP #72

Merged

kzawora-intel merged 1 commit into HabanaAI:habana_main from jkaniecki:Normalise_router_weights on Jun 26, 2024

Conversation

@jkaniecki

Adds router weight normalization to improve Mixtral accuracy.
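For context, this kind of change amounts to renormalizing the selected top-k router scores so they sum to 1 before the expert outputs are combined. A minimal PyTorch sketch of the idea follows; it is illustrative only, not the actual HPU MoE OP from this PR, and the function name and signature are hypothetical.

```python
import torch

def route_tokens(router_logits: torch.Tensor,
                 top_k: int = 2,
                 renormalize: bool = True):
    """Return per-token expert indices and (optionally renormalized) routing weights.

    Illustrative sketch only; not the vLLM HPU kernel.
    """
    # Softmax over all experts, then keep the top-k scores per token.
    scores = torch.softmax(router_logits, dim=-1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(scores, top_k, dim=-1)
    if renormalize:
        # Router weight normalization: make the kept top-k weights sum to 1,
        # as Mixtral expects when mixing expert outputs.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```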


@szutenberg szutenberg left a comment


LGTM! Please also create a PR to the habana_next branch. Thanks!

@szutenberg szutenberg requested a review from kzawora-intel June 26, 2024 07:24
@kzawora-intel kzawora-intel merged commit 2728599 into HabanaAI:habana_main Jun 26, 2024
michalkuligowski added a commit that referenced this pull request Jan 15, 2025
remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)

---------

Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
mfylcek added a commit that referenced this pull request Jan 21, 2025
remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)

---------

Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
ranzhejiang pushed a commit to ranzhejiang/vllm-fork that referenced this pull request Apr 11, 2025
Enable DeepseekV2 Lite/Chat models (HabanaAI#516)