[DeepseekR1] porting 1.21 deepseek changes to main (#137)#161
Merged
Conversation
* Remove redundant cast before index_reduce (#134) * add ops for deepseek_r1 Signed-off-by: Chendi.Xue <chendi.xue@intel.com> * Remove DynamicFusedMOE Signed-off-by: Chendi Xue <chendi.xue@intel.com> --------- Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Contributor
|
Related vllm-fork PR HabanaAI/vllm-fork#1161 |
michalkuligowski
approved these changes
Apr 29, 2025
madamczyk-intel
added a commit
that referenced
this pull request
May 7, 2025
afierka-intel
pushed a commit
that referenced
this pull request
May 7, 2025
xuechendi
added a commit
to HabanaAI/vllm-fork
that referenced
this pull request
May 8, 2025
JIRA: https://jira.habana-labs.com/browse/SW-227174 cherry-pick #1030 and fixed conflicts after rebase Dependency: HabanaAI/vllm-hpu-extension#161 Verified with below 3 methods: 1. test with deepseek-v2 BF16 weight. => Passed 2. evaluate acc on deepseek-r1 with out of box block fp8 weight => Passed 3. evaluate acc on deepseek-r1 with out of box block fp8 weight + INC calibrated per-channel scale => Passed acc check, performance reach goal(number is in jira ticket) == Details == 1. test with deepseek-v2 BF16 weight: ``` PT_HPU_LAZY_MODE=1 python run_example_tp.py --model DeepSeek-V2-Lite --tokenizer DeepSeek-V2-Lite --osl 32 ``` ``` (VllmWorkerProcess pid=1039) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up! (VllmWorkerProcess pid=1038) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up! (VllmWorkerProcess pid=1041) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up! WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up! Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.57it/s, est. speed input: 12.59 toks/s, output: 50.37 toks/s] e2e took 2.5509743690199684 seconds ==================================== Prompt: 'Hello, my name is' Generated text: '\nI am a 20 year old student from the UK. I am currently studying for a degree in English Literature and Creative Writing at the University of East' Ground truth: None ==================================== ==================================== Prompt: '0.999 compares to 0.9 is ' Generated text: '100%\n0.9999999999999999999999999' Ground truth: None ==================================== ==================================== Prompt: 'The capital of France is' Generated text: ' Paris, which is also the largest city in the country. The city is located on the Seine River and is known for its beautiful architecture, museums, and art' Ground truth: None ==================================== ==================================== Prompt: 'The future of AI is' Generated text: ' in the hands of the people\nThe future of AI is in the hands of the people\nThe future of AI is in the hands of the people\nThe' Ground truth: None ==================================== ``` 2. evaluate acc on deepseek-r1 with out of box block fp8 weight - limit 256 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr| |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:| |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.9648|± |0.0115| | | |strict-match | 5|exact_match|↑ |0.9648|± |0.0115| 3. evaluate acc on deepseek-r1 with out of box block fp8 weight + INC calibrated per-channel scale |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr| |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:| |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.9688|± |0.0109| | | |strict-match | 5|exact_match|↑ |0.9688|± |0.0109| --------- Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: kwisniewski98 <kwisniewski@habana.ai> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: kwisniewski98 <kwisniewski@habana.ai> Co-authored-by: Youlei Yang <youlei.yang@intel.com> Co-authored-by: Yi Liu <yi4.liu@intel.com> Co-authored-by: Yi Liu <yiliu4@habana.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is to port #137 to main, so we can leverage that to work on habana_main for deepseek_r1 enabling