[DeepseekR1] porting 1.21 deepseek changes to main (#137) by xuechendi · Pull Request #161 · HabanaAI/vllm-hpu-extension

xuechendi · 2025-04-22T21:44:12Z

This PR ports #137 to main so that we can build on it when enabling deepseek_r1 on habana_main.

* Remove redundant cast before index_reduce (#134)
* add ops for deepseek_r1
* Remove DynamicFusedMOE

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>

michalkuligowski · 2025-04-29T10:46:06Z

Related vllm-fork PR HabanaAI/vllm-fork#1161
@xuechendi can we merge this together with #1161 once tests pass with the updated requirements pointing to this PR?


JIRA: https://jira.habana-labs.com/browse/SW-227174

Cherry-picked #1030 and fixed conflicts after rebase.

Dependency: HabanaAI/vllm-hpu-extension#161

Verified with the three methods below:

1. test with deepseek-v2 BF16 weight => Passed
2. evaluate acc on deepseek-r1 with out of box block fp8 weight => Passed
3. evaluate acc on deepseek-r1 with out of box block fp8 weight + INC calibrated per-channel scale => Passed acc check, performance reaches the goal (numbers are in the JIRA ticket)

== Details ==

1. test with deepseek-v2 BF16 weight:

```
PT_HPU_LAZY_MODE=1 python run_example_tp.py --model DeepSeek-V2-Lite --tokenizer DeepSeek-V2-Lite --osl 32
```

```
(VllmWorkerProcess pid=1039) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
(VllmWorkerProcess pid=1038) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
(VllmWorkerProcess pid=1041) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.57it/s, est. speed input: 12.59 toks/s, output: 50.37 toks/s]
e2e took 2.5509743690199684 seconds
====================================
Prompt: 'Hello, my name is'
Generated text: '\nI am a 20 year old student from the UK. I am currently studying for a degree in English Literature and Creative Writing at the University of East'
Ground truth: None
====================================
====================================
Prompt: '0.999 compares to 0.9 is '
Generated text: '100%\n0.9999999999999999999999999'
Ground truth: None
====================================
====================================
Prompt: 'The capital of France is'
Generated text: ' Paris, which is also the largest city in the country. 
The city is located on the Seine River and is known for its beautiful architecture, museums, and art'
Ground truth: None
====================================
====================================
Prompt: 'The future of AI is'
Generated text: ' in the hands of the people\nThe future of AI is in the hands of the people\nThe future of AI is in the hands of the people\nThe'
Ground truth: None
====================================
```

2. evaluate acc on deepseek-r1 with out of box block fp8 weight - limit 256

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9648|±  |0.0115|
|     |       |strict-match    |     5|exact_match|↑  |0.9648|±  |0.0115|

3. evaluate acc on deepseek-r1 with out of box block fp8 weight + INC calibrated per-channel scale

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9688|±  |0.0109|
|     |       |strict-match    |     5|exact_match|↑  |0.9688|±  |0.0109|

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: kwisniewski98 <kwisniewski@habana.ai>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: kwisniewski98 <kwisniewski@habana.ai>
Co-authored-by: Youlei Yang <youlei.yang@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
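For readers comparing the two gsm8k results, the reported Stderr values are consistent with the standard error of a mean over 256 pass/fail scores. The sketch below is only a sanity check under stated assumptions: the helper `accuracy_stderr` is hypothetical, the counts of 247 and 248 correct answers are inferred from the reported accuracies and the 256-sample limit, and the use of a sample (n - 1) variance is an assumption about how the evaluation tool computes the column.

```python
import math


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores whose mean is p.

    Assumes the sample variance (n - 1 denominator) is used; this is an
    assumption about the evaluation tool, not something stated in the PR.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


N_SAMPLES = 256  # from "limit 256" in the verification notes

# Correct-answer counts inferred from the reported accuracies (assumption).
runs = {
    "block fp8": 247,
    "block fp8 + INC per-channel scale": 248,
}

for label, correct in runs.items():
    p = correct / N_SAMPLES
    print(f"{label}: acc={p:.4f} stderr={accuracy_stderr(p, N_SAMPLES):.4f}")
```

Under these assumptions the two runs differ by a single question out of 256, which is smaller than one standard error of either result.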

xuechendi requested review from afierka-intel, kzawora-intel, madamczyk-intel, mgawarkiewicz-intel, michalkuligowski and tzielinski-habana as code owners April 22, 2025 21:44

xuechendi mentioned this pull request Apr 24, 2025

[Deepseek R1][v0] Porting deepseek r1 to habana_main HabanaAI/vllm-fork#1161

Merged

michalkuligowski approved these changes Apr 29, 2025


xuechendi merged commit c487a21 into main May 2, 2025

madamczyk-intel added a commit that referenced this pull request May 7, 2025

Revert "[DeepseekR1] 1.21 release support necessary changes (#137) (#161)" (8f33e11)

This reverts commit c487a21.

madamczyk-intel mentioned this pull request May 7, 2025

Revert "[DeepseekR1] porting 1.21 deepseek changes to main (#137)" #169

Merged

afierka-intel pushed a commit that referenced this pull request May 7, 2025

Revert "[DeepseekR1] 1.21 release support necessary changes (#137) (#161)" (#169) (7ca2ed3)

This reverts commit c487a21.

xuechendi merged 1 commit into main from dev/1.21_deepseek
