[DeepseekR1] porting 1.21 deepseek changes to main (#137) #161

Merged: xuechendi merged 1 commit into main from dev/1.21_deepseek on May 2, 2025

@xuechendi (Contributor)

This PR ports #137 to main so that we can build on it in habana_main to enable deepseek_r1.

* Remove redundant cast before index_reduce (#134)

* add ops for deepseek_r1

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

* Remove DynamicFusedMOE

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

---------

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>

michalkuligowski commented Apr 29, 2025

Related vllm-fork PR: HabanaAI/vllm-fork#1161
@xuechendi can we merge this together with #1161 once tests pass with the updated requirements pointing to this PR?

xuechendi merged commit c487a21 into main May 2, 2025
madamczyk-intel added a commit that referenced this pull request May 7, 2025
afierka-intel pushed a commit that referenced this pull request May 7, 2025
xuechendi added a commit to HabanaAI/vllm-fork that referenced this pull request May 8, 2025
JIRA: https://jira.habana-labs.com/browse/SW-227174

Cherry-picked #1030 and fixed conflicts after rebase.
Dependency: HabanaAI/vllm-hpu-extension#161

Verified with the following three methods:

1. Test with deepseek-v2 BF16 weights. => Passed
2. Evaluate accuracy on deepseek-r1 with out-of-the-box block fp8 weights. => Passed
3. Evaluate accuracy on deepseek-r1 with out-of-the-box block fp8 weights + INC-calibrated per-channel scale. => Passed the accuracy check; performance reaches the goal (numbers are in the JIRA ticket)

== Details ==

1. test with deepseek-v2 BF16 weight:
```
PT_HPU_LAZY_MODE=1 python run_example_tp.py --model DeepSeek-V2-Lite --tokenizer DeepSeek-V2-Lite --osl 32 
```
```
(VllmWorkerProcess pid=1039) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
(VllmWorkerProcess pid=1038) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
(VllmWorkerProcess pid=1041) WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
WARNING 04-25 03:01:53 [hpu_model_runner.py:1039] Configuration: ('decode', 4, 128) was not warmed-up!
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.57it/s, est. speed input: 12.59 toks/s, output: 50.37 toks/s]
e2e took 2.5509743690199684 seconds
====================================
Prompt: 'Hello, my name is'
Generated text: '\nI am a 20 year old student from the UK. I am currently studying for a degree in English Literature and Creative Writing at the University of East'
Ground truth: None
====================================
====================================
Prompt: '0.999 compares to 0.9 is '
Generated text: '100%\n0.9999999999999999999999999'
Ground truth: None
====================================
====================================
Prompt: 'The capital of France is'
Generated text: ' Paris, which is also the largest city in the country. The city is located on the Seine River and is known for its beautiful architecture, museums, and art'
Ground truth: None
====================================
====================================
Prompt: 'The future of AI is'
Generated text: ' in the hands of the people\nThe future of AI is in the hands of the people\nThe future of AI is in the hands of the people\nThe'
Ground truth: None
====================================
```
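
As a quick sanity check on the progress line in the log above, the reported rates can be converted back into token counts. This is only a sketch: it assumes `--osl` means the output sequence length per prompt, and that all three rates in the progress line were measured over the same elapsed time. The variable names are illustrative and do not come from `run_example_tp.py`.

```python
# Values copied from the progress line in the log above.
num_prompts = 4
osl = 32                      # assumption: --osl is the output length per prompt
prompts_per_s = 1.57          # "1.57it/s"
input_toks_per_s = 12.59      # "est. speed input"
output_toks_per_s = 50.37     # "output"

# Elapsed time implied by the prompt rate.
elapsed_s = num_prompts / prompts_per_s

# Token counts implied by the reported rates.
output_tokens = output_toks_per_s * elapsed_s
input_tokens = input_toks_per_s * elapsed_s

print(f"elapsed: {elapsed_s:.2f} s")
print(f"output tokens: {output_tokens:.1f} (expected {num_prompts * osl})")
print(f"input tokens:  {input_tokens:.1f}")
```

Under these assumptions the implied output count rounds to 4 x 32 = 128 tokens, which is consistent with every prompt generating the full 32 tokens, and the implied elapsed time is close to the reported e2e time of about 2.55 seconds.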

2. evaluate acc on deepseek-r1 with out of box block fp8 weight (limit 256)

|Tasks|Version|Filter          |n-shot|Metric     |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9648|±  |0.0115|
|     |       |strict-match    |     5|exact_match|↑  |0.9648|±  |0.0115|

3. evaluate acc on deepseek-r1 with out of box block fp8 weight + INC calibrated per-channel scale

|Tasks|Version|Filter          |n-shot|Metric     |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9688|±  |0.0109|
|     |       |strict-match    |     5|exact_match|↑  |0.9688|±  |0.0109|
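
To put the two gsm8k results side by side, the sketch below recomputes accuracy and standard error from raw counts. It rests on assumptions that are not stated in the PR: that both runs scored 256 questions (the "limit 256" noted for method 2), that the correct counts were 247 and 248 (inferred from the reported values, not taken from any log), and that Stderr is the sample standard deviation of the 0/1 scores divided by sqrt(n). The helper name `exact_match_stderr` is hypothetical.

```python
import math

def exact_match_stderr(correct: int, total: int) -> float:
    """Standard error of the mean of 0/1 scores, using the sample (n - 1) variance."""
    p = correct / total
    sample_var = p * (1 - p) * total / (total - 1)
    return math.sqrt(sample_var / total)

TOTAL = 256  # assumption: both runs used 256 questions

# Correct counts inferred from the reported accuracies (assumption, not from a log).
runs = {
    "block fp8": 247,
    "block fp8 + INC per-channel scale": 248,
}

for name, correct in runs.items():
    acc = correct / TOTAL
    err = exact_match_stderr(correct, TOTAL)
    print(f"{name}: acc={acc:.4f} stderr={err:.4f}")

# Gap between the two runs, for comparison against the standard errors.
gap = (runs["block fp8 + INC per-channel scale"] - runs["block fp8"]) / TOTAL
print(f"gap between runs: {gap:.4f}")
```

If these assumptions hold, the two runs differ by a single question (about 0.0039), which is well inside one standard error, so the accuracy numbers alone do not separate the two configurations.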

---------

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: kwisniewski98 <kwisniewski@habana.ai>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: kwisniewski98 <kwisniewski@habana.ai>
Co-authored-by: Youlei Yang <youlei.yang@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>