[NPU] feat: Support FSDP worker and vLLM Ascend#332

Merged

vermouth1992 merged 23 commits into verl-project:main from sunyi0505:vllm-0.7-npu on May 23, 2025

Conversation

@sunyi0505 (Collaborator) commented Feb 21, 2025

For developers, you can follow the docs: docs/ascend/ascend.rst

This PR adds support for the Ascend NPU backend.
Co-authored-by: Chendong98 chendong136@huawei.com
Co-authored-by: zheliuyu 15750543867@163.com
Co-authored-by: celestialli celestialli@outlook.com
In this PR, we add the ability to detect the NPU device type, along with a new script for training on NPU.

These are the changed files:

  1. pyproject.toml: change the version of vllm
  2. requirements-npu.txt: requirements for NPU
  3. verl/bert_padding.py: adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
  4. verl/single_controller/ray/base.py
  5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py
  6. verl/trainer/fsdp_sft_trainer.py
  7. verl/utils/flops_counter.py
  8. verl/utils/fsdp_utils.py
  9. verl/workers/actor/dp_actor.py
  10. verl/workers/critic/dp_critic.py
  11. verl/workers/fsdp_workers.py
  12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
  13. verl/workers/sharding_manager/fsdp_vllm.py
  14. verl/utils/device.py: get the device type for different devices (see the sketch after this list)
  15. docs/ascend/ascend.md
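
As a rough illustration of what the new device helper does, here is a minimal sketch of device-type detection. The function names and exact behavior are assumptions for illustration, not necessarily the actual verl/utils/device.py API:

```python
import torch


def is_torch_npu_available() -> bool:
    """Return True if torch_npu is importable and an Ascend NPU is visible."""
    try:
        import torch_npu  # noqa: F401  # importing registers the "npu" device with torch
    except ImportError:
        return False
    return torch.npu.is_available()


def get_device_name() -> str:
    """Resolve the accelerator for the current host, preferring NPU over CUDA."""
    if is_torch_npu_available():
        return "npu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```

Centralizing this check lets the FSDP workers and the rollout code stay device-agnostic: they ask for the device name once instead of hard-coding "cuda".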

Here is our roadmap:

RoadMap

  - [x] sft
  - [x] ppo
  - [x] grpo

News

[2025.03.31] Added results for SFT and GRPO. Qwen2-7B-Instruct was tested on 2*8 devices, and many batch_size-related parameters had to be reduced, so these results are for reference only. We will announce the reward results under the default parameters as soon as sleep mode is supported.

[2025.03.03] Modified the Ray adaptation method.

[2025.02.25] The PPO algorithm is supported for training on NPU with the FSDP backend.

[2025.02.23] The SFT algorithm is supported for training on NPU with the FSDP backend.

[2025.02.21] The GRPO algorithm is supported for training on NPU with the FSDP backend.

Requirements
We tested this PR on both Ascend NPU and GPU to ensure the same code can run on different devices. The hardware was 8 Atlas 800T A2 NPUs and 8 A100 GPUs. Other software versions are shown in the following table.

| Software     | Version                |
|:-------------|-----------------------:|
| transformers | 4.47.1                 |
| accelerate   | 1.3.0                  |
| torch_npu    | 2.5.1.rc1              |
| CANN         | 8.1.RC1 (Not Released) |

About mean error
Due to differences in hardware structure, we cannot guarantee that the loss on Ascend NPU is exactly the same as on GPU. In our experience, a loss difference of less than 2% is acceptable; if the difference is greater than 2%, we will try to fix it. The calculation formula is as follows.

![loss_comparison](https://github.com/user-attachments/assets/4f62f713-9240-4324-bf7d-3ae59fc85b05)

N represents the number of training steps. For more information, please refer to [Calculation accuracy description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html).
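
The formula itself lives only in the image above. Given that it is described as a mean error over N training steps, a plausible reconstruction (our reading, not confirmed by the PR text) is the mean relative loss difference:

$$\mathrm{mean\ error} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\mathrm{loss}_i^{\mathrm{NPU}}-\mathrm{loss}_i^{\mathrm{GPU}}\right|}{\mathrm{loss}_i^{\mathrm{GPU}}}$$

Under that reading, checking the 2% threshold is a one-liner; this sketch is illustrative, not verl code:

```python
def mean_relative_loss_error(npu_losses, gpu_losses):
    """Mean relative difference between per-step NPU and GPU losses."""
    assert len(npu_losses) == len(gpu_losses) and len(npu_losses) > 0
    return sum(abs(a - b) / abs(b) for a, b in zip(npu_losses, gpu_losses)) / len(npu_losses)


# Accept the NPU run if the mean error stays within the 2% threshold.
acceptable = mean_relative_loss_error(npu_losses=[0.520, 0.412], gpu_losses=[0.515, 0.410]) < 0.02
```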

@sunyi0505 changed the title "support ASCEND NPU" to "[WIP] support ASCEND NPU" Feb 21, 2025
@huangk10 commented:
does this pr work on multi nodes?

@sunyi0505 (Collaborator, Author) replied:

does this pr work on multi nodes?

I am currently testing on a single node only, and will follow up with multi-node testing results.

@sunyi0505 force-pushed the vllm-0.7-npu branch 2 times, most recently from 0afd136 to d496b70 on February 21, 2025 07:59
@sunyi0505 changed the title "[WIP] support ASCEND NPU" to "Support FSDP worker and vLLM Ascend" Feb 21, 2025
@sunyi0505 force-pushed the vllm-0.7-npu branch 10 times, most recently from 8b1b207 to 0b7e274 on February 22, 2025 06:48
@sunyi0505 force-pushed the vllm-0.7-npu branch 2 times, most recently from 62af61c to fd62e2e on February 24, 2025 01:27
@sunyi0505 force-pushed the vllm-0.7-npu branch 3 times, most recently from 45f208b to d36c1c7 on February 25, 2025 08:07
@CLAassistant commented Feb 26, 2025

CLA assistant check: All committers have signed the CLA.

@sunyi0505 force-pushed the vllm-0.7-npu branch 3 times, most recently from 6314fcf to d4309a8 on March 3, 2025 07:21
@sunyi0505 force-pushed the vllm-0.7-npu branch 18 times, most recently from a48c6d2 to 74b520f on May 23, 2025 01:58
@vermouth1992 vermouth1992 merged commit 0528ba1 into verl-project:main May 23, 2025
37 checks passed
ETOgaosion pushed a commit to Jianbing-D/verl that referenced this pull request Jun 8, 2025
wwwjn pushed a commit to wwwjn/verl that referenced this pull request Jun 10, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026