
Optimize all_reduce by porting the shared memory kernel of deepspeed #6

Merged

chunyuan-w merged 3 commits into mingfeima:cpu_opt_ww08 from chunyuan-w:chunyuan/pr_shm_allreduce_ww08 on Feb 20, 2025

Conversation

@chunyuan-w (Collaborator)

Cherry-pick #5 onto the cpu_opt_ww08 branch, together with the changes needed to work with the latest sglang main branch.
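
The DeepSpeed shared-memory (SHM) all-reduce avoids a network transport when all tensor-parallel ranks share one CPU node: each rank publishes its tensor into shared memory, waits at a barrier, then sums its peers' buffers locally. Below is a minimal Python sketch of that idea only; the segment names, sizes, and barrier protocol are illustrative, and the kernel actually ported in this PR is DeepSpeed's C++ implementation.

```python
# Illustrative SHM all-reduce: each rank writes to a per-rank shared
# segment, then every rank reduces across all segments locally.
import numpy as np
from multiprocessing import Barrier, Process, shared_memory

WORLD_SIZE = 4
NUMEL = 8

def shm_all_reduce(rank: int, barrier) -> None:
    local = np.full(NUMEL, float(rank), dtype=np.float32)

    # Publish this rank's data in a named shared segment (names illustrative).
    seg = shared_memory.SharedMemory(name=f"shm_ar_{rank}", create=True,
                                     size=local.nbytes)
    np.ndarray(local.shape, local.dtype, buffer=seg.buf)[:] = local
    barrier.wait()  # all ranks have published their data

    # Reduce: sum every rank's buffer into a private result.
    result = np.zeros_like(local)
    peers = [shared_memory.SharedMemory(name=f"shm_ar_{r}")
             for r in range(WORLD_SIZE)]
    for p in peers:
        result += np.ndarray(local.shape, local.dtype, buffer=p.buf)
    barrier.wait()  # everyone finished reading before cleanup

    for p in peers:
        p.close()
    seg.close()
    seg.unlink()
    assert result.sum() == NUMEL * sum(range(WORLD_SIZE))

if __name__ == "__main__":
    barrier = Barrier(WORLD_SIZE)
    procs = [Process(target=shm_all_reduce, args=(r, barrier))
             for r in range(WORLD_SIZE)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```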

@chunyuan-w chunyuan-w marked this pull request as ready for review February 20, 2025 05:33
@chunyuan-w chunyuan-w merged commit c3fc97d into mingfeima:cpu_opt_ww08 Feb 20, 2025
1 check passed
chunyuan-w added a commit that referenced this pull request Mar 14, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)

* Optimize all_reduce by porting the shm kernel of deepspeed
* Fix rebase: use get_tp_group in sglang.srt.distributed
* Fix rebase: directly modify tensor_model_parallel_all_reduce in sglang
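
The two "Fix rebase" commits wire the kernel into sglang's dispatch. A minimal sketch of that routing, assuming a vLLM-style group object whose all_reduce method is the non-SHM fallback; shm_all_reduce is a placeholder name here, not the PR's actual entry point:

```python
import torch
from sglang.srt.distributed import get_tp_group  # import named in the commits


def shm_all_reduce(input_: torch.Tensor, group) -> torch.Tensor:
    # Stand-in for the ported DeepSpeed SHM kernel (see the sketch above).
    raise NotImplementedError


def tensor_model_parallel_all_reduce(input_: torch.Tensor) -> torch.Tensor:
    tp_group = get_tp_group()
    if input_.device.type == "cpu":
        # Single-node CPU ranks can reduce through shared memory.
        return shm_all_reduce(input_, tp_group)
    # Otherwise fall back to the group's regular all-reduce
    # (method name assumed, not confirmed by this PR).
    return tp_group.all_reduce(input_)
```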
jianan-gu referenced this pull request in jianan-gu/sglang Apr 7, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)
mingfeima pushed a commit that referenced this pull request Apr 8, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)
