
Optimize all_reduce by porting the shared memory kernel of deepspeed #6

Merged

chunyuan-w merged 3 commits into mingfeima:cpu_opt_ww08 from chunyuan-w:chunyuan/pr_shm_allreduce_ww08 on Feb 20, 2025

Conversation

@chunyuan-w (Collaborator)

Cherry-pick #5 onto the cpu_opt_ww08 branch, together with the changes needed to work with the latest sglang main branch.
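
The DeepSpeed shared-memory (SHM) all-reduce avoids a network transport when all tensor-parallel ranks share one CPU node: each rank publishes its tensor into shared memory, waits at a barrier, then sums its peers' buffers locally. Below is a minimal Python sketch of that idea only; the segment names, sizes, and barrier protocol are illustrative, and the kernel actually ported in this PR is DeepSpeed's C++ implementation.

```python
# Illustrative SHM all-reduce: each rank writes to a per-rank shared
# segment, then every rank reduces across all segments locally.
import numpy as np
from multiprocessing import Barrier, Process, shared_memory

WORLD_SIZE = 4
NUMEL = 8

def shm_all_reduce(rank: int, barrier) -> None:
    local = np.full(NUMEL, float(rank), dtype=np.float32)

    # Publish this rank's data in a named shared segment (names illustrative).
    seg = shared_memory.SharedMemory(name=f"shm_ar_{rank}", create=True,
                                     size=local.nbytes)
    np.ndarray(local.shape, local.dtype, buffer=seg.buf)[:] = local
    barrier.wait()  # all ranks have published their data

    # Reduce: sum every rank's buffer into a private result.
    result = np.zeros_like(local)
    peers = [shared_memory.SharedMemory(name=f"shm_ar_{r}")
             for r in range(WORLD_SIZE)]
    for p in peers:
        result += np.ndarray(local.shape, local.dtype, buffer=p.buf)
    barrier.wait()  # everyone finished reading before cleanup

    for p in peers:
        p.close()
    seg.close()
    seg.unlink()
    assert result.sum() == NUMEL * sum(range(WORLD_SIZE))

if __name__ == "__main__":
    barrier = Barrier(WORLD_SIZE)
    procs = [Process(target=shm_all_reduce, args=(r, barrier))
             for r in range(WORLD_SIZE)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```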

@chunyuan-w chunyuan-w marked this pull request as ready for review February 20, 2025 05:33
@chunyuan-w chunyuan-w merged commit c3fc97d into mingfeima:cpu_opt_ww08 Feb 20, 2025
1 check passed
chunyuan-w added a commit that referenced this pull request Mar 14, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)

* Optimize all_reduce by porting the shm kernel of deepspeed
* Fix rebase: use get_tp_group in sglang.srt.distributed
* Fix rebase: directly modify tensor_model_parallel_all_reduce in sglang
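
The two "Fix rebase" commits wire the kernel into sglang's dispatch. A minimal sketch of that routing, assuming a vLLM-style group object whose all_reduce method is the non-SHM fallback; shm_all_reduce is a placeholder name here, not the PR's actual entry point:

```python
import torch
from sglang.srt.distributed import get_tp_group  # import named in the commits


def shm_all_reduce(input_: torch.Tensor, group) -> torch.Tensor:
    # Stand-in for the ported DeepSpeed SHM kernel (see the sketch above).
    raise NotImplementedError


def tensor_model_parallel_all_reduce(input_: torch.Tensor) -> torch.Tensor:
    tp_group = get_tp_group()
    if input_.device.type == "cpu":
        # Single-node CPU ranks can reduce through shared memory.
        return shm_all_reduce(input_, tp_group)
    # Otherwise fall back to the group's regular all-reduce
    # (method name assumed, not confirmed by this PR).
    return tp_group.all_reduce(input_)
```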
jianan-gu referenced this pull request in jianan-gu/sglang Apr 7, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)
mingfeima pushed a commit that referenced this pull request Apr 8, 2025
Optimize all_reduce by porting the shared memory kernel of deepspeed (#6)
