Dispatch Implementation for Expert Parallelism #9219
Unanswered · Blueblack319 asked this question in Q&A
I’m currently using SGLang v0.4.10 to run large MoE models such as Qwen3MoE and GLM-4.5 on a multi-GPU system.
After analyzing the executed kernels, I found that the first All2All (Dispatch) step is implemented as an AllGather operation.
Could you explain why SGLang implements Dispatch as AllGather instead of using All2All (i.e., all_to_all_single)?
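To make the question concrete, here is a minimal sketch of the two communication patterns I am contrasting. This is not SGLang's actual code; the helper names, tensor shapes, and split bookkeeping are my own assumptions, shown only to clarify what I mean by "Dispatch as AllGather" versus "Dispatch as All2All".

```python
# Hypothetical sketch, NOT SGLang's implementation.
import torch
import torch.distributed as dist


def dispatch_via_allgather(hidden_states, topk_ids, world_size):
    """AllGather-style dispatch: every rank receives ALL tokens, then each
    rank locally selects the tokens routed to the experts it owns."""
    gathered = torch.empty(
        world_size * hidden_states.shape[0], hidden_states.shape[1],
        dtype=hidden_states.dtype, device=hidden_states.device,
    )
    dist.all_gather_into_tensor(gathered, hidden_states)

    gathered_ids = torch.empty(
        world_size * topk_ids.shape[0], topk_ids.shape[1],
        dtype=topk_ids.dtype, device=topk_ids.device,
    )
    dist.all_gather_into_tensor(gathered_ids, topk_ids)
    # ...then locally mask/select tokens whose top-k expert lives on this rank.
    return gathered, gathered_ids


def dispatch_via_all_to_all(hidden_states, input_splits, output_splits):
    """All2All-style dispatch: tokens are pre-sorted by destination rank and
    each rank receives only the tokens routed to its local experts."""
    recv = torch.empty(
        sum(output_splits), hidden_states.shape[1],
        dtype=hidden_states.dtype, device=hidden_states.device,
    )
    dist.all_to_all_single(
        recv, hidden_states,
        output_split_sizes=output_splits,
        input_split_sizes=input_splits,
    )
    return recv
```

My understanding is that the all_to_all_single path moves less data per rank, so I would like to understand the reasoning behind choosing the AllGather-based path.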