Dispatch Implementation for Expert Parallelism #9219
Unanswered · Blueblack319 asked this question in Q&A
I’m currently using SGLang v0.4.10 to run large MoE models such as Qwen3MoE and GLM-4.5 on a multi-GPU system.
After analyzing the executed kernels, I found that the first All2All (Dispatch) step is implemented as an AllGather operation.
Could you explain why SGLang implements Dispatch as AllGather instead of using All2All (i.e., all_to_all_single)?
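To make the question concrete, here is a minimal sketch of the two communication patterns I am contrasting. This is not SGLang's actual code; the helper names, tensor shapes, and split bookkeeping are my own assumptions, shown only to clarify what I mean by "Dispatch as AllGather" versus "Dispatch as All2All".

```python
# Hypothetical sketch, NOT SGLang's implementation.
import torch
import torch.distributed as dist


def dispatch_via_allgather(hidden_states, topk_ids, world_size):
    """AllGather-style dispatch: every rank receives ALL tokens, then each
    rank locally selects the tokens routed to the experts it owns."""
    gathered = torch.empty(
        world_size * hidden_states.shape[0], hidden_states.shape[1],
        dtype=hidden_states.dtype, device=hidden_states.device,
    )
    dist.all_gather_into_tensor(gathered, hidden_states)

    gathered_ids = torch.empty(
        world_size * topk_ids.shape[0], topk_ids.shape[1],
        dtype=topk_ids.dtype, device=topk_ids.device,
    )
    dist.all_gather_into_tensor(gathered_ids, topk_ids)
    # ...then locally mask/select tokens whose top-k expert lives on this rank.
    return gathered, gathered_ids


def dispatch_via_all_to_all(hidden_states, input_splits, output_splits):
    """All2All-style dispatch: tokens are pre-sorted by destination rank and
    each rank receives only the tokens routed to its local experts."""
    recv = torch.empty(
        sum(output_splits), hidden_states.shape[1],
        dtype=hidden_states.dtype, device=hidden_states.device,
    )
    dist.all_to_all_single(
        recv, hidden_states,
        output_split_sizes=output_splits,
        input_split_sizes=input_splits,
    )
    return recv
```

My understanding is that the all_to_all_single path moves less data per rank, so I would like to understand the reasoning behind choosing the AllGather-based path.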