[fix] Only enable flashinfer all reduce fusion by default for single-node servers#12724

Merged
Fridge003 merged 1 commit into sgl-project:main from
leejnau:workaround-flashinfer-allreduce-fusion-bug
Nov 6, 2025
Conversation

@leejnau (Collaborator) commented Nov 6, 2025

Motivation

Currently, multi-node non-data-parallel inference does not work for DeepseekV3ForCausalLM models, due to a bug in flashinfer: flashinfer-ai/flashinfer#2006.

Modifications

Currently, enable_flashinfer_allreduce_fusion is enabled by default for DeepseekV3ForCausalLM and GptOssForCausalLM. Because of the flashinfer all-reduce fusion bug above, this change works around the problem by only enabling flashinfer all-reduce fusion by default when the server runs on a single node.
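The gating described above can be sketched as follows. This is a minimal illustration, not the actual SGLang code: the names `ServerArgs`, `nnodes`, `model_architecture`, and `resolve_allreduce_fusion` are assumptions made for the example; only the flag name and the two model architectures come from the PR description.

```python
# Hypothetical sketch of the workaround: enable flashinfer all-reduce
# fusion by default only for supported models running on a single node.
from dataclasses import dataclass
from typing import Optional

# Architectures where the fusion is on by default (from the PR description).
MODELS_WITH_FUSION_DEFAULT = {"DeepseekV3ForCausalLM", "GptOssForCausalLM"}

@dataclass
class ServerArgs:
    model_architecture: str
    nnodes: int = 1
    # None means "not set by the user"; the default logic below applies.
    enable_flashinfer_allreduce_fusion: Optional[bool] = None

def resolve_allreduce_fusion(args: ServerArgs) -> bool:
    """Decide whether flashinfer all-reduce fusion should be active."""
    if args.enable_flashinfer_allreduce_fusion is not None:
        # An explicit user setting always wins over the default.
        return args.enable_flashinfer_allreduce_fusion
    # Workaround for flashinfer-ai/flashinfer#2006: only default to
    # enabled on single-node servers.
    return args.model_architecture in MODELS_WITH_FUSION_DEFAULT and args.nnodes == 1
```

With this logic, a two-node DeepseekV3 deployment no longer gets the fusion by default, while a user can still opt in explicitly (at their own risk) by setting the flag.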

@leejnau leejnau changed the title Only enable flashinfer all reduce fusion by default for single-node servers [fix] Only enable flashinfer all reduce fusion by default for single-node servers Nov 6, 2025
@leejnau leejnau force-pushed the workaround-flashinfer-allreduce-fusion-bug branch from 2c40110 to 8069e8f on November 6, 2025 15:26
@Fridge003 Fridge003 merged commit b0d1c21 into sgl-project:main Nov 6, 2025
63 of 68 checks passed

3 participants