Merged
2 changes: 2 additions & 0 deletions python/sglang/srt/models/deepseek_v2.py

@@ -1932,6 +1932,8 @@ def post_load_weights(self, is_nextn=False, weight_names=None):
self._weight_requant_ue8m0()

def _weight_requant_ue8m0(self):
if self.config.architectures[0] == "DeepseekV3ForCausalLMNextN":
Contributor (critical):

Potential IndexError: accessing `self.config.architectures[0]` without first checking that `self.config.architectures` is non-empty could lead to a runtime error if the architectures list is unexpectedly empty (e.g., due to a malformed config). It's safer to check for non-emptiness first.

Suggested change
if self.config.architectures[0] == "DeepseekV3ForCausalLMNextN":
if self.config.architectures and self.config.architectures[0] == "DeepseekV3ForCausalLMNextN":

Contributor (medium):

Maintainability: Consider adding a brief comment explaining why DeepseekV3ForCausalLMNextN skips this _weight_requant_ue8m0 step. This would help future maintainers understand the rationale (e.g., if this architecture doesn't use ue8m0, or if this requantization is incompatible or unnecessary for it).

Suggested change
if self.config.architectures[0] == "DeepseekV3ForCausalLMNextN":
# This specific architecture variant (DeepseekV3ForCausalLMNextN) might handle
# ue8m0 quantization differently or not require this explicit requantization step.
# TODO: Add a more precise reason if known (e.g., "Skips because NextN models do not use ue8m0 weights").
if self.config.architectures[0] == "DeepseekV3ForCausalLMNextN":

Collaborator:

Hmm, maybe the MTP layer also has weights that need to be requantized.

So I guess the `for layer_id in ...` loop in `_weight_requant_ue8m0` could be changed into something that also handles the MTP layer.
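The suggestion above could be sketched roughly as follows. This is a hypothetical standalone helper, not the actual sglang code: the names `num_hidden_layers` and `num_nextn_predict_layers` mirror DeepSeek-V3 config fields, and the convention that MTP layer ids follow the main decoder stack is an assumption.

```python
def layers_to_requant(num_hidden_layers: int, num_nextn_predict_layers: int = 0):
    """Yield the layer ids whose weights need ue8m0 requantization.

    Hypothetical sketch: regular decoder layers come first, and MTP
    (multi-token prediction / nextn) layers, if any, are assumed to be
    numbered immediately after the main stack.
    """
    # Regular decoder layers: 0 .. num_hidden_layers - 1
    yield from range(num_hidden_layers)
    # Assumed MTP layer ids: num_hidden_layers .. num_hidden_layers + n - 1
    yield from range(num_hidden_layers,
                     num_hidden_layers + num_nextn_predict_layers)


if __name__ == "__main__":
    # e.g. 61 regular layers plus 1 MTP layer, as in the DeepSeek-V3 config
    print(list(layers_to_requant(61, 1))[-1])
```

With a loop shaped like this, the `is_nextn` path would no longer need to skip requantization entirely; it could instead include the MTP layer ids in the same pass.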

Collaborator (Author):

I see. Do you mean this will affect the accept length?

Collaborator:

Yes, my naive guess is that it will cause the nextn layer to produce wrong output. But I have not done any experiments and I could be totally wrong.

return
weight_block_size = self.quant_config.weight_block_size

moe_layers = list(