
call check_quantized_moe_compatibility after initialize#13876

Merged
ispobock merged 11 commits into sgl-project:main from chunyuan-w:chunyuan/fix_fp8_check
Dec 13, 2025

Conversation

@chunyuan-w
Contributor

Motivation

Fixes the error when running DeepSeek-V3.1-Terminus-FP8 with TP=6 on CPU.

  File "/sglang/srt/model_executor/model_runner.py", line 306, in __init__
    self.check_quantized_moe_compatibility()
  File "/sglang/srt/model_executor/model_runner.py", line 597, in check_quantized_moe_compatibility
    raise ValueError(
ValueError: moe_intermediate_size 2048 must be divisible by moe_tp_size (6) which is tp_size (6) divided by moe_ep_size (1).
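The failing check boils down to a divisibility test. Below is a minimal standalone sketch of that test; `check_moe_divisibility` is a hypothetical name for illustration, while the real check lives in `ModelRunner.check_quantized_moe_compatibility`:

```python
# Hypothetical standalone version of the divisibility check behind the error.
def check_moe_divisibility(moe_intermediate_size, tp_size, moe_ep_size=1):
    moe_tp_size = tp_size // moe_ep_size
    if moe_intermediate_size % moe_tp_size != 0:
        raise ValueError(
            f"moe_intermediate_size {moe_intermediate_size} must be divisible by "
            f"moe_tp_size ({moe_tp_size}) which is tp_size ({tp_size}) divided by "
            f"moe_ep_size ({moe_ep_size})."
        )

# DeepSeek-V3.1's moe_intermediate_size of 2048 is not divisible by 6,
# so with TP=6 the check fires before any padding has been applied:
try:
    check_moe_divisibility(2048, 6)
except ValueError as e:
    print(e)
```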

Modifications

Move check_quantized_moe_compatibility() to be after self.initialize(min_per_gpu_memory).

On CPU, we pad moe_intermediate_size if it isn't divisible by tp_size, inside self.initialize(min_per_gpu_memory):

if self.device == "cpu":
    self.model_config = adjust_config_with_unaligned_cpu_tp(
        self.model_config, self.load_config, self.tp_size
    )

We need to call check_quantized_moe_compatibility() after this padding; otherwise we hit the error above.
We cannot move the adjust_config_with_unaligned_cpu_tp() call earlier, because it requires self.load_config to be set first, and that happens inside self.initialize(min_per_gpu_memory) here:
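The padding idea can be sketched as rounding the intermediate size up to the next multiple of tp_size so every rank gets an equal shard. `pad_to_multiple` below is a hypothetical helper; the real logic lives in `adjust_config_with_unaligned_cpu_tp`:

```python
# Illustrative padding helper (not the actual sglang implementation).
def pad_to_multiple(size: int, tp_size: int) -> int:
    # Ceiling-divide, then scale back up to the nearest multiple of tp_size.
    return -(-size // tp_size) * tp_size

# 2048 is padded to 2052, which is evenly divisible by 6.
print(pad_to_multiple(2048, 6))
```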

self.load_config = LoadConfig(
    load_format=self.server_args.load_format,
    download_dir=self.server_args.download_dir,
    model_loader_extra_config=self.server_args.model_loader_extra_config,
    tp_rank=self.tp_rank,
    remote_instance_weight_loader_seed_instance_ip=self.server_args.remote_instance_weight_loader_seed_instance_ip,
    remote_instance_weight_loader_seed_instance_service_port=self.server_args.remote_instance_weight_loader_seed_instance_service_port,
    remote_instance_weight_loader_send_weights_group_ports=self.server_args.remote_instance_weight_loader_send_weights_group_ports,
    modelopt_config=modelopt_config,
)

@chunyuan-w chunyuan-w marked this pull request as ready for review November 25, 2025 05:03
@chunyuan-w
Contributor Author

/rerun-failed-ci

@chunyuan-w
Contributor Author

@zhyncs @Alcanderian could you please help review this PR? The CI failures are unrelated.

@Fridge003
Collaborator

maybe cc @JustinTong0323

@chunyuan-w
Contributor Author

Hi @JustinTong0323 could you please take a look at this PR?

@chunyuan-w
Contributor Author

Hi @Alcanderian I checked that the CI failures are unrelated to this PR. Could you please help land this PR?

@ispobock ispobock merged commit 2a39cfe into sgl-project:main Dec 13, 2025
20 of 33 checks passed
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 13, 2025
…n_eagle3_npu

* 'main' of https://github.com/sgl-project/sglang: (25 commits)
  [NPU] perf update with kvcache nz & w4a8 quant (sgl-project#14423)
  [PP Prefill][NIXL] Fix PP mode transfer completion tracking to wait for all ranks (sgl-project#15027)
  Fix GLM-4.6 tool calls don't support streaming output for arguments i… (sgl-project#13989)
  feature: adding nightly wheel workflow and indexer (sgl-project#14924)
  [diffusion] feat: Improve LoRA compatibility by adding unified format detection and diffusers-based normalization (sgl-project#14659)
  [Fix] Disable trtllm moe backend for draft model for a qucik fix (sgl-project#15002)
  [diffusion] fix: use NDRotaryEmbedding in flux_2   (sgl-project#15034)
  Mistral Large 3 NVFP4 support (sgl-project#14485)
  call check_quantized_moe_compatibility after initialize (sgl-project#13876)
  Add sgl_router_attempt_http_responses_total for single attempt information (sgl-project#15037)
  Add error code in prometheus metrics and add X-SMG-Error-Code header (sgl-project#15036)
  Provide more fine grained error reason for reqwest error (sgl-project#15032)
  Tiny change http router response format to unify (sgl-project#15031)
  Tiny unify grpc existing error responses into new format (sgl-project#15030)
  Add `code` field and unify error responses for router (sgl-project#15028)
  Super tiny remove unused log_request (sgl-project#15035)
  Fix decode OOM caused by retraction (sgl-project#14939)
  [CI]Add gb200 runner back (sgl-project#15024)
  Add a special label for b200 CI runner that can run kernel tests (sgl-project#15033)
  Fix regression caused by fa3 block_table (sgl-project#15009)
  ...

# Conflicts:
#	python/sglang/srt/hardware_backend/npu/attention/ascend_backend.py
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 17, 2025
GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026
