add function for deep-ep tests #301
Merged
Yael-X merged 11 commits into sgl-project:main on Jan 27, 2026
* upstream/main:
  fix little batchsize and int8 quant on ci (sgl-project#302)
  optimize sinks attention (sgl-project#260)
  add swiglu_oai_triton (sgl-project#270)
  update tag to 2026.01.12 (sgl-project#312)
  feat:add performance compare (sgl-project#311)
  support add_gemma_rms_norm (sgl-project#310)
  optimize gdn gating and fused_qkvzba_split_reshape_cat (sgl-project#306)
  fix layout numTokensPerExpertTensor partial Initialization bug (sgl-project#303)
  Supplement A2 doc, software and hardware compatibility info (sgl-project#294)
  Added an environment variable to control whether to enable the Combine Ant Migration feature. (sgl-project#304)
…p_percent, and modify the code to randomly generate the number of tokens based only on the num_tokens random ratio.
Yael-X approved these changes on Jan 27, 2026
Yael-X reviewed on Jan 27, 2026
    num_nodes = num_servers
    expert_token_nums_type = int(os.getenv("MOE_EXPERT_TOKEN_NUMS_TYPE", 1))

    fluctuation_percentage = 0.1
Collaborator
A switch needs to be added for dynamic batch size (BS) testing.
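One way such a switch could look is sketched below. This is only an illustration, not the code that was merged: the environment variable name `DEEPEP_TEST_DYNAMIC_BS` and the helpers `dynamic_bs_enabled` and `tokens_for_rank` are hypothetical, and only `fluctuation_percentage = 0.1` comes from the diff above. The sketch assumes that "dynamic BS" means each rank draws its own token count within the fluctuation band around `num_tokens`, and that the test keeps a fixed count per rank when the switch is off.

```python
import os
import random


def dynamic_bs_enabled(env=None):
    """Return True when the hypothetical DEEPEP_TEST_DYNAMIC_BS switch is on."""
    env = os.environ if env is None else env
    value = env.get("DEEPEP_TEST_DYNAMIC_BS", "0")  # hypothetical variable name
    return value.strip().lower() in ("1", "true", "on")


def tokens_for_rank(num_tokens, rank, fluctuation_percentage=0.1,
                    enabled=False, seed=0):
    """Pick the number of tokens one rank processes in the test.

    Switch off: every rank uses num_tokens unchanged.
    Switch on: the count is drawn uniformly from the integer range
    [num_tokens * (1 - f), num_tokens * (1 + f)], with f the fluctuation.
    """
    if not enabled:
        return num_tokens
    # Seed per rank so that ranks differ but a rerun reproduces the same sizes.
    rng = random.Random(seed + rank)
    low = max(1, int(num_tokens * (1 - fluctuation_percentage)))
    high = max(low, int(num_tokens * (1 + fluctuation_percentage)))
    return rng.randint(low, high)


if __name__ == "__main__":
    enabled = dynamic_bs_enabled()
    for rank in range(4):
        print(rank, tokens_for_rank(4096, rank, 0.1, enabled=enabled))
```

Keeping the default at "off" would leave the existing fixed-batch-size runs unchanged, so CI would only exercise the dynamic path when the variable is set explicitly.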
zhuyutong332 added a commit to zhuyutong332/sgl-kernel-npu that referenced this pull request on Jan 27, 2026
* upstream/main:
  add function for deep-ep tests (sgl-project#301)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
  (test) add solve_tril from upstream (sgl-project#339)
  Add AscendC triangular inverse (sgl-project#332)
  support the situation that topk maybe -1 on machine A3 (sgl-project#313)
  chunk_gated_delta_rule_npu output final state (sgl-project#341)
  The environment variable DEEPEP_HCCL_BUFFSIZE is added, and the priority of DEEPEP_HCCL_BUFFSIZE is higher than that of HCCL_BUFFSIZE. (sgl-project#329)
  Added the low_latency operator API documentation. (sgl-project#337)
  Added the verification of num_max_dispatch_tokens_per_rank to the decode operator adaptation layer. (sgl-project#330)
  Document get_dispatch_layout API (sgl-project#338)
  【Doc】add fused deep moe doc (sgl-project#335)
  add deepep normal api doc (sgl-project#336)
  remove the limit that A2 internode only support topk 8 (sgl-project#323)
  Optimize the performance of the Combine Ant Moving function and the use of HCCL buffer (sgl-project#314)
  deepep adapt custom cann installation path (sgl-project#327)
  [Chore] CANN version bump to 8.5.0 (sgl-project#326)
  add dfx for operator FusedDeepMoe (sgl-project#317)
  Integrate ccache for faster compilation (sgl-project#318)
1329009851 added a commit to 1329009851/sgl-kernel-npu that referenced this pull request on Feb 11, 2026
…-npu into sgl-cmake2
* 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu:
  CI execution requirements for separating a2 and a3 (sgl-project#367)
  Fix the bug that total expert num greater than 256 or local expert num is less than 8 (sgl-project#364)
  adapt ant moving to A2 single machine (sgl-project#362)
  reset ci -- run test mixed running for experts on a2. (sgl-project#365)
  Revert "Build the deepep package with the chip model included. (sgl-project#274)" (sgl-project#363)
  fix:buffer control (sgl-project#361)
  Build the deepep package with the chip model included. (sgl-project#274)
  bugfix wrong packages build dir (sgl-project#360)
  bump version to 2026.02.01 (sgl-project#359)
  Cover the workflows cases on a3 (sgl-project#321)
  release follows naming convention (sgl-project#356)
  Modify notifydispatch to support DEEPEP_NORMAL_LONG_SEQ_ROUND up to 128. (sgl-project#352)
  fix the hanging bug (sgl-project#355)
  [Bugfix] Fix build script working with cann 8.5.0 (sgl-project#354)
  Modify the description of DeepEP in the README file. (sgl-project#348)
  Revert "Add scripts for building CMake files (sgl-project#344)" (sgl-project#353)
  Add scripts for building CMake files (sgl-project#344)
  Support x86_64 and aarch64 binary release (sgl-project#325)
  add function for deep-ep tests (sgl-project#301)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
zzx-study pushed a commit to zzx-study/sgl-kernel-npu that referenced this pull request on Feb 28, 2026
* The test cases added to support different number of tokens processed by each rank.
* cleancode
* Add some details.
* cleancode
* [test]workflows
* [test]workflows
* [test]workflows
* [test]workflows
* [test]workflows
* Remove the part where tokens of different ranks are controlled by drop_percent, and modify the code to randomly generate the number of tokens based only on the num_tokens random ratio.