add function for deep-ep tests #301

Merged
Yael-X merged 11 commits into sgl-project:main from zhuyutong332:0106
Jan 27, 2026
Conversation

@zhuyutong332
Contributor

Adds test cases that support a different number of tokens being processed by each rank.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

* upstream/main:
  fix little batchsize and int8 quant on ci (sgl-project#302)
  optimize sinks attention (sgl-project#260)
  add swiglu_oai_triton (sgl-project#270)
  update tag to 2026.01.12 (sgl-project#312)
  feat:add performance compare (sgl-project#311)
  support add_gemma_rms_norm (sgl-project#310)
  optimize gdn gating and fused_qkvzba_split_reshape_cat (sgl-project#306)
  fix layout numTokensPerExpertTensor partial Initialization bug (sgl-project#303)
  Supplement A2 doc, software and hardware compatibility info (sgl-project#294)
  Added an environment variable to control whether to enable the Combine Ant Migration feature. (sgl-project#304)
…p_percent, and modify the code to randomly generate the number of tokens based only on the num_tokens random ratio.
@Yael-X Yael-X merged commit 66e5ba1 into sgl-project:main Jan 27, 2026
4 checks passed
num_nodes = num_servers
expert_token_nums_type = int(os.getenv("MOE_EXPERT_TOKEN_NUMS_TYPE", 1))

fluctuation_percentage = 0.1
Collaborator

A switch needs to be added for dynamic batch size (BS) testing.
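
One way such a switch could look is sketched below. This is only an illustration, not the code from this PR: the environment variable name `DEEPEP_TEST_DYNAMIC_BS`, the helper `tokens_for_rank`, and the per-rank seeding are all assumptions. Only `fluctuation_percentage = 0.1` comes from the snippet under review. With the switch off, every rank keeps the same token count; with it on, each rank draws a count within the fluctuation range around the base value.

```python
import os
import random


def tokens_for_rank(base_num_tokens: int, rank: int,
                    fluctuation_percentage: float = 0.1,
                    seed: int = 0) -> int:
    """Return the number of tokens a single rank should process.

    Hypothetical helper: the name, the switch, and the seeding scheme
    are assumptions for illustration, not the PR's implementation.
    """
    # Hypothetical switch; the variable name is not taken from the PR.
    dynamic_bs = os.getenv("DEEPEP_TEST_DYNAMIC_BS", "0") == "1"
    if not dynamic_bs:
        # Switch off: every rank processes the same number of tokens.
        return base_num_tokens

    # Switch on: draw a reproducible per-rank count from
    # [base * (1 - p), base * (1 + p)], never going below 1.
    rng = random.Random(seed + rank)
    low = int(base_num_tokens * (1 - fluctuation_percentage))
    high = int(base_num_tokens * (1 + fluctuation_percentage))
    return max(1, rng.randint(low, high))
```

Seeding by rank keeps runs reproducible while still giving each rank a different count, which may make failures easier to replay; a real test might prefer a shared seed broadcast across ranks instead.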

zhuyutong332 added a commit to zhuyutong332/sgl-kernel-npu that referenced this pull request Jan 27, 2026
* upstream/main:
  add function for deep-ep tests (sgl-project#301)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
  (test) add solve_tril from upstream (sgl-project#339)
  Add AscendC triangular inverse (sgl-project#332)
  support the situation that topk maybe -1 on machine A3 (sgl-project#313)
  chunk_gated_delta_rule_npu output final state (sgl-project#341)
  The environment variable DEEPEP_HCCL_BUFFSIZE is added, and the priority of DEEPEP_HCCL_BUFFSIZE is higher than that of HCCL_BUFFSIZE. (sgl-project#329)
  Added the low_latency operator API documentation. (sgl-project#337)
  Added the verification of num_max_dispatch_tokens_per_rank to the decode operator adaptation layer. (sgl-project#330)
  Document get_dispatch_layout API (sgl-project#338)
  【Doc】add fused deep moe doc (sgl-project#335)
  add deepep normal api doc (sgl-project#336)
  remove the limit that A2 internode only support topk 8 (sgl-project#323)
  Optimize the performance of the Combine Ant Moving function and the use of HCCL buffer (sgl-project#314)
  deepep adapt custom cann installation path (sgl-project#327)
  [Chore] CANN version bump to 8.5.0 (sgl-project#326)
  add dfx for operator FusedDeepMoe (sgl-project#317)
  Integrate ccache for faster compilation (sgl-project#318)
@zhuyutong332 zhuyutong332 deleted the 0106 branch February 9, 2026 02:40
1329009851 added a commit to 1329009851/sgl-kernel-npu that referenced this pull request Feb 11, 2026
…-npu into sgl-cmake2

* 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu:
  CI execution requirements for separating a2 and a3 (sgl-project#367)
  Fix the bug that total expert num greater than 256 or local expert num is less than 8 (sgl-project#364)
  adapt ant moving to A2 single machine (sgl-project#362)
  reset ci -- run test mixed running for experts on a2. (sgl-project#365)
  Revert "Build the deepep package with the chip model included. (sgl-project#274)" (sgl-project#363)
  fix:buffer control (sgl-project#361)
  Build the deepep package with the chip model included. (sgl-project#274)
  bugfix wrong packages build dir (sgl-project#360)
  bump version to 2026.02.01 (sgl-project#359)
  Cover the workflows cases on a3 (sgl-project#321)
  release follows naming convention (sgl-project#356)
  Modify notifydispatch to support DEEPEP_NORMAL_LONG_SEQ_ROUND up to 128. (sgl-project#352)
  fix the hanging bug (sgl-project#355)
  [Bugfix] Fix build script working with cann 8.5.0 (sgl-project#354)
  Modify the description of DeepEP in the README file. (sgl-project#348)
  Revert "Add scripts for building CMake files (sgl-project#344)" (sgl-project#353)
  Add scripts for building CMake files (sgl-project#344)
  Support x86_64 and aarch64 binary release (sgl-project#325)
  add function for deep-ep tests (sgl-project#301)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
zzx-study pushed a commit to zzx-study/sgl-kernel-npu that referenced this pull request Feb 28, 2026
* The test cases added to support different number of tokens processed by each rank.

* cleancode

* Add some details.

* cleancode

* [test]workflows

* [test]workflows

* [test]workflows

* [test]workflows

* [test]workflows

* Remove the part where tokens of different ranks are controlled by drop_percent, and modify the code to randomly generate the number of tokens based only on the num_tokens random ratio.
2 participants