Add scripts for building CMake files#372
Add scripts for building CMake files#3721329009851 wants to merge 27 commits intosgl-project:mainfrom
Conversation
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: changes from pre-commit
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: Fix a script call bug
Summary of ChangesHello @1329009851, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the build system for the 'deepep' module by introducing a new CMake-based approach. It centralizes the build logic, adds support for custom operators, and improves compatibility with different CANN versions. The changes also involve a substantial cleanup of outdated or redundant build utility scripts, leading to a more streamlined and maintainable build process. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces scripts and configuration for building CMake files for the deepep module. The changes include new build scripts, CMake configuration updates, and the removal of old build utility scripts.
My review has identified a few areas for improvement:
- A critical issue with potential infinite recursion in one of the new build scripts.
- Some unused variables in another script that affect readability.
- Code duplication in CMake files which could be refactored for better maintainability.
Overall, the move towards a more standard CMake-based build system is a positive change. Addressing the identified issues will make the new build process more robust and easier to maintain.
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: CI execution requirements for separating a2 and a3 (sgl-project#367) Fix the bug that total expert num greater than 256 or local expert num is less than 8 (sgl-project#364) adapt ant moving to A2 single machine (sgl-project#362) reset ci -- run test mixed running for experts on a2. (sgl-project#365) Revert "Build the deepep package with the chip model included. (sgl-project#274)" (sgl-project#363) fix:buffer control (sgl-project#361) Build the deepep package with the chip model included. (sgl-project#274) bugfix wrong packages build dir (sgl-project#360) bump version to 2026.02.01 (sgl-project#359) Cover the workflows cases on a3 (sgl-project#321) release follows naming convention (sgl-project#356) Modify notifydispatch to support DEEPEP_NORMAL_LONG_SEQ_ROUND up to 128. (sgl-project#352) fix the hanging bug (sgl-project#355) [Bugfix] Fix build script working with cann 8.5.0 (sgl-project#354) Modify the description of DeepEP in the README file. (sgl-project#348) Revert "Add scripts for building CMake files (sgl-project#344)" (sgl-project#353) Add scripts for building CMake files (sgl-project#344) Support x86_64 and aarch64 binary release (sgl-project#325) add function for deep-ep tests (sgl-project#301) [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
|
The file changes displayed by git status before and after the build of cann8.2, 8.3, and 8.5 are as follows: after Untracked files: After the build, the moe_distribute_base.h file is updated. The cause of this issue will be located later. This modification has been verified and passed on the A2, A3, and A5 environments. |
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: Fix the bug that the layout kernel crashed when the num of experts is no less than 384 (sgl-project#383) adapt sglang (sgl-project#357) GLM5 optimize (sgl-project#382) Update layernorm_gated.py (sgl-project#378) support qwen3.5 (sgl-project#377)
|
Description of build command parameters (for compiling DeepEP): |
e04779c to
8c04849
Compare
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: Modification for review comments
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu: Modification for review comments
…-npu into sgl-cmake2 * 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu:
There was a problem hiding this comment.
整体方向很好,大幅简化了构建系统。提几个建议供参考:
1. CANN 版本检测代码重复
ops/CMakeLists.txt 和 ops2/CMakeLists.txt 中有 30+ 行完全相同的 CANN 版本检测逻辑:
set(CANN82_PATH "${ASCEND_CANN_PACKAGE_PATH}/include/experiment/platform/platform/platform_infos_def.h")
set(CANN83_PATH "${ASCEND_CANN_PACKAGE_PATH}/include/platform/platform_infos_def.h")
# ... 30+ 行建议: 提取为 cmake/cann_version.cmake 并在两个文件中 include(),避免未来维护时出现不一致。
2. build.sh 错误处理增强
csrc/deepep_cmake_build.sh 中调用子脚本时缺少错误检查:
echo "./deepep/build.sh $@"
./deepep/build.sh $@
# 建议增加:
if [ $? -ne 0 ]; then
echo "ERROR: deepep build failed"
exit 1
fi3. 日志输出统一性
build.sh 中 echo "Use SOC_VERSION: $SOC_VERSION" 很好,建议在 compile_ascend_proj.sh 的 BuildAscendProj 函数开头也增加类似日志,方便调试时追踪实际使用的芯片型号。
4. A2/A3/A5 芯片映射注释
建议在 build.sh:84-88 附近增加注释,明确说明:
deepep→ A3+ (Ascend910_9382)deepep2→ A2 (Ascend910B1)- A5 暂不开源
这样后续维护者更容易理解默认值选择的逻辑。
以上都是锦上添花的建议,不影响整体合入。👍
| message(FATAL_ERROR "Unsupported host processor: ${CMAKE_SYSTEM_PROCESSOR}. Please specify a valid architecture.") | ||
| endif() | ||
|
|
||
| set(CANN82_PATH "${ASCEND_CANN_PACKAGE_PATH}/include/experiment/platform/platform/platform_infos_def.h") |
There was a problem hiding this comment.
Consider extracting this CANN version detection logic into a shared cmake/cann_version.cmake file and including it from both ops/CMakeLists.txt and ops2/CMakeLists.txt. This avoids code duplication (30+ identical lines) and ensures consistency when updating CANN version support in the future.
| export BUILD_TYPE="Release" | ||
| MODULE_NAME="all" | ||
| MODULE_BUILD_ARG="" | ||
| IS_MODULE_EXIST=0 |
There was a problem hiding this comment.
Consider adding error handling after calling ./deepep/build.sh. If the build fails, the script should exit with an error code:
./deepep/build.sh $@
if [ $? -ne 0 ]; then
echo "ERROR: deepep build failed"
exit 1
fi| local soc_version=$2 | ||
|
|
||
| if [ -d "./${proj_name}" ]; then | ||
| rm -rf ${proj_name}/cmake |
There was a problem hiding this comment.
Consider adding a log statement at the beginning of this function to show which SOC version is being used, similar to the echo "Use SOC_VERSION: $SOC_VERSION" in build.sh. This helps with debugging:
BuildAscendProj() {
local soc_version=$2
echo "[${FUNCNAME[0]}] Building for SOC: $soc_version"
# ... rest of the function| export DEBUG_MODE=$DEBUG_MODE | ||
|
|
||
| SOC_VERSION="${1:-Ascend910_9382}" | ||
| if [[ "$BUILD_DEEPEP_OPS" == "ON" ]]; then |
There was a problem hiding this comment.
Consider adding a comment here to clarify the chip mapping logic for future maintainers:
# Chip mapping:
# - deepep → A3+ (Ascend910_9382)
# - deepep2 → A2 (Ascend910B1)
# - A5 is not open-sourced yetThis makes it clearer why different default SOC_VERSION values are used based on the BUILD_DEEPEP_OPS flag.
Add scripts for building CMake files