
[feat] add ascend readme and docker release #8700

Merged
zhyncs merged 12 commits into sgl-project:main from pkking:main
Aug 12, 2025

Conversation


@pkking pkking commented Aug 2, 2025

Motivation

Previously we only published images for NVIDIA GPU and AMD hardware; this PR builds and pushes a Docker image for NPU hardware as well.

Modifications

Add two new workflows and an NPU-related Dockerfile. Both Docker images will be published to the official registry:

  1. a daily dev image for users to try and for nightly test cases, named sglang:main-cann8.2.rc1.alpha003-a3
  2. a release image built when a new tag is pushed, named sglang:v0.4.10.post1-cann8.2.rc1.alpha003-a3

NOTE: we use metadata-action to generate image tags and labels, so the workflow relies on the push-tag event

FOR END USER

The image name schema is sglang:{ver|main}-{cann_ver}-{device_type}. The OS and Python versions are inherited from the CANN base image, defaulting to the latest Ubuntu and Python versions.
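As a minimal sketch, the schema above composes into concrete names like the two images listed in this PR (version strings taken from the examples above):

```shell
# Compose an image name per the schema sglang:{ver|main}-{cann_ver}-{device_type}.
ver="v0.4.10.post1"             # a release tag, or "main" for the daily dev image
cann_ver="cann8.2.rc1.alpha003" # CANN toolkit version of the base image
device_type="a3"                # Ascend device type
image="sglang:${ver}-${cann_ver}-${device_type}"
echo "$image"                   # -> sglang:v0.4.10.post1-cann8.2.rc1.alpha003-a3
```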

FOR DEVELOPER

The released image pins a specific sgl-kernel-npu tag; if a new sgl-kernel-npu tag is needed, please edit the release Docker workflow.
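A hypothetical local-build sketch of that pinning; the build-arg name `SGL_KERNEL_NPU_TAG` and the tag value `v0.1.0` are placeholders assumed for illustration, not names taken from the actual workflow:

```shell
# Hypothetical: SGL_KERNEL_NPU_TAG and v0.1.0 are assumed placeholder names,
# not the actual build args of docker/Dockerfile.npu.
kernel_tag="v0.1.0"
build_cmd="docker build -f docker/Dockerfile.npu --build-arg SGL_KERNEL_NPU_TAG=${kernel_tag} -t sglang:main-cann8.2.rc1.alpha003-a3 ."
echo "$build_cmd"
```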

Accuracy Test

NO NEED

Benchmark & Profiling

NO NEED

Checklist


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @pkking, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've initiated work on adding Docker release capabilities, specifically targeting NPU hardware. This pull request introduces a new Dockerfile that outlines the complete environment setup required to build and run applications on NPU devices, including the installation of various deep learning frameworks and custom kernels. The goal is to streamline the deployment process for NPU-accelerated workloads.

Highlights

  • New Dockerfile for NPU Support: I've introduced a new Dockerfile (docker/Dockerfile.npu) to enable the building and pushing of Docker images tailored for NPU (Neural Processing Unit) hardware. This is a foundational step towards supporting NPU-specific deployments.
  • Comprehensive NPU Environment Setup: The new Dockerfile sets up a comprehensive environment, including the installation of essential development tools, PyTorch with NPU adapters, vLLM, Triton-Ascend, and SGLang. It also integrates a custom SGLang kernel for NPU, ensuring all necessary dependencies are pre-configured within the image.
  • Integration of Custom NPU Kernel: The Dockerfile includes specific steps to clone and build sgl-kernel-npu and install deep-ep, which are crucial for leveraging NPU capabilities with SGLang. This ensures that the custom kernel is correctly compiled and linked within the Docker environment.

@pkking pkking marked this pull request as draft August 2, 2025 07:57

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Dockerfile for NPU hardware. The Dockerfile is functional but has several areas for improvement. My review focuses on a critical security issue regarding hardcoded credentials in URLs, and several medium-severity issues related to Docker best practices for optimizing image size and build efficiency. Specifically, I've suggested removing sensitive credentials, combining multiple RUN instructions for apt and pip commands, and cleaning up cloned git repositories after use. These changes will result in a more secure and leaner Docker image.
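The layering suggestions in this review can be sketched as a generic pattern. This is an illustration of the suggested best practices, not the actual contents of docker/Dockerfile.npu; the package names and clone URL are assumptions:

```dockerfile
# Generic illustration of the review's suggestions, not the real Dockerfile.npu:
# combine related commands into one RUN layer and clean up in that same layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends git build-essential && \
    rm -rf /var/lib/apt/lists/*

# Clone, install, and remove the kernel sources in a single layer so the
# intermediate git checkout never bloats the final image.
RUN git clone --depth 1 https://github.com/sgl-project/sgl-kernel-npu.git /tmp/sgl-kernel-npu && \
    pip install --no-cache-dir /tmp/sgl-kernel-npu && \
    rm -rf /tmp/sgl-kernel-npu
```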

@pkking pkking marked this pull request as ready for review August 5, 2025 11:27
@pkking pkking changed the title [WIP]feat: add docker release [NPU]feat: add docker release Aug 6, 2025
@ping1jing2 ping1jing2 changed the title [NPU]feat: add docker release [feat] add ascend docker release Aug 6, 2025

pkking commented Aug 9, 2025

LGTM

pkking and others added 3 commits August 11, 2025 16:38
@pkking pkking force-pushed the main branch 3 times, most recently from d2b4b13 to d91192e Compare August 11, 2025 09:09
@ping1jing2 ping1jing2 changed the title [feat] add ascend docker release [feat] add ascend readme and docker release Aug 11, 2025
@ping1jing2

lgtm


thincal commented Aug 12, 2025

@iforgetmyname I am using 8 * Ascend 910B to deploy the GLM4.5-Air (106B) model, but it reports OOM. Could you help take a look? Thanks.

  • repro steps
# step1: launch docker
docker run -it --rm --shm-size 512g \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver  \
    -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons  \
    -v /usr/local/sbin/:/usr/local/sbin \
    -v /lib/modules:/lib/modules  \
    -v /data/models:/data/models \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --privileged=true \
    sglang-ascend:latest bash

# step2: prepare env
source /usr/local/Ascend/driver/bin/setenv.bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
pip3 install -U transformers==4.53.3

# step3: launch sglang server
python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
  • logs
root@d8ac077bb820:/workspace# python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:42:56 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:42:56 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:42:57 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:07] server_args=ServerArgs(model_path='/data/models/glm4.5-air-hf/', tokenizer_path='/data/models/glm4.5-air-hf/', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=True, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='127.0.0.1', port=30000, skip_server_warmup=False, warmups=None, nccl_port=None, dtype='auto', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', mem_fraction_static=0.779, max_running_requests=None, max_queued_requests=9223372036854775807, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, hybrid_kvcache_ratio=None, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, device='npu', tp_size=8, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=909378885, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_metrics_for_all_schedulers=False, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, api_key=None, served_model_name='/data/models/glm4.5-air-hf/', chat_template=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, max_lora_rank=None, lora_target_modules=None, 
lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, decode_attention_backend=None, prefill_attention_backend=None, sampling_backend='pytorch', grammar_backend='xgrammar', mm_attention_backend=None, speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, ep_size=1, moe_a2a_backend=None, enable_flashinfer_cutlass_moe=False, enable_flashinfer_trtllm_moe=False, enable_flashinfer_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm='static', init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', hicache_io_backend='kernel', hicache_mem_layout='layer_first', hicache_storage_backend=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, cuda_graph_max_bs=None, cuda_graph_bs=None, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_nccl_nvls=False, enable_symm_mem=False, enable_tokenizer_batch_encode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_torch_compile=False, torch_compile_max_bs=32, 
torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, enable_return_hidden_states=False, enable_triton_kernel_moe=False, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, debug_tensor_dump_prefill_only=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_decode_tp=None, disaggregation_decode_dp=None, disaggregation_prefill_pp=1, disaggregation_ib_device=None, num_reserved_decode_tokens=512, pdlb_url=None, custom_weight_loader=[], weight_loader_disable_mmap=False, enable_pdmux=False, sm_group_num=3, enable_ep_moe=False, enable_deepep_moe=False)
[2025-08-11 16:43:08] Using default HuggingFace chat template with detected content format: string
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:19 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:19 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:19 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:19 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:19 [importing.py:53] Triton module has been replaced with a placeholder.
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:20 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:20 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:22 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
WARNING 08-11 16:43:22 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
[2025-08-11 16:43:24 TP0] Attention backend not explicitly specified. Use ascend backend by default.
[2025-08-11 16:43:24 TP0] Init torch distributed begin.
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:39 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:39 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
INFO 08-11 16:43:39 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:39 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:40 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:40 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:43:40 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-11 16:43:40 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:41 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:42 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
WARNING 08-11 16:43:42 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
WARNING 08-11 16:43:43 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:43 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:43 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:43 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:44 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:44 TP1] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
INFO 08-11 16:43:44 [importing.py:53] Triton module has been replaced with a placeholder.
[2025-08-11 16:43:44 TP3] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
INFO 08-11 16:43:45 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:43:45 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
WARNING 08-11 16:43:45 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:46 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:46 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
[2025-08-11 16:43:46 TP0] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
WARNING 08-11 16:43:46 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:47 TP2] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
WARNING 08-11 16:43:47 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
[2025-08-11 16:43:49 TP6] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:50 TP5] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:50 TP7] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:52 TP4] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:52 TP0] Init torch distributed ends. mem usage=0.00 GB
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:53 TP1] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP3] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP1] Using Transformers backend.
[2025-08-11 16:43:53 TP3] Using Transformers backend.
[2025-08-11 16:43:53 TP0] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP0] Load weight begin. avail mem=60.63 GB
[2025-08-11 16:43:53 TP0] Using Transformers backend.
[2025-08-11 16:43:53 TP5] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP5] Using Transformers backend.
[2025-08-11 16:43:53 TP6] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP6] Using Transformers backend.
[2025-08-11 16:43:53 TP7] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP7] Using Transformers backend.
[2025-08-11 16:43:53 TP2] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP2] Using Transformers backend.
[2025-08-11 16:43:53 TP4] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP4] Using Transformers backend.
[2025-08-11 16:43:58 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 2421, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 312, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 67, in __init__
    self.worker = TpModelWorker(
                  ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 84, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 242, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 285, in initialize
    self.load_model()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 643, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 432, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 174, in _initialize_model
    return model_class(
           ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/transformers.py", line 158, in __init__
    self.model: PreTrainedModel = AutoModel.from_config(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 453, in from_config
    return model_class._from_config(config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2208, in _from_config
    model = cls(config, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in __init__
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in <listcomp>
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 152, in __init__
    self.mlp = GLM4MoESparseMoeBlock(config, layer_id=layer_idx)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 106, in __init__
    self.experts = nn.ModuleList([
                                 ^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 107, in <listcomp>
    GLM4MoEMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(config.n_routed_experts)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 40, in __init__
    self.gate_proj = nn.Linear(config.hidden_size, intermediate_size, bias=False)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 106, in __init__
    torch.empty((out_features, in_features), **factory_kwargs)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NPU out of memory. Tried to allocate 12.00 MiB (NPU 1; 60.97 GiB total capacity; 60.59 GiB already allocated; 60.59 GiB current active; 28.28 MiB free; 60.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

[2025-08-11 16:43:58] Received sigquit from a child process. It usually means the child failed.
Killed
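A quick back-of-the-envelope check on the numbers in the traceback above (a sketch, assuming bf16 weights at 2 bytes per parameter; the 106B parameter count and the 60.97 GiB device capacity are taken from the log, activations and KV cache are ignored):

```python
# Rough memory estimate for a 106B-parameter model across 8 NPUs.
# Assumption: bf16 weights (2 bytes/param); activations and KV cache ignored.
PARAMS = 106e9
BYTES_PER_PARAM = 2
TP_RANKS = 8
DEVICE_CAPACITY_GIB = 60.97  # from the "NPU out of memory" message above

total_gib = PARAMS * BYTES_PER_PARAM / 2**30
per_rank_gib = total_gib / TP_RANKS

print(f"full model weights: {total_gib:.1f} GiB")      # ~197 GiB, far over one device
print(f"per rank at tp=8:   {per_rank_gib:.1f} GiB")   # ~25 GiB, fits if sharded
```

The per-rank figure fits comfortably under 61 GiB while the unsharded model does not, so the 60+ GiB already allocated on a single rank is at least consistent with the Transformers fallback path ("GLM4MoEForCausalLM has no SGLang implementation") materializing the MoE expert weights without tensor-parallel sharding. This is an inference from the log, not a confirmed root cause.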

@ping1jing2
Collaborator

@iforgetmyname I am using 8 × Ascend 910B to deploy the GLM4.5-Air (106B) model, but it reports OOM. Could you help take a look? Thanks.

  • Repro steps
# step1: launch docker
docker run -it --rm --shm-size 512g \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver  \
    -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons  \
    -v /usr/local/sbin/:/usr/local/sbin \
    -v /lib/modules:/lib/modules  \
    -v /data/models:/data/models \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --privileged=true \
    sglang-ascend:latest bash

# step2: prepare env
source /usr/local/Ascend/driver/bin/setenv.bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
pip3 install -U transformers==4.53.3

# step3: launch sglang server
python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
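To rule out plain allocator pressure (as opposed to the sharding issue itself), a variant of the launch above with more headroom can be tried. This is a hypothetical mitigation, not a verified fix: the `server_args` in the log show the default `mem_fraction_static=0.779`, and the OOM message itself suggests `max_split_size_mb`; the assumption that `PYTORCH_NPU_ALLOC_CONF` mirrors `PYTORCH_CUDA_ALLOC_CONF` for torch_npu should be checked against the torch_npu docs.

```shell
# Assumption: torch_npu honors PYTORCH_NPU_ALLOC_CONF (NPU analogue of
# PYTORCH_CUDA_ALLOC_CONF), per the max_split_size_mb hint in the OOM message.
export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256

# Leave more headroom than the default mem_fraction_static of 0.779.
python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ \
    --trust-remote-code --tp=8 --mem-fraction-static 0.6
```

If the OOM still occurs during weight loading, the cause is likely the unsharded fallback path rather than fragmentation.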
  • Logs
root@d8ac077bb820:/workspace# python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:42:56 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:42:56 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:42:57 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:07] server_args=ServerArgs(model_path='/data/models/glm4.5-air-hf/', tokenizer_path='/data/models/glm4.5-air-hf/', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=True, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='127.0.0.1', port=30000, skip_server_warmup=False, warmups=None, nccl_port=None, dtype='auto', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', mem_fraction_static=0.779, max_running_requests=None, max_queued_requests=9223372036854775807, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, hybrid_kvcache_ratio=None, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, device='npu', tp_size=8, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=909378885, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_metrics_for_all_schedulers=False, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, api_key=None, served_model_name='/data/models/glm4.5-air-hf/', chat_template=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, max_lora_rank=None, lora_target_modules=None, 
lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, decode_attention_backend=None, prefill_attention_backend=None, sampling_backend='pytorch', grammar_backend='xgrammar', mm_attention_backend=None, speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, ep_size=1, moe_a2a_backend=None, enable_flashinfer_cutlass_moe=False, enable_flashinfer_trtllm_moe=False, enable_flashinfer_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm='static', init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', hicache_io_backend='kernel', hicache_mem_layout='layer_first', hicache_storage_backend=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, cuda_graph_max_bs=None, cuda_graph_bs=None, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_nccl_nvls=False, enable_symm_mem=False, enable_tokenizer_batch_encode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_torch_compile=False, torch_compile_max_bs=32, 
torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, enable_return_hidden_states=False, enable_triton_kernel_moe=False, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, debug_tensor_dump_prefill_only=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_decode_tp=None, disaggregation_decode_dp=None, disaggregation_prefill_pp=1, disaggregation_ib_device=None, num_reserved_decode_tokens=512, pdlb_url=None, custom_weight_loader=[], weight_loader_disable_mmap=False, enable_pdmux=False, sm_group_num=3, enable_ep_moe=False, enable_deepep_moe=False)
[2025-08-11 16:43:08] Using default HuggingFace chat template with detected content format: string
[... the same torchair / "Triton module has been replaced" / "No platform detected" / vllm._C / AWQ warnings shown above repeat for each of the 8 TP ranks; elided for brevity ...]
[2025-08-11 16:43:24 TP0] Attention backend not explicitly specified. Use ascend backend by default.
[2025-08-11 16:43:24 TP0] Init torch distributed begin.
[... repeated torchair / Triton / vLLM / AWQ warnings from worker startup elided ...]
[2025-08-11 16:43:44 TP1] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:44 TP3] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:46 TP0] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:47 TP2] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:49 TP6] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:50 TP5] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:50 TP7] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:52 TP4] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:52 TP0] Init torch distributed ends. mem usage=0.00 GB
[... the torchair compiler_config UserWarning above repeats many more times (once per importing process); repeats truncated ...]
[2025-08-11 16:43:53 TP1] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP3] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP1] Using Transformers backend.
[2025-08-11 16:43:53 TP3] Using Transformers backend.
[2025-08-11 16:43:53 TP0] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP0] Load weight begin. avail mem=60.63 GB
[2025-08-11 16:43:53 TP0] Using Transformers backend.
[2025-08-11 16:43:53 TP5] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP5] Using Transformers backend.
[2025-08-11 16:43:53 TP6] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP6] Using Transformers backend.
[2025-08-11 16:43:53 TP7] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP7] Using Transformers backend.
[2025-08-11 16:43:53 TP2] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP2] Using Transformers backend.
[2025-08-11 16:43:53 TP4] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP4] Using Transformers backend.
[2025-08-11 16:43:58 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 2421, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 312, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 67, in __init__
    self.worker = TpModelWorker(
                  ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 84, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 242, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 285, in initialize
    self.load_model()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 643, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 432, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 174, in _initialize_model
    return model_class(
           ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/transformers.py", line 158, in __init__
    self.model: PreTrainedModel = AutoModel.from_config(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 453, in from_config
    return model_class._from_config(config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2208, in _from_config
    model = cls(config, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in __init__
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in <listcomp>
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 152, in __init__
    self.mlp = GLM4MoESparseMoeBlock(config, layer_id=layer_idx)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 106, in __init__
    self.experts = nn.ModuleList([
                                 ^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 107, in <listcomp>
    GLM4MoEMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(config.n_routed_experts)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 40, in __init__
    self.gate_proj = nn.Linear(config.hidden_size, intermediate_size, bias=False)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 106, in __init__
    torch.empty((out_features, in_features), **factory_kwargs)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NPU out of memory. Tried to allocate 12.00 MiB (NPU 1; 60.97 GiB total capacity; 60.59 GiB already allocated; 60.59 GiB current active; 28.28 MiB free; 60.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

[2025-08-11 16:43:58] Received sigquit from a child process. It usually means the child failed.
Killed

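The traceback above shows each TP rank building the full set of `nn.Linear` experts via the Transformers fallback before any sharding, so a back-of-envelope estimate makes the OOM plausible. This is a rough sketch, not a measurement: the ~106B parameter count is taken from the model name and bf16 (2 bytes/param) is an assumed dtype; only the ~61 GiB per-device capacity comes from the error message.

```python
# Rough memory estimate: why materializing an unsharded ~106B model
# on a single NPU fails, while an ideal tp=8 shard would fit.
# Assumptions (not from the logs): ~106e9 params, bf16 (2 bytes/param).
PARAMS = 106e9
BYTES_PER_PARAM = 2           # bf16
NPU_MEM_GIB = 60.97           # per-device capacity from the OOM message

full_model_gib = PARAMS * BYTES_PER_PARAM / 2**30   # whole model on one rank
sharded_gib = full_model_gib / 8                    # ideal tp=8 shard

print(f"full model: {full_model_gib:.1f} GiB")      # ~197.4 GiB >> 60.97 GiB -> OOM
print(f"tp=8 shard: {sharded_gib:.1f} GiB")         # ~24.7 GiB, leaves room for KV cache
```

Under these assumptions the unsharded weights alone are roughly three times one 910B's memory, which matches the "60.59 GiB already allocated" in the error: the fallback path runs out of memory during model construction, before tensor parallelism can help.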
Let me check it. If you don't use this image, do you also encounter the OOM?

@iforgetmyname
Collaborator

@iforgetmyname I am using 8 * Ascend 910B to deploy the GLM4.5-Air (106B) model, but it reports OOM. Could you help take a look? Thanks.

  • Repro steps
# step1: launch docker
docker run -it --rm --shm-size 512g \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver  \
    -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons  \
    -v /usr/local/sbin/:/usr/local/sbin \
    -v /lib/modules:/lib/modules  \
    -v /data/models:/data/models \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --privileged=true \
    sglang-ascend:latest bash

# step2: prepare env
source /usr/local/Ascend/driver/bin/setenv.bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
pip3 install -U transformers==4.53.3

# step3: launch sglang server
python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
  • Logs
root@d8ac077bb820:/workspace# python3 -m sglang.launch_server --model-path=/data/models/glm4.5-air-hf/ --trust-remote-code --tp=8
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:42:56 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:42:56 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:42:57 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:07] server_args=ServerArgs(model_path='/data/models/glm4.5-air-hf/', tokenizer_path='/data/models/glm4.5-air-hf/', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=True, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='127.0.0.1', port=30000, skip_server_warmup=False, warmups=None, nccl_port=None, dtype='auto', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', mem_fraction_static=0.779, max_running_requests=None, max_queued_requests=9223372036854775807, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, hybrid_kvcache_ratio=None, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, device='npu', tp_size=8, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=909378885, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_metrics_for_all_schedulers=False, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, api_key=None, served_model_name='/data/models/glm4.5-air-hf/', chat_template=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, max_lora_rank=None, lora_target_modules=None, 
lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, decode_attention_backend=None, prefill_attention_backend=None, sampling_backend='pytorch', grammar_backend='xgrammar', mm_attention_backend=None, speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, ep_size=1, moe_a2a_backend=None, enable_flashinfer_cutlass_moe=False, enable_flashinfer_trtllm_moe=False, enable_flashinfer_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm='static', init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', hicache_io_backend='kernel', hicache_mem_layout='layer_first', hicache_storage_backend=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, cuda_graph_max_bs=None, cuda_graph_bs=None, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_nccl_nvls=False, enable_symm_mem=False, enable_tokenizer_batch_encode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_torch_compile=False, torch_compile_max_bs=32, 
torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, enable_return_hidden_states=False, enable_triton_kernel_moe=False, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, debug_tensor_dump_prefill_only=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_decode_tp=None, disaggregation_decode_dp=None, disaggregation_prefill_pp=1, disaggregation_ib_device=None, num_reserved_decode_tokens=512, pdlb_url=None, custom_weight_loader=[], weight_loader_disable_mmap=False, enable_pdmux=False, sm_group_num=3, enable_ep_moe=False, enable_deepep_moe=False)
[2025-08-11 16:43:08] Using default HuggingFace chat template with detected content format: string
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
INFO 08-11 16:43:19 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-11 16:43:19 [__init__.py:243] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 08-11 16:43:21 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:42: UserWarning: Using kernels directly from vllm. This might lead to performance degradation or missing functionalities as certain kernels may not be optimized. 
  warnings.warn(
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
  warnings.warn(f"Only CUDA and HIP support AWQ currently.")
[2025-08-11 16:43:24 TP0] Attention backend not explicitly specified. Use ascend backend by default.
[2025-08-11 16:43:24 TP0] Init torch distributed begin.
[2025-08-11 16:43:44 TP1] Failed to import from custom_ar with ModuleNotFoundError("No module named 'sgl_kernel'")
[2025-08-11 16:43:52 TP0] Init torch distributed ends. mem usage=0.00 GB
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/configs/compiler_config.py:74: UserWarning: The following torchair config or properties may not take effect or report error in max-autotune mode: 
  warnings.warn("The following torchair config or properties may not take effect or report " + \
[2025-08-11 16:43:53 TP1] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP3] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP1] Using Transformers backend.
[2025-08-11 16:43:53 TP3] Using Transformers backend.
[2025-08-11 16:43:53 TP0] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP0] Load weight begin. avail mem=60.63 GB
[2025-08-11 16:43:53 TP0] Using Transformers backend.
[2025-08-11 16:43:53 TP5] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP5] Using Transformers backend.
[2025-08-11 16:43:53 TP6] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP6] Using Transformers backend.
[2025-08-11 16:43:53 TP7] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP7] Using Transformers backend.
[2025-08-11 16:43:53 TP2] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP2] Using Transformers backend.
[2025-08-11 16:43:53 TP4] GLM4MoEForCausalLM has no SGLang implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
[2025-08-11 16:43:53 TP4] Using Transformers backend.
[2025-08-11 16:43:58 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 2421, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 312, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 67, in __init__
    self.worker = TpModelWorker(
                  ^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 84, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 242, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 285, in initialize
    self.load_model()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 643, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 432, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_loader/loader.py", line 174, in _initialize_model
    return model_class(
           ^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/transformers.py", line 158, in __init__
    self.model: PreTrainedModel = AutoModel.from_config(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 453, in from_config
    return model_class._from_config(config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2208, in _from_config
    model = cls(config, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in __init__
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 454, in <listcomp>
    [GLM4MoEDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 152, in __init__
    self.mlp = GLM4MoESparseMoeBlock(config, layer_id=layer_idx)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 106, in __init__
    self.experts = nn.ModuleList([
                                 ^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 107, in <listcomp>
    GLM4MoEMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(config.n_routed_experts)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm4_moe.py", line 40, in __init__
    self.gate_proj = nn.Linear(config.hidden_size, intermediate_size, bias=False)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 106, in __init__
    torch.empty((out_features, in_features), **factory_kwargs)
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NPU out of memory. Tried to allocate 12.00 MiB (NPU 1; 60.97 GiB total capacity; 60.59 GiB already allocated; 60.59 GiB current active; 28.28 MiB free; 60.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

[2025-08-11 16:43:58] Received sigquit from a child process. It usually means the child failed.
Killed
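The error message above suggests its own mitigation (`max_split_size_mb`), and leaving more headroom at startup is another common workaround. A minimal sketch, assuming torch_npu honors `PYTORCH_NPU_ALLOC_CONF` (its analogue of `PYTORCH_CUDA_ALLOC_CONF`) on your version, with `<model-path>` as a placeholder for the GLM-4 MoE checkpoint being loaded:

```shell
# Mitigation sketch for the NPU OOM above — verify both knobs against
# your installed torch_npu / sglang versions before relying on them.

# 1) Reduce allocator fragmentation, as the error message itself suggests.
export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:128

# 2) Leave more free memory during model load by lowering SGLang's
#    static memory fraction (default is higher; 0.8 is illustrative).
python3 -m sglang.launch_server \
  --model-path <model-path> \
  --tp 8 \
  --mem-fraction-static 0.8
```

Note that the Transformers fallback backend shown in the log materializes all routed experts eagerly, so a MoE model of this size may still not fit on 8 × 64 GB NPUs regardless of allocator tuning.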

Hey @thincal, could you please open an issue here so that we can track it?

@Alcanderian Alcanderian self-assigned this Aug 12, 2025
@Alcanderian Alcanderian added ready-to-merge The PR is ready to merge after the CI is green. npu labels Aug 12, 2025
@zhyncs zhyncs merged commit 2ecbd8b into sgl-project:main Aug 12, 2025
100 of 102 checks passed
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>