[XPU][Doc]Update XPU release2.3 note #4939

Merged
EmmonsCurse merged 4 commits into PaddlePaddle:develop from iosmers:develop
Nov 11, 2025

iosmers commented Nov 11, 2025

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are none, state the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 11, 2025 02:56

paddle-bot bot commented Nov 11, 2025

Thanks for your contribution!

Copilot AI left a comment

Pull Request Overview

This PR updates XPU-related documentation with the following changes:

  • Downgrades PaddlePaddle-XPU version from 3.2.2 to 3.2.1 in installation guides
  • Adds missing --gpu-memory-utilization 0.7 parameter to ERNIE-4.5-VL-424B-A47B model configuration
  • Fixes code block indentation in deployment command examples
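
The second change can be checked against the ERNIE-4.5-VL-424B-A47B table row in the diff. As a rough sketch, that row's command is expanded below into a dry run that only prints the command. Every flag and value is copied from the row; the `set --` and `echo` wrapper is an illustrative assumption added here, not something the docs prescribe.

```bash
export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

# Collect the flags as positional parameters so they can be inspected first.
set -- \
  --model PaddlePaddle/ERNIE-4.5-VL-424B-A47B-Paddle \
  --port 8188 \
  --tensor-parallel-size 8 \
  --quantization "wint8" \
  --max-model-len 32768 \
  --max-num-seqs 8 \
  --enable-mm \
  --mm-processor-kwargs '{"video_max_frames": 30}' \
  --limit-mm-per-prompt '{"image": 10, "video": 3}' \
  --reasoning-parser ernie-45-vl \
  --gpu-memory-utilization 0.7 \
  --load-choices "default"

# Dry run: print the command instead of starting the server.
# Drop the leading `echo` to actually launch it.
echo python -m fastdeploy.entrypoints.openai.api_server "$@"
```

Printing the command first makes it easy to confirm that `--gpu-memory-utilization 0.7` is present before launching on all eight cards.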

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| docs/zh/get_started/installation/kunlunxin_xpu.md | Updates PaddlePaddle-XPU installation version from 3.2.2 to 3.2.1 |
| docs/get_started/installation/kunlunxin_xpu.md | Updates PaddlePaddle-XPU installation version from 3.2.2 to 3.2.1 (English version) |
| docs/zh/usage/kunlunxin_xpu_deployment.md | Adds gpu-memory-utilization parameter, fixes code formatting, and corrects spacing issues |
| docs/usage/kunlunxin_xpu_deployment.md | Adds gpu-memory-utilization parameter and fixes code formatting (English version) |

@@ -19,9 +19,9 @@
|ERNIE-4.5-0.3B|128K|WINT8|1 (推荐)|export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-0.3B-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --max-model-len 131072 \ <br> --max-num-seqs 128 \ <br> --quantization "wint8" \ <br> --gpu-memory-utilization 0.9 \ <br> --load-choices "default"|2.3.0|
|ERNIE-4.5-300B-A47B-W4A8C8-TP4|32K|W4A8|4|export XPU_VISIBLE_DEVICES="0,1,2,3" or "4,5,6,7"<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 4 \ <br> --max-model-len 32768 \ <br> --max-num-seqs 64 \ <br> --quantization "W4A8" \ <br> --gpu-memory-utilization 0.9 \ <br> --load-choices "default"|2.3.0|
|ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 32768 \ <br> --max-num-seqs 10 \ <br> --enable-mm \ <br> --mm-processor-kwargs '{"video_max_frames": 30}' \ <br> --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br> --reasoning-parser ernie-45-vl \ <br> --load-choices "default"|2.3.0|

Copilot AI Nov 11, 2025

Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡

This is inconsistent with the corrected format in line 24 of the same table.

Suggested change
|ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 32768 \ <br> --max-num-seqs 10 \ <br> --enable-mm \ <br> --mm-processor-kwargs '{"video_max_frames": 30}' \ <br> --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br> --reasoning-parser ernie-45-vl \ <br> --load-choices "default"|2.3.0|
|ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 32768 \ <br> --max-num-seqs 10 \ <br> --enable-mm \ <br> --mm-processor-kwargs '{"video_max_frames": 30}' \ <br> --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br> --reasoning-parser ernie-45-vl \ <br> --load-choices "default"|2.3.0|

|ERNIE-4.5-VL-424B-A47B|32K|WINT8|8|export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" <br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-424B-A47B-Paddle \ <br> --port 8188 \ <br> --tensor-parallel-size 8 \ <br> --quantization "wint8" \ <br> --max-model-len 32768 \ <br> --max-num-seqs 8 \ <br> --enable-mm \ <br> --mm-processor-kwargs '{"video_max_frames": 30}' \ <br> --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br> --reasoning-parser ernie-45-vl \ <br> --gpu-memory-utilization 0.7 \ <br> --load-choices "default"|2.3.0|
|PaddleOCR-VL-0.9B|32K|BF16|1|export FD_ENABLE_MAX_PREFILL=1 <br>export XPU_VISIBLE_DEVICES="0" # Specify any card <br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/PaddleOCR-VL \ <br> --port 8188 \ <br> --metrics-port 8181 \ <br> --engine-worker-queue-port 8182 \ <br> --max-model-len 16384 \ <br> --max-num-batched-tokens 16384 \ <br> --gpu-memory-utilization 0.8 \ <br> --max-num-seqs 256|2.3.0|
|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' <br> --load-choices "default_v1"|2.3.0|
|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ <br> --load-choices "default_v1"|2.3.0|

Copilot AI Nov 11, 2025

Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # Specify any card

This is inconsistent with the corrected format in other rows of the same table.

Suggested change
|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ <br> --load-choices "default_v1"|2.3.0|
|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0" # Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ <br> --load-choices "default_v1"|2.3.0|
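
The missing space is more than cosmetic. In bash, `#` only starts a comment at the beginning of a word, so without the space it becomes part of the assigned value. A minimal sketch, assuming bash (the helper variables `without_space` and `with_space` are illustrative and do not appear in the docs):

```bash
# No space: '#' is glued to the value, which becomes "0#". The remaining
# words (Specify, any, card) are treated as extra names to export.
export XPU_VISIBLE_DEVICES="0"# Specify any card
without_space="$XPU_VISIBLE_DEVICES"

# With a space: '#' starts a comment and the value is "0".
export XPU_VISIBLE_DEVICES="0" # Specify any card
with_space="$XPU_VISIBLE_DEVICES"

echo "without space: $without_space"   # prints: without space: 0#
echo "with space:    $with_space"      # prints: with space:    0
```

A value such as `0#` is unlikely to be accepted as a device index, so a reader who copies the unfixed command may not get the intended card.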

@@ -250,19 +250,19 @@ print('\n')
```bash
export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡
```

Copilot AI Nov 11, 2025

Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡
Suggested change
export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡
export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡

@@ -247,19 +247,19 @@ Deploy the ERNIE-4.5-VL-28B-A3B-Thinking model with WINT8 precision and 128K con
```bash
export XPU_VISIBLE_DEVICES="0"# Specify any card
```

Copilot AI Nov 11, 2025

Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # Specify any card
Suggested change
export XPU_VISIBLE_DEVICES="0"# Specify any card
export XPU_VISIBLE_DEVICES="0" # Specify any card

iosmers changed the title from "[XPU]update doc" to "[XPU][Doc]Update XPU release2.3 note" on Nov 11, 2025
EmmonsCurse merged commit 215cda2 into PaddlePaddle:develop on Nov 11, 2025
4 of 7 checks passed