[XPU][Doc]Update XPU release2.3 note by iosmers · Pull Request #4939 · PaddlePaddle/FastDeploy

iosmers · 2025-11-11T02:56:28Z

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least one tag to the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If there are none, explain why in this PR.
- Provide accuracy results.
- If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-11T02:56:34Z

Thanks for your contribution!

Copilot

Pull Request Overview

This PR updates XPU-related documentation with the following changes:

- Downgrades PaddlePaddle-XPU version from 3.2.2 to 3.2.1 in installation guides
- Adds missing --gpu-memory-utilization 0.7 parameter to ERNIE-4.5-VL-424B-A47B model configuration
- Fixes code block indentation in deployment command examples

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| docs/zh/get_started/installation/kunlunxin_xpu.md | Updates PaddlePaddle-XPU installation version from 3.2.2 to 3.2.1 |
| docs/get_started/installation/kunlunxin_xpu.md | Updates PaddlePaddle-XPU installation version from 3.2.2 to 3.2.1 (English version) |
| docs/zh/usage/kunlunxin_xpu_deployment.md | Adds gpu-memory-utilization parameter, fixes code formatting, and corrects spacing issues |
| docs/usage/kunlunxin_xpu_deployment.md | Adds gpu-memory-utilization parameter and fixes code formatting (English version) |

Copilot · 2025-11-11T02:58:39Z

docs/zh/usage/kunlunxin_xpu_deployment.md

@@ -19,9 +19,9 @@
 |ERNIE-4.5-0.3B|128K|WINT8|1 （推荐）|export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br>    --model PaddlePaddle/ERNIE-4.5-0.3B-Paddle \ <br>    --port 8188 \ <br>    --tensor-parallel-size 1 \ <br>    --max-model-len 131072 \ <br>    --max-num-seqs 128 \ <br>    --quantization "wint8" \ <br>    --gpu-memory-utilization 0.9 \ <br>    --load-choices "default"|2.3.0|
 |ERNIE-4.5-300B-A47B-W4A8C8-TP4|32K|W4A8|4|export XPU_VISIBLE_DEVICES="0,1,2,3" or "4,5,6,7"<br>python -m fastdeploy.entrypoints.openai.api_server \ <br>    --model PaddlePaddle/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle \ <br>    --port 8188 \ <br>    --tensor-parallel-size 4 \ <br>    --max-model-len 32768 \ <br>    --max-num-seqs 64 \ <br>    --quantization "W4A8" \ <br>    --gpu-memory-utilization 0.9 \ <br>    --load-choices "default"|2.3.0|
 |ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡<br>python -m fastdeploy.entrypoints.openai.api_server \ <br>    --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ <br>    --port 8188  \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br>  --max-model-len 32768 \ <br> --max-num-seqs 10 \ <br>     --enable-mm \ <br>   --mm-processor-kwargs '{"video_max_frames": 30}' \ <br>     --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br>     --reasoning-parser ernie-45-vl \ <br>    --load-choices "default"|2.3.0|


Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡

This is inconsistent with the corrected format in line 24 of the same table.

Suggested change (original line first, replacement second):

|ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡 python -m fastdeploy.entrypoints.openai.api_server \ --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ --port 8188 \ --tensor-parallel-size 1 \ --quantization "wint8" \ --max-model-len 32768 \ --max-num-seqs 10 \ --enable-mm \ --mm-processor-kwargs '{"video_max_frames": 30}' \ --limit-mm-per-prompt '{"image": 10, "video": 3}' \ --reasoning-parser ernie-45-vl \ --load-choices "default"|2.3.0|

|ERNIE-4.5-VL-28B-A3B|32K|WINT8|1|export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡 python -m fastdeploy.entrypoints.openai.api_server \ --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Paddle \ --port 8188 \ --tensor-parallel-size 1 \ --quantization "wint8" \ --max-model-len 32768 \ --max-num-seqs 10 \ --enable-mm \ --mm-processor-kwargs '{"video_max_frames": 30}' \ --limit-mm-per-prompt '{"image": 10, "video": 3}' \ --reasoning-parser ernie-45-vl \ --load-choices "default"|2.3.0|
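
The missing space is more than a formatting nit. In POSIX-style shells, `#` starts a comment only at the beginning of a word, so `"0"#` is read as the single value `0#` and the remaining words become extra arguments to `export`. A minimal sketch of the difference, assuming the command is pasted into bash (the English comment text is used here so that the extra words are valid variable names):

```bash
#!/usr/bin/env bash
# Without a space, '#' is glued to the value and does not start a comment.
# The words "Specify", "any" and "card" are passed to export as extra names.
unset XPU_VISIBLE_DEVICES
export XPU_VISIBLE_DEVICES="0"# Specify any card
broken="$XPU_VISIBLE_DEVICES"

# With a space, '#' begins a new word and the rest of the line is a comment.
export XPU_VISIBLE_DEVICES="0" # Specify any card
fixed="$XPU_VISIBLE_DEVICES"

printf 'without space: %s\n' "$broken"   # without space: 0#
printf 'with space:    %s\n' "$fixed"    # with space:    0
```

How the runtime treats a device list of `0#` is not covered in this thread; the sketch only shows what value the shell hands over.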

Copilot · 2025-11-11T02:58:39Z

docs/usage/kunlunxin_xpu_deployment.md

+|ERNIE-4.5-VL-424B-A47B|32K|WINT8|8|export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" <br>python -m fastdeploy.entrypoints.openai.api_server \ <br>    --model PaddlePaddle/ERNIE-4.5-VL-424B-A47B-Paddle \ <br>    --port 8188 \ <br> --tensor-parallel-size 8 \ <br> --quantization "wint8" \ <br>  --max-model-len 32768 \ <br> --max-num-seqs 8 \ <br>     --enable-mm \ <br>   --mm-processor-kwargs '{"video_max_frames": 30}' \ <br>     --limit-mm-per-prompt '{"image": 10, "video": 3}' \ <br>     --reasoning-parser ernie-45-vl \ <br> --gpu-memory-utilization 0.7 \ <br>  --load-choices "default"|2.3.0|
 |PaddleOCR-VL-0.9B|32K|BF16|1|export FD_ENABLE_MAX_PREFILL=1 <br>export XPU_VISIBLE_DEVICES="0" # Specify any card <br>python -m fastdeploy.entrypoints.openai.api_server \ <br>   --model PaddlePaddle/PaddleOCR-VL \ <br>  --port 8188 \ <br> --metrics-port 8181 \ <br> --engine-worker-queue-port 8182 \ <br> --max-model-len 16384 \ <br> --max-num-batched-tokens 16384 \ <br> --gpu-memory-utilization 0.8 \ <br> --max-num-seqs 256|2.3.0|
-|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' <br> --load-choices "default_v1"|2.3.0|
+|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card<br>python -m fastdeploy.entrypoints.openai.api_server \ <br> --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ <br> --port 8188 \ <br> --tensor-parallel-size 1 \ <br> --quantization "wint8" \ <br> --max-model-len 131072 \ <br> --max-num-seqs 32 \ <br> --engine-worker-queue-port 8189 \ <br> --metrics-port 8190 \ <br> --cache-queue-port 8191 \ <br> --reasoning-parser ernie-45-vl-thinking \ <br> --tool-call-parser ernie-45-vl-thinking \ <br> --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ <br> --load-choices "default_v1"|2.3.0|


Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # Specify any card

This is inconsistent with the corrected format in other rows of the same table.

Suggested change (original line first, replacement second):

|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0"# Specify any card python -m fastdeploy.entrypoints.openai.api_server \ --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ --port 8188 \ --tensor-parallel-size 1 \ --quantization "wint8" \ --max-model-len 131072 \ --max-num-seqs 32 \ --engine-worker-queue-port 8189 \ --metrics-port 8190 \ --cache-queue-port 8191 \ --reasoning-parser ernie-45-vl-thinking \ --tool-call-parser ernie-45-vl-thinking \ --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ --load-choices "default_v1"|2.3.0|

|ERNIE-4.5-VL-28B-A3B-Thinking|128K|WINT8|1|export XPU_VISIBLE_DEVICES="0" # Specify any card python -m fastdeploy.entrypoints.openai.api_server \ --model PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking \ --port 8188 \ --tensor-parallel-size 1 \ --quantization "wint8" \ --max-model-len 131072 \ --max-num-seqs 32 \ --engine-worker-queue-port 8189 \ --metrics-port 8190 \ --cache-queue-port 8191 \ --reasoning-parser ernie-45-vl-thinking \ --tool-call-parser ernie-45-vl-thinking \ --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ --load-choices "default_v1"|2.3.0|
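
The diff for this row also adds a trailing backslash after the `--mm-processor-kwargs` argument. Without it, a multi-line command copied from the rendered table would end at that line, and `--load-choices` would not reach the server. The sketch below illustrates the effect with a hypothetical `count_args` helper standing in for the real entry point; it is not part of FastDeploy.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the server command: it only reports how many
# arguments it received.
count_args() { echo "$#"; }

# With the continuation backslash, both flags belong to the same command.
with_backslash=$(count_args \
    --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \
    --load-choices "default_v1")

# Without it, the command ends after the first flag and its value; the shell
# would then try to run the following line as a separate command.
without_backslash=$(count_args \
    --mm-processor-kwargs '{"image_max_pixels": 12845056 }')

echo "with backslash: $with_backslash arguments"       # 4
echo "without backslash: $without_backslash arguments" # 2
```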

Copilot · 2025-11-11T02:58:40Z

docs/zh/usage/kunlunxin_xpu_deployment.md

@@ -250,19 +250,19 @@ print('\n')
 ```bash
 export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡


Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡

Suggested change (original line first, replacement second):

export XPU_VISIBLE_DEVICES="0"# 指定任意一张卡

export XPU_VISIBLE_DEVICES="0" # 指定任意一张卡

Copilot · 2025-11-11T02:58:40Z

docs/usage/kunlunxin_xpu_deployment.md

@@ -247,19 +247,19 @@ Deploy the ERNIE-4.5-VL-28B-A3B-Thinking model with WINT8 precision and 128K con
 ```bash
 export XPU_VISIBLE_DEVICES="0"# Specify any card


Missing space after the closing quote in the export command. Should be:

export XPU_VISIBLE_DEVICES="0" # Specify any card

Suggested change (original line first, replacement second):

export XPU_VISIBLE_DEVICES="0"# Specify any card

export XPU_VISIBLE_DEVICES="0" # Specify any card
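
Since the same pattern was flagged in four places, a quick search may help catch any remaining occurrences. The following is a rough sketch that runs `grep` over a temporary sample file; the sample content is illustrative, and in practice the pattern would be run against the real docs files.

```bash
#!/usr/bin/env bash
# Build a small sample file standing in for a docs page (illustrative content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
export XPU_VISIBLE_DEVICES="0"# Specify any card
export XPU_VISIBLE_DEVICES="0" # Specify any card
export XPU_VISIBLE_DEVICES="0,1,2,3"
EOF

# Report lines where a closing double quote is immediately followed by '#'.
matches=$(grep -n '"#' "$sample")
echo "$matches"   # 1:export XPU_VISIBLE_DEVICES="0"# Specify any card

rm -f "$sample"
```

The pattern is deliberately simple and may also match legitimate text such as a quoted string followed by a URL fragment, so hits should be reviewed by hand.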

update doc

097d06a

Copilot AI review requested due to automatic review settings November 11, 2025 02:56

Copilot started reviewing on behalf of iosmers November 11, 2025 02:56

Copilot finished reviewing on behalf of iosmers November 11, 2025 02:57

update

ca5d98c

Copilot AI reviewed Nov 11, 2025


iosmers added 2 commits November 11, 2025 03:20

update

775d215

udpate

5a25abc

iosmers changed the title ~~[XPU]update doc~~ [XPU][Doc]Update XPU release2.3 note Nov 11, 2025

EmmonsCurse approved these changes Nov 11, 2025


EmmonsCurse merged commit 215cda2 into PaddlePaddle:develop Nov 11, 2025
4 of 7 checks passed

EmmonsCurse merged 4 commits into PaddlePaddle:develop from iosmers:develop
