[XPU][CI] change VL model to 28B-VL-thinking#5169
[XPU][CI] change VL model to 28B-VL-thinking#5169plusNew001 merged 8 commits intoPaddlePaddle:developfrom
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Pull Request Overview
This PR updates the XPU CI configuration script with changes to model parameters, server capabilities, and process management for the ERNIE-4.5-VL model testing.
Key Changes:
- Enhanced process cleanup with additional netstat-based port killing mechanism
- Enabled prefix caching and chunked prefill features for improved performance
- Migrated from ERNIE-4.5-VL-28B-A3B-Paddle to ERNIE-4.5-VL-28B-A3B-Thinking model with updated configuration parameters
| --model ${MODEL_PATH}/ERNIE-4.5-VL-28B-A3B-Thinking \ | ||
| --port $port_num \ | ||
| --engine-worker-queue-port $((port_num + 1)) \ | ||
| --metrics-port $((port_num + 2)) \ | ||
| --cache-queue-port $((port_num + 47873)) \ | ||
| --tensor-parallel-size 4 \ | ||
| --max-model-len 32768 \ | ||
| --max-num-seqs 10 \ | ||
| --max-num-seqs 32 \ | ||
| --quantization wint8 \ | ||
| --enable-mm \ | ||
| --mm-processor-kwargs '{"video_max_frames": 30}' \ | ||
| --limit-mm-per-prompt '{"image": 10, "video": 3}' \ | ||
| --reasoning-parser ernie-45-vl \ | ||
| --reasoning-parser ernie-45-vl-thinking \ | ||
| --tool-call-parser ernie-45-vl-thinking \ | ||
| --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ | ||
| --enable-chunked-prefill > server.log 2>&1 & |
There was a problem hiding this comment.
The model path has changed from ERNIE-4.5-VL-28B-A3B-Paddle to ERNIE-4.5-VL-28B-A3B-Thinking, and the --enable-mm flag has been removed (previously on line 243), along with the --limit-mm-per-prompt configuration. These are significant changes that affect multimodal capabilities. Consider adding a comment explaining:
- Why the model was changed
- Whether the new model has different multimodal capabilities
- If
--enable-mmis no longer needed or if multimodal is enabled by default in the new model
This will help future maintainers understand the reasoning behind these configuration changes.
| --reasoning-parser ernie-45-vl \ | ||
| --reasoning-parser ernie-45-vl-thinking \ | ||
| --tool-call-parser ernie-45-vl-thinking \ | ||
| --mm-processor-kwargs '{"image_max_pixels": 12845056 }' \ |
There was a problem hiding this comment.
The mm-processor-kwargs has changed from {"video_max_frames": 30} to {"image_max_pixels": 12845056}, shifting from video processing to image processing configuration. This appears to be a significant change in functionality, but there's no magic number documentation. Consider:
- Adding a comment explaining what
12845056pixels represents (e.g., resolution dimensions) - Defining this as a constant with a descriptive name
- Documenting why video processing support was removed
Example:
# 3584x3584 = 12845056 pixels - maximum image resolution for ERNIE-4.5-VL-Thinking model
--mm-processor-kwargs '{"image_max_pixels": 12845056}'| --tensor-parallel-size 4 \ | ||
| --max-model-len 32768 \ | ||
| --max-num-seqs 10 \ | ||
| --max-num-seqs 32 \ |
There was a problem hiding this comment.
The --max-num-seqs parameter has been increased from 10 to 32 (3.2x increase). This will increase memory usage and could impact performance on XPU devices. Ensure that:
- The XPU device has sufficient memory for 32 concurrent sequences with the new model
- This change has been tested under load to verify it doesn't cause OOM errors or performance degradation
- The CI environment is configured to handle this increased capacity
Consider documenting the rationale for this specific value in the CI configuration.
| for port in $(seq $((8188 + XPU_ID * 100 + 10)) $((8188 + XPU_ID * 100 + 40))); do | ||
| lsof -t -i :${port} | xargs kill -9 || true | ||
| done | ||
| netstat -tunlp 2>/dev/null | grep $((8190 + XPU_ID * 100)) | awk '{print $NF}' | awk -F'/' '{print $1}' | xargs -r kill -9 |
There was a problem hiding this comment.
This command has potential security risks. The netstat output parsing and kill -9 execution could fail silently or kill unintended processes if the port calculation results in a port used by other services. Consider:
- Adding error checking before the kill command
- Verifying the process belongs to the expected service before killing it
- Using more specific process identification (e.g., by process name and port together)
Example:
pids=$(netstat -tunlp 2>/dev/null | grep $((8190 + XPU_ID * 100)) | awk '{print $NF}' | awk -F'/' '{print $1}')
if [ -n "$pids" ]; then
echo "Killing processes on port $((8190 + XPU_ID * 100)): $pids"
echo "$pids" | xargs -r kill -9
fi| netstat -tunlp 2>/dev/null | grep $((8190 + XPU_ID * 100)) | awk '{print $NF}' | awk -F'/' '{print $1}' | xargs -r kill -9 | |
| # Only kill processes on the port that match expected names | |
| PORT=$((8190 + XPU_ID * 100)) | |
| EXPECTED_NAMES="cache_transfer_manager.py|api_server" | |
| lsof -nP -i :${PORT} | awk 'NR>1 {print $2,$1}' | while read pid cmd; do | |
| if ps -p $pid -o cmd= | grep -E "${EXPECTED_NAMES}" >/dev/null; then | |
| echo "Killing process $pid ($cmd) on port ${PORT}" | |
| kill -9 $pid | |
| else | |
| echo "Skipping process $pid ($cmd) on port ${PORT} (unexpected name)" | |
| fi | |
| done |
| for port in $(seq $((8188 + XPU_ID * 100 + 10)) $((8188 + XPU_ID * 100 + 40))); do | ||
| lsof -t -i :${port} | xargs kill -9 || true | ||
| done | ||
| netstat -tunlp 2>/dev/null | grep $((8190 + XPU_ID * 100)) | awk '{print $NF}' | awk -F'/' '{print $1}' | xargs -r kill -9 |
There was a problem hiding this comment.
The port number 8190 + XPU_ID * 100 appears to be hardcoded and differs from the pattern used in lines 14-16 (8188 + XPU_ID * 100 + offset). This inconsistency could lead to maintenance issues. Consider:
- Using a variable to define the base port calculation
- Adding a comment explaining why port 8190 needs separate handling
- Ensuring this port is consistently used throughout the script
Example:
# Kill processes on the metrics port (8190 + offset)
local metrics_port=$((8190 + XPU_ID * 100))
netstat -tunlp 2>/dev/null | grep $metrics_port | awk '{print $NF}' | awk -F'/' '{print $1}' | xargs -r kill -9
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5169 +/- ##
==========================================
Coverage ? 57.68%
==========================================
Files ? 317
Lines ? 38445
Branches ? 5762
==========================================
Hits ? 22176
Misses ? 14493
Partials ? 1776
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.