[PD Disaggregation] remove splitwise deployment on single node and refine the code by juncaipeng · Pull Request #4891 · PaddlePaddle/FastDeploy

juncaipeng · 2025-11-07T09:49:13Z

Motivation

Remove the single-node splitwise deployment mode and refine the related code.

Modifications

Remove the code, examples, and docs for splitwise deployment on a single node.
Splitwise deployment now supports the v1 scheduler (--num-gpu-blocks-override must be set on the prefill instance).
Refine functions in fastdeploy/engine/common_engine.py, such as _decode_process_splitwise_requests and _insert_prefilled_requests.
Add a test for splitwise deployment with splitwise_scheduler.

Usage or Command

Refer to the example scripts under examples/splitwise.

Accuracy Tests

Accuracy is tested by CI.

Checklist

Add at least one tag to the PR title.
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If there are no unit tests, explain why in this PR.
Provide accuracy results.
If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-07T09:49:21Z

Thanks for your contribution!

Copilot

Pull Request Overview

This PR refactors the splitwise (Prefill-Decode disaggregated) deployment architecture by removing the deprecated single-machine deployment mode (v2) and consolidating to two supported methods: v0 using splitwise_scheduler/dp_scheduler, and v1 using local_scheduler with router. The changes simplify the codebase by removing redundant code paths and improving the separation of concerns between prefill and decode instances.

Removed deprecated innode_prefill_ports parameter and associated single-machine PD logic
Refactored decode instance's request processing into a cleaner _decode_process_splitwise_requests function
Updated test utilities to share common functions and removed duplicated port/service management code
Consolidated splitwise version naming from v0/v1/v2 to just v0/v1
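
The v0/v1 split described above can be illustrated with a minimal sketch. It assumes the scheduler names appear exactly as written in the overview (`splitwise_scheduler`, `dp_scheduler`, `local_scheduler`); the function name `detect_splitwise_version` and the `has_router` parameter are hypothetical and are not taken from fastdeploy/config.py.

```python
# Hypothetical sketch of v0/v1 detection; not the actual FastDeploy logic.
# Scheduler names are copied from the review overview and may differ from
# the values the real command-line flags accept.
V0_SCHEDULERS = ("splitwise_scheduler", "dp_scheduler")
V1_SCHEDULER = "local_scheduler"


def detect_splitwise_version(scheduler_name: str, has_router: bool) -> str:
    """Return "v0" or "v1" for a splitwise deployment, or raise ValueError."""
    if scheduler_name in V0_SCHEDULERS:
        return "v0"
    if scheduler_name == V1_SCHEDULER and has_router:
        return "v1"
    # The single-node mode (formerly v2) is removed, so nothing else is valid.
    raise ValueError(
        f"unsupported splitwise setup: scheduler={scheduler_name!r}, "
        f"router={'yes' if has_router else 'no'}"
    )
```

Raising on any other combination mirrors the intent of the PR, where only two deployment methods remain.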

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 10 comments.


| File | Description |
| --- | --- |
| tests/e2e/utils/serving_utils.py | Added shared utility functions `check_service_health` and `get_registered_number` for e2e tests |
| tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py | Updated to use shared utilities, added redis support, removed duplicated helper functions |
| tests/e2e/test_ernie_03b_pd_router_v0.py | Refactored to import utilities from serving_utils instead of duplicating code |
| fastdeploy/worker/worker_process.py | Removed automatic ENABLE_V1_KVCACHE_SCHEDULER=0 setting for non-RDMA splitwise |
| fastdeploy/splitwise/splitwise_connector.py | Removed deprecated innode dispatch methods (`has_splitwise_tasks`, `dispatch_innode_splitwise_tasks`) |
| fastdeploy/output/token_processor.py | Added timeout warning for cache sending, removed v0-specific result filtering |
| fastdeploy/inter_communicator/engine_worker_queue.py | Removed `available_prefill_instances` queue that was used for single-machine coordination |
| fastdeploy/engine/request.py | Added timestamp fields `inference_start_time` and `llm_engine_recv_req_timestamp` to Request class |
| fastdeploy/engine/engine.py | Removed initialization of `available_prefill_instances` queue |
| fastdeploy/engine/common_engine.py | Major refactoring: extracted `_insert_prefilled_requests`, renamed `_process_splitwise_task` to `_decode_process_splitwise_requests` with cleaner logic |
| fastdeploy/engine/async_llm.py | Removed `available_prefill_instances` queue initialization |
| fastdeploy/engine/args_utils.py | Removed `innode_prefill_ports` argument, added validation for splitwise configuration |
| fastdeploy/demo/offline_disaggregated_demo.py | Deleted deprecated single-machine offline demo |
| fastdeploy/config.py | Simplified splitwise version detection logic (v0/v1 only), removed `innode_prefill_ports` config |
| fastdeploy/cache_manager/transfer_factory/ipc_cache_transfer.py | Removed unused `finish_event` variable |
| fastdeploy/cache_manager/cache_messager.py | Fixed connection status logging logic |
| examples/splitwise/start_v2_tp1.sh | Removed deprecated v2 single-machine example script |
| examples/splitwise/start_v1_tp*.sh | Updated scripts to use router-based v1 deployment method |
| examples/splitwise/start_v0_tp*.sh | Updated scripts to use splitwise_scheduler for v0 deployment |
| examples/splitwise/start_mixed.sh | Added test request example for mixed server deployment |
| docs/zh/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
| docs/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
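
To illustrate what a shared health-check helper for e2e tests might look like, here is a minimal polling sketch. The PR does not show the signature of `check_service_health`, so the name `wait_until_healthy`, its parameters, and the injected `probe` callable are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `probe` is a hypothetical callable supplied by the test, for example one
    that sends an HTTP request to a service's health endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection errors are expected while the service is starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Injecting the probe keeps the helper free of any particular HTTP client, so multiple test files could share it with different endpoints.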

Comments suppressed due to low confidence (4)

tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:337

Unused variable assignment. The variable d_url is assigned on line 336, but the function sends the request to p_url instead of d_url on line 337. If the intention was to test both URLs, clarify with a comment. If only the prefill URL is needed, remove the d_url assignment.

tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:366

Inconsistent variable usage pattern. Lines 365-366 and 416-417 unpack both URLs but use only one, and lines 389-390 do the same. For better clarity, consider tuple unpacking with an underscore for the unused value when only one URL is needed: p_url, _ = api_url.

tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:164

Typo in variable assignment. On line 164, env_prefill is incorrectly used instead of env_decode. This should be env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0" to properly set the environment variable for the decode instance.

tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:249

Inconsistent spacing in f-string expression. Line 249 has FD_API_PORT+1 without spaces around the + operator, while the first URL on the same line uses proper spacing. For consistency, use FD_API_PORT + 1.
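
The suggestions above can be sketched together in a few lines. This is an illustration only, not the test file's actual code: the helper names, the port value, the URL path, the choice to set the variable on both instances, and the assumption that the first URL belongs to the prefill instance are all hypothetical.

```python
FD_API_PORT = 8188  # hypothetical value; the real test defines its own port


def build_role_envs(base_env: dict) -> tuple:
    """Return separate environment dicts for the prefill and decode instances."""
    env_prefill = dict(base_env)
    env_decode = dict(base_env)
    env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
    # Set the decode variable on env_decode, not on env_prefill a second time.
    env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
    return env_prefill, env_decode


def make_api_urls() -> tuple:
    """Build the URL pair with consistent spacing around `+`."""
    return (
        f"http://127.0.0.1:{FD_API_PORT}/v1/chat/completions",
        f"http://127.0.0.1:{FD_API_PORT + 1}/v1/chat/completions",
    )


def pick_prefill_url(api_url: tuple) -> str:
    """Unpack only the URL that is used; `_` marks the unused one."""
    p_url, _ = api_url
    return p_url
```

Copying the base environment into two independent dicts makes a mix-up like the one flagged on line 164 easier to spot, since each role has exactly one assignment.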

fastdeploy/engine/common_engine.py

tests/e2e/utils/serving_utils.py

fastdeploy/engine/request.py

fastdeploy/cache_manager/cache_messager.py

fastdeploy/engine/args_utils.py

tests/e2e/utils/serving_utils.py

Copilot AI review requested due to automatic review settings November 7, 2025 09:49

Copilot AI reviewed Nov 7, 2025


juncaipeng force-pushed the pd_1 branch from 6df621a to fb87389 Compare November 10, 2025 03:20

juncaipeng changed the title ~~[PD] remove splitwise deployment on single node and refine the code~~ [PD Disaggregation] remove splitwise deployment on single node and refine the code Nov 10, 2025

juncaipeng force-pushed the pd_1 branch 2 times, most recently from 678f629 to 9cc4a3f Compare November 11, 2025 02:03

juncaipeng requested review from Jiang-Jia-Jun and rainyfly November 11, 2025 06:30

juncaipeng added 6 commits November 13, 2025 06:05

remove splitwise deployment on single node and refine the code

566ed79

up

469f068

up

280a0d8

up

162ecce

add test

02a391f

up

b93a7fe

juncaipeng force-pushed the pd_1 branch from 9cc4a3f to b93a7fe Compare November 13, 2025 06:16

Jiang-Jia-Jun approved these changes Nov 14, 2025


Jiang-Jia-Jun added the skip-ci: coverage label Nov 14, 2025

Jiang-Jia-Jun merged commit 36822fa into PaddlePaddle:develop Nov 14, 2025
13 of 16 checks passed

Jiang-Jia-Jun merged 6 commits into PaddlePaddle:develop from juncaipeng:pd_1
