[PD Disaggregation] remove splitwise deployment on single node and refine the code #4891
Merged
Jiang-Jia-Jun merged 6 commits into PaddlePaddle:develop on Nov 14, 2025
Pull Request Overview
This PR refactors the splitwise (Prefill-Decode disaggregated) deployment architecture by removing the deprecated single-machine deployment mode (v2) and consolidating to two supported methods: v0 using splitwise_scheduler/dp_scheduler, and v1 using local_scheduler with router. The changes simplify the codebase by removing redundant code paths and improving the separation of concerns between prefill and decode instances.
- Removed deprecated `innode_prefill_ports` parameter and associated single-machine PD logic
- Refactored decode instance's request processing into a cleaner `_decode_process_splitwise_requests` function
- Updated test utilities to share common functions and removed duplicated port/service management code
- Consolidated splitwise version naming from v0/v1/v2 to just v0/v1
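The two remaining deployment methods described in the overview can be sketched as a small selection function. This is an illustration only, not the logic in `fastdeploy/config.py`: the function name `detect_splitwise_version`, the scheduler strings `"splitwise"`, `"dp"`, and `"local"`, and the `router` argument are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical scheduler names, loosely based on the PR overview
# (splitwise_scheduler / dp_scheduler for v0, local_scheduler + router for v1).
V0_SCHEDULERS = {"splitwise", "dp"}
V1_SCHEDULER = "local"


def detect_splitwise_version(scheduler_name: str, router: Optional[str] = None) -> str:
    """Return "v0" or "v1" for a PD-disaggregated deployment (illustrative only)."""
    if scheduler_name in V0_SCHEDULERS:
        return "v0"
    if scheduler_name == V1_SCHEDULER and router:
        return "v1"
    # No third (single-node, formerly v2) branch: that mode was removed.
    raise ValueError(
        f"unsupported splitwise setup: scheduler={scheduler_name!r}, router={router!r}"
    )


if __name__ == "__main__":
    print(detect_splitwise_version("splitwise"))
    print(detect_splitwise_version("local", router="127.0.0.1:9000"))
```

With only two branches left, an unsupported combination fails loudly at configuration time instead of silently falling into a removed code path.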
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/e2e/utils/serving_utils.py | Added shared utility functions check_service_health and get_registered_number for e2e tests |
| tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py | Updated to use shared utilities, added redis support, removed duplicated helper functions |
| tests/e2e/test_ernie_03b_pd_router_v0.py | Refactored to import utilities from serving_utils instead of duplicating code |
| fastdeploy/worker/worker_process.py | Removed automatic ENABLE_V1_KVCACHE_SCHEDULER=0 setting for non-RDMA splitwise |
| fastdeploy/splitwise/splitwise_connector.py | Removed deprecated innode dispatch methods (has_splitwise_tasks, dispatch_innode_splitwise_tasks) |
| fastdeploy/output/token_processor.py | Added timeout warning for cache sending, removed v0-specific result filtering |
| fastdeploy/inter_communicator/engine_worker_queue.py | Removed available_prefill_instances queue that was used for single-machine coordination |
| fastdeploy/engine/request.py | Added timestamp fields inference_start_time and llm_engine_recv_req_timestamp to Request class |
| fastdeploy/engine/engine.py | Removed initialization of available_prefill_instances queue |
| fastdeploy/engine/common_engine.py | Major refactoring: extracted _insert_prefilled_requests, renamed _process_splitwise_task to _decode_process_splitwise_requests with cleaner logic |
| fastdeploy/engine/async_llm.py | Removed available_prefill_instances queue initialization |
| fastdeploy/engine/args_utils.py | Removed innode_prefill_ports argument, added validation for splitwise configuration |
| fastdeploy/demo/offline_disaggregated_demo.py | Deleted deprecated single-machine offline demo |
| fastdeploy/config.py | Simplified splitwise version detection logic (v0/v1 only), removed innode_prefill_ports config |
| fastdeploy/cache_manager/transfer_factory/ipc_cache_transfer.py | Removed unused finish_event variable |
| fastdeploy/cache_manager/cache_messager.py | Fixed connection status logging logic |
| examples/splitwise/start_v2_tp1.sh | Removed deprecated v2 single-machine example script |
| examples/splitwise/start_v1_tp*.sh | Updated scripts to use router-based v1 deployment method |
| examples/splitwise/start_v0_tp*.sh | Updated scripts to use splitwise_scheduler for v0 deployment |
| examples/splitwise/start_mixed.sh | Added test request example for mixed server deployment |
| docs/zh/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
| docs/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
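The table mentions a shared `check_service_health` helper for the e2e tests. Its actual signature is not shown on this page, so the following is only a sketch of the general polling pattern such a helper might follow. The names `wait_until_healthy` and `http_ok` are hypothetical, and the probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Probe a URL once; True if it answers with HTTP 200 (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


def wait_until_healthy(
    probe: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # urllib's URLError subclasses OSError, so refused or failed
            # connections land here; treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test could then call `wait_until_healthy(lambda: http_ok(url))` once per service instead of repeating the retry loop in every test file, where `url` is whatever health endpoint the deployment exposes.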
Comments suppressed due to low confidence (4)
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:337
- Unused variable assignment. The variable `p_url` is assigned on line 336 but the function sends the request to `p_url` instead of `d_url` on line 337. If the intention was to test both URLs, clarify with a comment. If only prefill URL is needed, remove the `d_url` assignment.
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:366
- Inconsistent variable usage pattern. Lines 365-366 and 416-417 follow the pattern of unpacking both URLs but only using one. Similarly, lines 389-390 do the same. For better clarity, consider using tuple unpacking with underscore for unused values: `p_url, _ = api_url` when only using one URL.
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:164
- Typo in variable assignment. There's a typo on line 164: `env_prefill` is incorrectly used instead of `env_decode`. This should be `env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"` to properly set the environment variable for the decode instance.
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:249
- Inconsistent spacing in f-string expression. Line 249 has `FD_API_PORT+1` without spaces around the `+` operator, while line 249's first URL uses proper spacing. For consistency, use `FD_API_PORT + 1`.
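The suppressed comments above describe three small fixes: setting the variable on `env_decode` rather than `env_prefill`, unpacking an unused URL into `_`, and spacing `FD_API_PORT + 1` consistently. A minimal sketch of how those might look together follows. The helper names `build_envs` and `pd_urls`, the port value, and the URL shape are assumptions for illustration, not the contents of the test file.

```python
FD_API_PORT = 8188  # placeholder; the real test reads its own port value


def build_envs(base: dict) -> tuple:
    """Return separate environment dicts for the prefill and decode instances."""
    env_prefill = dict(base)
    env_decode = dict(base)
    env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
    # The flagged typo assigned to env_prefill a second time here,
    # leaving the decode environment without the variable.
    env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
    return env_prefill, env_decode


def pd_urls(port: int) -> tuple:
    """Return (prefill_url, decode_url); the URL shape is an assumption."""
    return (f"http://127.0.0.1:{port}", f"http://127.0.0.1:{port + 1}")


# Only the prefill URL is needed, so the decode URL is discarded explicitly.
p_url, _ = pd_urls(FD_API_PORT)
```

Copying `base` into two independent dicts also keeps a later change to one instance's environment from leaking into the other.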
678f629 to 9cc4a3f
Jiang-Jia-Jun approved these changes on Nov 14, 2025
Motivation
Remove the single-node splitwise deployment and refine the code.
Modifications
Usage or Command
Refer to examples.
Accuracy Tests
Use CI to test accuracy.
Checklist
- Tag list: `[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.