[PD] Support KV transfer with MORI-IO #14626
* add disable notif
* send aux with tcp
* remove unused log
---------
Co-authored-by: cwortman-amd <cwortman@amd.com>
b819d55 to 05bd4c5
if self.kv_args.ib_device:
    os.environ["MORI_RDMA_DEVICES"] = self.kv_args.ib_device

port = get_free_port()
This util is not robust, which might cause port conflict in some situations. Is it possible to borrow some idea from get_zmq_socket_on_host?
Thank you! Fixed by:
- Using port=0 to let the OS atomically bind an available port
- Retrieving the actual bound port from Mori's TCP stack
This eliminates the race condition entirely. Thanks for catching this!
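The port-0 approach in this reply can be illustrated with plain Python sockets. This is a minimal sketch, not the code from this PR: `bind_ephemeral_port` is a hypothetical helper, and the real change reads the bound port back from Mori's TCP stack rather than from a Python socket.

```python
import socket
from typing import Tuple


def bind_ephemeral_port(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Hypothetical helper: bind to port 0 and report the port the OS chose."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the kernel to pick a free port as part of bind(), so there
    # is no window between "find a free port" and "bind it".
    sock.bind((host, 0))
    # getsockname() returns (host, port) for AF_INET sockets.
    port = sock.getsockname()[1]
    return sock, port


sock, port = bind_ephemeral_port()
print(f"listening socket owns port {port}")
sock.close()
```

The key point is that the socket stays open and is the one actually used: a helper that probes for a free port, closes the probe socket, and returns only the number leaves a gap in which another process can take that port.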
ShangmingCai left a comment
This PR seems very complete. Would it be convenient to add a test to AMD's CI to verify the correctness?
aee7c7c to d660c2d
/tag-and-rerun-ci |
@ShangmingCai I added a |
@maning00 I prefer 2, we can add the E2E test later. Do you mind moving this test to the manual dir? Most of the disaggregation tests are in the |
4c23f30 to ce368a1
91017ce to 4a1a400
4a1a400 to cef2141
/tag-and-rerun-ci |
@ShangmingCai I have moved the tests to |
@HaiShaw Done. I have incorporated the MORI-related setup into To ensure the default image remains unaffected, I added two build arguments:
/rerun-failed-ci |
@Lzy17 @yctseng0211 @bingxche Please help to enable MORI-IO PD/D CI tests. |
HaiShaw left a comment
Is --page_size > 1 supported?
Please also add a description for tuning the following variables:
SGLANG_MORI_QP_PER_TRANSFER: Number of queue pairs per transfer (default: 1)
SGLANG_MORI_POST_BATCH_SIZE: RDMA post batch size (default: -1)
SGLANG_MORI_NUM_WORKERS: Number of worker threads (default: 1)
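For readers tuning these variables, the sketch below shows one way integer settings with the defaults listed above could be read from the environment. It is illustrative only: `read_int_env` and `load_mori_tuning` are hypothetical helpers, and how SGLang actually parses these variables is not shown in this thread.

```python
import os
from typing import Dict

# Defaults as listed in the review comment above.
MORI_DEFAULTS: Dict[str, int] = {
    "SGLANG_MORI_QP_PER_TRANSFER": 1,
    "SGLANG_MORI_POST_BATCH_SIZE": -1,
    "SGLANG_MORI_NUM_WORKERS": 1,
}


def read_int_env(name: str, default: int) -> int:
    """Hypothetical helper: return an int from the environment, or the default."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)  # raises ValueError on non-integer input


def load_mori_tuning() -> Dict[str, int]:
    """Collect all three tuning knobs into one dict."""
    return {name: read_int_env(name, default) for name, default in MORI_DEFAULTS.items()}


print(load_mori_tuning())
```

Setting, for example, `SGLANG_MORI_NUM_WORKERS=4` in the environment before launch would change only that entry and leave the other two at their defaults.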
docker/rocm.Dockerfile
Outdated
export USE_IONIC="OFF"; \
export USE_BNXT="ON"; \
echo "[MORI] NIC_BACKEND=bnxt: USE_BNXT=ON. Add Broadcom bnxt packages/repos here when available."; \
;; \
When can we have bnxt support added here?
cc @Lzy17
Update: To ensure full functionality for mori (io and ep), BRCM support will be integrated later once IBGDA support is fully available in the official library.
Do you mean mori does not yet support BRCM rdma-core?
No, that is not the case. This is mainly because mori-ep and mori-io are built simultaneously. mori-ep currently depends on a pre-release version of the BRCM library (for IBGDA).
git clone "${MORI_REPO}" /sgl-workspace/mori; \
cd /sgl-workspace/mori; \
git checkout "${MORI_COMMIT}"; \
git submodule update --init --recursive; \
Need requirements.txt check?
No need to check here; the only dependency is torch.
@kkHuang-amd let's keep an eye on this going forward
@maning00 Please add a basic accuracy test (gsm8k, etc.) on DPSK from 1P1D. |
Sure, it is supported. |
Added gsm8k test results |
HaiShaw left a comment
Should add BNXT, etc. support later - w.r.t. NIC_BACKEND
docker/rocm.Dockerfile
Outdated
rm -rf /var/lib/apt/lists/*; \
;; \
*) \
echo "ERROR: unknown NIC_BACKEND=${NIC_BACKEND}. Use one of: none, ainic, bnxt"; \
Where is bnxt handled, or should we change this echo message?
Co-authored-by: cwortman-amd <cwortman@amd.com>
Motivation
MORI-IO is AMD's high-performance, point-to-point communication library that leverages GDR (GPU Direct RDMA) to achieve ultra-low latency and high bandwidth for KVCache transfer in LLM inference. To enable efficient PD (Prefill-Decode) disaggregation on AMD hardware, we adopt the MORI-IO transfer engine as the transport layer for SGLang.
Modifications
Architecture Overview
The implementation follows a similar pattern to the mooncake transfer engine integration, with MORI-IO-specific optimizations:
1. MoriKVManager - Core Transfer Management
Initialization:
- IOEngine with RDMA backend configuration
Configuration Options (via environment variables):
- SGLANG_MORI_QP_PER_TRANSFER: Number of queue pairs per transfer (default: 1)
- SGLANG_MORI_POST_BATCH_SIZE: RDMA post batch size (default: -1)
- SGLANG_MORI_NUM_WORKERS: Number of worker threads (default: 1)
2. MoriKVSender (Prefill Side)
- batch_write API for KV cache transfer
3. MoriKVReceiver (Decode Side)
4. Dockerfile
Add the NIC_BACKEND option to enable mori support for different network interface cards (NICs).
Usage
Installation: Install MORI-IO library following the MORI installation guide:
SGLang PD Disaggregation with MORI-IO: Use --disaggregation-transfer-backend mori to enable MORI-IO transfer engine:
Known Limitations
State data transfer not implemented: Currently, the MORI-IO implementation does not support state data transfer for hybrid models (Mamba, SWA, NSA).
Benchmarking and Profiling
End-to-End PD Disaggregation
Hardware Configuration:
Benchmark Command:
Prefill instance (node 1):
Decode instance (node 2):
Router (node 3):
Benchmark client:
Accuracy test:
python3 -m sglang.test.few_shot_gsm8k \
  --host http://127.0.0.1 \
  --port 8000 \
  --num-questions 200 \
  --parallel 128 \
  --num-shots 5
Performance Results:
Comparison of STANDALONE (no PD disaggregation) vs MORI vs MOONCAKE backends. Each test was run 3 times and averaged.
FP8 KV Cache Results (--kv-cache-dtype fp8_e4m3):
gsm8k Accuracy Test Results:
Both MORI and MOONCAKE leverage RDMA effectively, with near-identical performance profiles, validating MORI-IO as a production-ready alternative for AMD hardware.
cc @inkcherry @TianDi101 @Duyi-Wang