Add object store support to llm-d storage offloading#499
Add object store support to llm-d storage offloading#499effi-ofer wants to merge 11 commits intollm-d:mainfrom
Conversation
|
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation. |
Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debugging Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com> debug Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Signed-off-by: Effi Ofer <effi.ofer@gmail.com> add test_obj_backend Signed-off-by: Effi Ofer <effi.ofer@gmail.com> add certificate to boto3 Signed-off-by: Effi Ofer <effi.ofer@gmail.com> add certificate to boto3 Signed-off-by: Effi Ofer <effi.ofer@gmail.com> fix test_obj_backend.py Signed-off-by: Effi Ofer <effi.ofer@gmail.com> fix test_obj_backend.py Signed-off-by: Effi Ofer <effi.ofer@gmail.com> fix test_obj_backend.py Signed-off-by: Effi Ofer <effi.ofer@gmail.com> fix test_obj_backend.py Signed-off-by: Effi Ofer <effi.ofer@gmail.com> combine gds parameter with backend Signed-off-by: Effi Ofer <effi.ofer@gmail.com> combine gds parameter with backend Signed-off-by: Effi Ofer <effi.ofer@gmail.com> move nixl source to a new subdirectory Signed-off-by: Effi Ofer <effi.ofer@gmail.com> change dir nixl to llmd_nixl to avoid name collision with nixl library Signed-off-by: Effi Ofer <effi.ofer@gmail.com> change dir nixl to llmd_nixl to avoid name collision with nixl library Signed-off-by: Effi Ofer <effi.ofer@gmail.com> add nixl dependency Signed-off-by: Effi Ofer <effi.ofer@gmail.com> rm old nixl subdir Signed-off-by: Effi Ofer <effi.ofer@gmail.com> minor changes Signed-off-by: Effi Ofer <effi.ofer@gmail.com> fix wait logic Signed-off-by: Effi Ofer <effi.ofer@gmail.com> clean staging buffer release Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
| @@ -0,0 +1,129 @@ | |||
| # Copyright 2025 The llm-d Authors. | |||
There was a problem hiding this comment.
Can you rename the files to manager and worker like the fs
There was a problem hiding this comment.
It's not really a manager and worker. Rather it's encapsulation that was used to support the various nixl backends (posix, obj, gds). I can flatten it all down since we support only obj in the PR. But keeping the code in this form doesn't increase the complexity or decrease the readability of the code and enable us to very quickly add other nixl backends and run tests on them. Let me know what you think.
There was a problem hiding this comment.
I think the structure should be similar like storage backend and CPU backend, I don't see a reason this backend will be different
|
|
||
| def __init__(self, file_mapper: FileMapper) -> None: | ||
| LOOKUP_MODE_FILE = "file" | ||
| LOOKUP_MODE_OBJECT_STORE = "object_store" |
There was a problem hiding this comment.
I don't think that anything that related to object store should be in the file system
| tensors: List[torch.Tensor], | ||
| backend: str, | ||
| ): | ||
| assert len(tensors) == 1 # support only the cross layout |
There was a problem hiding this comment.
You should open an issue about that
| return | ||
| i = 0 | ||
| while True: | ||
| state = self.agent.check_xfer_state(entry.xfer_handle) |
There was a problem hiding this comment.
check_xfer_state is called unwrapped here, but get_finished wraps the same call in try/except nixlBackendError. Inconsistent — if it can raise in one place, it can raise in the other.
| while True: | ||
| state = self.agent.check_xfer_state(entry.xfer_handle) | ||
| if state == "DONE": | ||
| break | ||
| elif state == "PROC": | ||
| i += 1 | ||
| if i % 10 == 0: | ||
| self.logger.debug("wait_job %d iterations=%d", job_id, i) | ||
| time.sleep(0.1) |
There was a problem hiding this comment.
while True: time.sleep(0.1) with no upper bound — if NIXL's state machine wedges, this hangs forever. Consider a timeout: float | None argument so shutdown paths can bail out.
| results = self._pending_results | ||
| self._pending_results = [] |
There was a problem hiding this comment.
Tuple swap makes the intent obvious ("take what was there, replace with empty"):
results, self._pending_results = self._pending_results, []| flat_block_ids = [b for block_list in block_ids for b in block_list] | ||
| for (buf, _), block_id in zip(stagings, flat_block_ids): | ||
| self.tensors[0][block_id].copy_(buf[0], non_blocking=True) | ||
| self._h2d_stream.synchronize() |
There was a problem hiding this comment.
_complete_read blocks the polling loop. Line 110 does self._h2d_stream.synchronize() — called from get_finished() → _complete_transfer() → _complete_read(). If many blocks finish at once, every get_finished() call waits for all H2D copies on all just-completed jobs. Under load, this serializes completion. For an async engine, surprising.
| assert endpoint, "OBJ endpoint not set. Use --obj-endpoint or OBJ_ENDPOINT env var." | ||
| assert bucket, "OBJ bucket not set. Use --obj-bucket or OBJ_BUCKET env var." | ||
| assert access_key, "OBJ access_key not set. Use --obj-access-key or OBJ_ACCESS_KEY env var." | ||
| assert secret_key, "OBJ secret_key not set. Use --obj-secret-key or OBJ_SECRET_KEY env var." |
There was a problem hiding this comment.
make test runs pytest ./tests/ -v -sq across everything, so this test is picked up even when no S3 is available. The docstring says "Skips the entire session if required credentials are not provided," but assert raises AssertionError — pytest reports it as a failure, not a skip. Please use pytest.skip(...) instead:
if not endpoint:
pytest.skip("OBJ endpoint not set — skipping S3 integration tests")
if not bucket:
pytest.skip("OBJ bucket not set")
# ...That way make test passes cleanly on dev machines without S3 and still exercises the path on CI where credentials are configured.
| assert get_result.transfer_time is not None and get_result.transfer_time > 0 | ||
| assert get_result.transfer_type == ("SHARED_STORAGE", "GPU") | ||
|
|
||
| assert_blocks_equal_cross(original, restored, read_block_ids) |
There was a problem hiding this comment.
Objects written to S3 are never cleaned up. Every test run leaves orphan objects under root_dir="kv-test", so the bucket grows without bound. Either add a fixture teardown that deletes objects by prefix, or use a unique per-run prefix (UUID/timestamp) so leftovers are at least separable.
| "secret_key": secret_key, | ||
| "scheme": scheme, | ||
| "ca_bundle": ca_bundle, | ||
| } |
There was a problem hiding this comment.
The module docstring says "Assumes the object store bucket already exists and is reachable," but nothing checks this. If the bucket doesn't exist the test fails cryptically deep in NIXL. A one-line head_bucket probe inside obj_config that calls pytest.skip(...) on failure would surface the problem clearly (and also covers the case where the endpoint is configured but unreachable).
| parser.addoption("--obj-access-key", default=None) | ||
| parser.addoption("--obj-secret-key", default=None) | ||
| parser.addoption("--obj-scheme", default=None) | ||
| parser.addoption("--obj-ca_bundle", default=None) |
There was a problem hiding this comment.
Naming inconsistency: other options use hyphens (--obj-access-key, --obj-secret-key) but this one mixes a hyphen then an underscore (--obj-ca_bundle). Suggest --obj-ca-bundle for consistency.
This PR adds object store backend to the llm-d storage offloading. It does this by introducing a "backend" configuration parameter which can be set as follows, with POSIX as the default.
The object store access is implemented using NIXL (for PUT/GET) and boto3 (for lookup).
Documentation:
The full configuration of the object store is described in docs/object_store.md.
Testing: