Add object store support to llm-d storage offloading#499

Open
effi-ofer wants to merge 11 commits into llm-d:main from effi-ofer:nixl

Conversation

@effi-ofer

This PR adds an object store backend to llm-d storage offloading. It introduces a "backend" configuration parameter that can be set to one of the following values, with "POSIX" as the default.

  "POSIX"           - POSIX engine, GDS disabled
  "POSIX_GDS_READ"  - GDS for reads only
  "POSIX_GDS_WRITE" - GDS for writes only
  "POSIX_GDS"       - GDS for reads and writes
  "POSIX_BB_READ"   - bounce-buffer GDS for reads only
  "POSIX_BB_WRITE"  - bounce-buffer GDS for writes only
  "POSIX_BB"        - bounce-buffer GDS for reads and writes
  "OBJ"             - NIXL S3 object store backend

Object store access is implemented using NIXL (for PUT/GET transfers) and boto3 (for lookups).
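As an illustration of the boto3 lookup side, a cheap existence check can be done with a HEAD request; the helper name and bucket/key layout below are hypothetical, not the PR's actual code:

```python
from typing import Any


def object_exists(s3_client: Any, bucket: str, key: str) -> bool:
    """Return True if `key` exists in `bucket`, using a cheap HEAD request.

    `s3_client` is expected to behave like a boto3 S3 client, i.e. expose
    head_object(Bucket=..., Key=...) that raises an exception (boto3's
    botocore.exceptions.ClientError) when the object is missing.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        # A 404 from the store surfaces as an exception; for a lookup,
        # any failure is treated as "not found".
        return False
```

This keeps lookups out of the NIXL transfer path: only PUT/GET move bytes, while metadata checks go through boto3.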

Documentation:
The full configuration of the object store is described in docs/object_store.md.

Testing:

  • ran all the tests in the tests directories
  • added a new test_obj_backend test case

@github-actions

github-actions bot commented Apr 6, 2026

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@github-actions github-actions bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 6, 2026
Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

debugging

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

debug

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

add test_obj_backend

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

add certificate to boto3

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

fix test_obj_backend.py

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

combine gds parameter with backend

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

move nixl source to a new subdirectory

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

change dir nixl to llmd_nixl to avoid name collision with nixl library

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

add nixl dependency

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

rm old nixl subdir

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

minor changes

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

fix wait logic

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>

clean staging buffer release

Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
Comment thread kv_connectors/llmd_fs_backend/docs/object_store.md
Comment thread kv_connectors/llmd_fs_backend/docs/object_store.md Outdated
Comment thread kv_connectors/llmd_fs_backend/docs/object_store.md Outdated
Comment thread kv_connectors/llmd_fs_backend/llmd_nixl/nixl_offload.py
Comment thread kv_connectors/llmd_fs_backend/llmd_fs_backend/spec.py Outdated
@@ -0,0 +1,129 @@
# Copyright 2025 The llm-d Authors.
Collaborator

Can you rename the files to manager and worker like the fs

Author

It's not really a manager and worker; rather, it's an encapsulation used to support the various NIXL backends (posix, obj, gds). I can flatten it all down since we support only obj in this PR, but keeping the code in this form doesn't increase complexity or decrease readability, and it enables us to add other NIXL backends and run tests on them very quickly. Let me know what you think.

Collaborator

I think the structure should be similar to the storage backend and the CPU backend; I don't see a reason this backend should be different.

Comment thread kv_connectors/llmd_fs_backend/README.md Outdated
Comment thread kv_connectors/llmd_fs_backend/llmd_fs_backend/worker.py Outdated
Comment thread kv_connectors/llmd_fs_backend/llmd_fs_backend/worker.py Outdated
Comment thread kv_connectors/llmd_fs_backend/llmd_nixl/nixl_offload.py Outdated

def __init__(self, file_mapper: FileMapper) -> None:
LOOKUP_MODE_FILE = "file"
LOOKUP_MODE_OBJECT_STORE = "object_store"
Collaborator

I don't think anything related to the object store should live in the file-system backend.

tensors: List[torch.Tensor],
backend: str,
):
assert len(tensors) == 1 # support only the cross layout
Collaborator

You should open an issue about that

return
i = 0
while True:
state = self.agent.check_xfer_state(entry.xfer_handle)
Collaborator

check_xfer_state is called unwrapped here, but get_finished wraps the same call in try/except nixlBackendError. Inconsistent — if it can raise in one place, it can raise in the other.
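One way to make the two call sites consistent is a single choke point that owns the error handling; this is a sketch under the assumption that the real code catches nixlBackendError (Exception stands in here so the snippet is self-contained, and the helper name is hypothetical):

```python
import logging
from typing import Any


def check_xfer_state_safe(agent: Any, handle: Any,
                          logger: logging.Logger) -> str:
    """Wrap agent.check_xfer_state so every caller (the wait loop and
    get_finished alike) sees backend errors the same way.

    On failure, log and report a terminal "ERR" state instead of letting
    the exception escape one call site but not the other.
    """
    try:
        return agent.check_xfer_state(handle)
    except Exception as exc:  # nixlBackendError in the real code
        logger.error("check_xfer_state failed for %r: %s", handle, exc)
        return "ERR"
```

Both the polling loop and get_finished() would then branch only on the returned state string.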

Comment on lines +263 to +271
while True:
state = self.agent.check_xfer_state(entry.xfer_handle)
if state == "DONE":
break
elif state == "PROC":
i += 1
if i % 10 == 0:
self.logger.debug("wait_job %d iterations=%d", job_id, i)
time.sleep(0.1)
Collaborator

while True: time.sleep(0.1) with no upper bound — if NIXL's state machine wedges, this hangs forever. Consider a timeout: float | None argument so shutdown paths can bail out.
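One possible shape for that timeout argument, written as a standalone helper (the agent interface and the "DONE"/"PROC" state strings follow the snippet above; the helper name and the deadline handling are illustrative):

```python
import time
from typing import Any, Optional


def wait_for_transfer(agent: Any, xfer_handle: Any,
                      timeout: Optional[float] = None,
                      poll_interval: float = 0.1) -> str:
    """Poll check_xfer_state until the transfer leaves "PROC" or the
    deadline passes. Returns the final state string.

    timeout=None preserves the current unbounded behavior; a finite
    timeout lets shutdown paths bail out instead of hanging forever.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = agent.check_xfer_state(xfer_handle)
        if state != "PROC":
            return state  # "DONE" or an error state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(
                f"transfer {xfer_handle!r} still PROC after {timeout}s")
        time.sleep(poll_interval)
```

time.monotonic() is used rather than time.time() so the deadline is immune to wall-clock adjustments.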

Comment on lines +228 to +229
results = self._pending_results
self._pending_results = []
Collaborator

Tuple swap makes the intent obvious ("take what was there, replace with empty"):

results, self._pending_results = self._pending_results, []

flat_block_ids = [b for block_list in block_ids for b in block_list]
for (buf, _), block_id in zip(stagings, flat_block_ids):
self.tensors[0][block_id].copy_(buf[0], non_blocking=True)
self._h2d_stream.synchronize()
Collaborator

_complete_read blocks the polling loop. Line 110 does self._h2d_stream.synchronize(), reached via get_finished() -> _complete_transfer() -> _complete_read(). If many blocks finish at once, every get_finished() call waits for all H2D copies on all just-completed jobs. Under load this serializes completion, which is surprising for an async engine.

Comment on lines +111 to +114
assert endpoint, "OBJ endpoint not set. Use --obj-endpoint or OBJ_ENDPOINT env var."
assert bucket, "OBJ bucket not set. Use --obj-bucket or OBJ_BUCKET env var."
assert access_key, "OBJ access_key not set. Use --obj-access-key or OBJ_ACCESS_KEY env var."
assert secret_key, "OBJ secret_key not set. Use --obj-secret-key or OBJ_SECRET_KEY env var."
Collaborator

make test runs pytest ./tests/ -v -sq across everything, so this test is picked up even when no S3 is available. The docstring says "Skips the entire session if required credentials are not provided," but assert raises AssertionError — pytest reports it as a failure, not a skip. Please use pytest.skip(...) instead:

if not endpoint:
    pytest.skip("OBJ endpoint not set — skipping S3 integration tests")
if not bucket:
    pytest.skip("OBJ bucket not set")
# ...

That way make test passes cleanly on dev machines without S3 and still exercises the path on CI where credentials are configured.

assert get_result.transfer_time is not None and get_result.transfer_time > 0
assert get_result.transfer_type == ("SHARED_STORAGE", "GPU")

assert_blocks_equal_cross(original, restored, read_block_ids)
Collaborator

Objects written to S3 are never cleaned up. Every test run leaves orphan objects under root_dir="kv-test", so the bucket grows without bound. Either add a fixture teardown that deletes objects by prefix, or use a unique per-run prefix (UUID/timestamp) so leftovers are at least separable.
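A teardown helper for this could look like the sketch below, which follows boto3's list_objects_v2 pagination contract (IsTruncated / NextContinuationToken); the function name is hypothetical:

```python
from typing import Any


def delete_prefix(s3_client: Any, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix` in `bucket`; return the count.

    Suitable as a pytest fixture finalizer so each test run cleans up
    the objects it wrote.
    """
    deleted = 0
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            s3_client.delete_object(Bucket=bucket, Key=obj["Key"])
            deleted += 1
        if not page.get("IsTruncated"):
            return deleted
        token = page.get("NextContinuationToken")
```

Even with the teardown, a per-run prefix such as f"kv-test/{uuid.uuid4()}" keeps leftovers from crashed runs separable.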

"secret_key": secret_key,
"scheme": scheme,
"ca_bundle": ca_bundle,
}
Collaborator

The module docstring says "Assumes the object store bucket already exists and is reachable," but nothing checks this. If the bucket doesn't exist the test fails cryptically deep in NIXL. A one-line head_bucket probe inside obj_config that calls pytest.skip(...) on failure would surface the problem clearly (and also covers the case where the endpoint is configured but unreachable).
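Such a probe might look like this sketch (the function name is hypothetical; head_bucket is the cheap boto3 HEAD call, and a broad except is used here since both a missing bucket and an unreachable endpoint surface as exceptions):

```python
import pytest


def probe_bucket_or_skip(s3_client, bucket: str) -> None:
    """Skip the calling test early if the bucket is missing or unreachable.

    boto3 raises ClientError for a missing bucket and
    EndpointConnectionError for an unreachable endpoint; either way the
    tests cannot run, so skip with the underlying error in the message.
    """
    try:
        s3_client.head_bucket(Bucket=bucket)
    except Exception as exc:
        pytest.skip(f"object store bucket {bucket!r} not reachable: {exc}")
```

Called once inside the obj_config fixture, this turns a cryptic NIXL failure deep in the test into a clear skip message.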

parser.addoption("--obj-access-key", default=None)
parser.addoption("--obj-secret-key", default=None)
parser.addoption("--obj-scheme", default=None)
parser.addoption("--obj-ca_bundle", default=None)
Collaborator

Naming inconsistency: other options use hyphens (--obj-access-key, --obj-secret-key) but this one mixes a hyphen then an underscore (--obj-ca_bundle). Suggest --obj-ca-bundle for consistency.

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

2 participants