[RFC]: Hard Pin and Upsert for Mooncake Store #1645
Description
1. Summary
This RFC proposes two independent, composable features to enhance Mooncake Store for model weight storage workloads:
- Hard Pin — A mechanism that guarantees an object is never evicted until it is explicitly unpinned or removed. Hard pin is set at creation time via `ReplicateConfig`.
- Upsert — An update interface that replaces an existing object's data in place (when the size is unchanged) or via delete-and-reallocate (when the size differs). Concurrent in-progress writes on the same key are preempted.
Both features are designed as orthogonal primitives. They can be used independently or together, preserving flexibility for reinforcement learning, model management, and hybrid KVCache + model weight workloads.
The rest of this document first explains the design of the two features, then summarizes implementation impact, client APIs, configuration, compatibility, and testing.
2. Design
2.1 Design Principles
- Minimal data structure changes — Reuse existing mechanisms (`replicas_`, `ReplicaStatus`, `processing_keys`, `DiscardedReplicas`) wherever possible.
- Predictable reader behavior — During an in-place upsert (Case B), the key temporarily has no readable replicas (all replicas are PROCESSING). Readers receive `REPLICA_IS_NOT_READY`, consistent with a key mid-`PutStart`. This is acceptable for model weight workloads, where updates are coordinated and inference is paused during checkpoint loading. See §2.3.8 for a detailed discussion of this trade-off.
- Memory efficiency — Avoid 2x memory overhead by reusing existing buffers (in-place) or freeing before reallocating, instead of copy-on-write (COW).
- Failure safety — Crashed or failed operations must be automatically cleaned up by existing timeout mechanisms.
2.2 Hard Pin
2.2.1 Data Model
Add a `hard_pinned` boolean to `ObjectMetadata`:

```cpp
// master_service.h — ObjectMetadata (alongside existing lease_timeout and soft_pin_timeout)
mutable bool hard_pinned GUARDED_BY(lock){false};
```

Why a boolean instead of a timeout?
- Model weights need indefinite protection. Using a timeout to approximate "forever" is the exact anti-pattern we're replacing.
- A boolean is cheaper to check (no clock comparison) and semantically unambiguous.
2.2.2 Setting Hard Pin
Hard pin is set at creation time via `ReplicateConfig`:

```cpp
// replica.h
struct ReplicateConfig {
    size_t replica_num{1};
    bool with_soft_pin{false};
    bool with_hard_pin{false};  // NEW
    // ...
};
```

When `with_hard_pin=true`, the object is hard-pinned from creation. This is the common path for model weight storage. Hard pin state is preserved across Upsert operations.
To unpin a hard-pinned object, the user must `Remove` it and re-create it without `with_hard_pin`.
2.2.3 Eviction Behavior
The eviction loop in BatchEvict() currently has two passes:
| Pass | What it evicts |
|---|---|
| First | Non-soft-pinned objects with expired leases |
| Second | Soft-pinned objects (if allow_evict_soft_pinned_objects_ is true) |
With hard pin, the change is minimal — add a single check at the top of the iteration:
```cpp
// master_service.cpp — BatchEvict(), in the per-object loop (3 locations)
if (it->second.IsHardPinned()) {
    ++it;
    continue;  // Skip — hard-pinned objects are NEVER evicted
}
// ... existing soft pin / lease checks unchanged
```

The eviction priority becomes:
Eviction Priority (first evicted → last evicted):
1. Non-pinned, lease-expired objects ← evicted first
2. Soft-pinned objects (if allowed) ← evicted under pressure
3. Hard-pinned objects ← NEVER evicted
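The priority order falls out of a single early `continue` per pass. The sketch below is an illustrative Python model of the two-pass loop — a hypothetical `Obj` record stands in for `ObjectMetadata`; it is not the actual C++ `BatchEvict`:

```python
from dataclasses import dataclass

@dataclass
class Obj:
    # Hypothetical stand-in for the ObjectMetadata fields relevant to eviction.
    key: str
    lease_expired: bool = True
    soft_pinned: bool = False
    hard_pinned: bool = False

def batch_evict(objects, allow_evict_soft_pinned):
    """Model of the two-pass eviction loop with the new hard-pin check."""
    evicted = []
    # First pass: non-soft-pinned objects with expired leases.
    for o in objects:
        if o.hard_pinned:          # NEW: hard-pinned objects are never evicted
            continue
        if not o.soft_pinned and o.lease_expired:
            evicted.append(o.key)
    # Second pass: soft-pinned objects, only if eviction of them is allowed.
    if allow_evict_soft_pinned:
        for o in objects:
            if o.hard_pinned:      # the same check guards every pass
                continue
            if o.soft_pinned and o.key not in evicted:
                evicted.append(o.key)
    return evicted

objs = [Obj("kv"), Obj("soft", soft_pinned=True), Obj("weights", hard_pinned=True)]
print(batch_evict(objs, allow_evict_soft_pinned=True))  # → ['kv', 'soft']
```

Whatever the pressure level, `"weights"` never appears in the evicted list — the check precedes all lease and soft-pin logic.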
2.2.4 HA Mode Serialization
The MetadataSerializer must persist the hard_pinned field so that hard pin state survives master restarts. This is a single boolean field appended to the existing msgpack serialization.
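The compatibility story is the usual append-a-field-with-a-default pattern. A minimal sketch using plain dicts in place of the real msgpack `MetadataSerializer`:

```python
def serialize_metadata(meta: dict) -> dict:
    # The new field is appended alongside the existing ones; the real code
    # writes msgpack, but the defaulting logic is the same.
    return {**meta, "hard_pinned": meta.get("hard_pinned", False)}

def deserialize_metadata(blob: dict) -> dict:
    # Old snapshots lack the field; default to False so behavior is unchanged.
    return {**blob, "hard_pinned": blob.get("hard_pinned", False)}

old_snapshot = {"key": "model/weights", "size": 1024}   # written by an old master
restored = deserialize_metadata(old_snapshot)
print(restored["hard_pinned"])  # → False
```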
2.3 Upsert (In-Place Update / Delete-and-Reallocate)
2.3.1 Core Insight
Instead of a COW approach that requires 2x memory (old + new data coexist), Upsert takes a simpler, more memory-efficient approach:
- When size is unchanged: reuse the existing memory buffers — mark replicas as PROCESSING, let the client overwrite the data in-place, then mark them COMPLETE again.
- When size differs: free the old replicas first, then allocate new ones at the required size.
- When the key has an in-progress write (Put or Upsert): preempt the in-progress operation — move its PROCESSING replicas to `DiscardedReplicas` for delayed release, then proceed with the new Upsert.
This design reuses existing mechanisms with minimal changes:
| Existing Mechanism | Role in Upsert |
|---|---|
| `ReplicaStatus` state machine | COMPLETE → PROCESSING → COMPLETE lifecycle for in-place updates |
| `processing_keys` set | Tracks keys with active Upsert/Put writes |
| `DiscardedReplicas` + TTL | Safe delayed release of preempted PROCESSING replicas |
| `DiscardExpiredProcessingReplicas` | Auto-cleanup of a failed upsert's replicas on timeout |
Required metadata changes:
- `ObjectMetadata::put_start_time` must become non-`const` — refreshed to `now` on an in-place UpsertStart, so `DiscardExpiredProcessingReplicas` times out the current write cycle, not the original creation time.
- `ObjectMetadata::client_id` must become non-`const` — updated to the new client on an in-place UpsertStart.
- `ObjectMetadata::size` remains `const` — when the size changes, the metadata entry is deleted and recreated.
- `Replica` needs new methods:
  - `mark_processing()` — transitions COMPLETE → PROCESSING for in-place updates.
  - `is_processing()` / `fn_is_processing()` — predicate for filtering PROCESSING replicas during preemption.
  - `is_busy()` / `fn_is_busy()` — checks `refcnt > 0`, used as a safety gate before allowing Case B/C.
New error code:
`OBJECT_REPLICA_BUSY` (-714) — returned when UpsertStart finds replicas with a non-zero refcnt (active RDMA reads in flight). The client should retry after readers finish.
No changes are needed to eviction logic or any other existing mechanism.
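To make the dispatch concrete, here is an illustrative in-memory Python model of the `UpsertStart` state machine — hypothetical structures only; locking, the refcnt busy gate, and replication-task checks are omitted:

```python
# Illustrative model of the UpsertStart dispatch (Case A / B / C + preemption).
store = {}            # key -> {"size": int, "status": str, "buf_id": int}
processing_keys = set()
discarded = []        # preempted/popped replicas awaiting delayed (TTL) release
_next_buf = [0]

def _alloc(key, size):
    # Case A body: allocate new buffers and create metadata (like PutStart).
    _next_buf[0] += 1
    store[key] = {"size": size, "status": "PROCESSING", "buf_id": _next_buf[0]}
    processing_keys.add(key)
    return store[key]["buf_id"]

def upsert_start(key, size):
    if key in processing_keys:                 # preempt an in-progress Put/Upsert
        discarded.append(store.pop(key))       # delayed release, not immediate free
        processing_keys.discard(key)
        return ("A", _alloc(key, size))        # no COMPLETE replicas remain
    if key not in store:                       # Case A: key does not exist
        return ("A", _alloc(key, size))
    if store[key]["size"] == size:             # Case B: in-place, same buffers
        store[key]["status"] = "PROCESSING"
        processing_keys.add(key)
        return ("B", store[key]["buf_id"])
    discarded.append(store.pop(key))           # Case C: free old (delayed), realloc
    return ("C", _alloc(key, size))

def upsert_end(key):
    store[key]["status"] = "COMPLETE"
    processing_keys.discard(key)
```

Running two same-size upserts returns the same buffer id (Case B); changing the size moves the old replica to `discarded` and allocates a fresh buffer (Case C).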
2.3.2 API Design
Master Service API
Following the existing PutStart/PutEnd/PutRevoke pattern:
```cpp
// Single-key operations
auto UpsertStart(const UUID& client_id, const std::string& key,
                 const uint64_t slice_length, const ReplicateConfig& config)
    -> tl::expected<std::vector<Replica::Descriptor>, ErrorCode>;

auto UpsertEnd(const UUID& client_id, const std::string& key,
               ReplicaType replica_type)
    -> tl::expected<void, ErrorCode>;

auto UpsertRevoke(const UUID& client_id, const std::string& key,
                  ReplicaType replica_type)
    -> tl::expected<void, ErrorCode>;

// Batch variants
std::vector<tl::expected<std::vector<Replica::Descriptor>, ErrorCode>>
BatchUpsertStart(const UUID& client_id,
                 const std::vector<std::string>& keys,
                 const std::vector<uint64_t>& slice_lengths,
                 const ReplicateConfig& config);

std::vector<tl::expected<void, ErrorCode>>
BatchUpsertEnd(const UUID& client_id, const std::vector<std::string>& keys);

std::vector<tl::expected<void, ErrorCode>>
BatchUpsertRevoke(const UUID& client_id, const std::vector<std::string>& keys);
```

`UpsertEnd` and `UpsertRevoke` delegate to `PutEnd` and `PutRevoke` — no special logic is needed. The batch variants follow the same delegation pattern (`BatchUpsertEnd` → `BatchPutEnd`, etc.).
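Putting the three RPCs together, the client-side protocol is reserve → transfer → commit-or-revoke. A hedged Python sketch — the `FakeMaster` class and `transfer` callback are illustrative stand-ins for `MasterClient` and the RDMA path, not real APIs:

```python
def upsert(master, key, data, transfer):
    """Three-phase protocol: UpsertStart -> TransferWrite -> UpsertEnd/Revoke."""
    descriptors = master.upsert_start(key, len(data))   # phase 1: reserve buffers
    try:
        transfer(descriptors, data)                     # phase 2: write the data
    except Exception:
        master.upsert_revoke(key)                       # failure: release replicas
        raise
    master.upsert_end(key)                              # phase 3: commit
    return True

class FakeMaster:
    """Hypothetical stub recording which RPCs were issued, for illustration."""
    def __init__(self):
        self.calls = []
    def upsert_start(self, key, size):
        self.calls.append("start")
        return ["descriptor"]
    def upsert_end(self, key):
        self.calls.append("end")
    def upsert_revoke(self, key):
        self.calls.append("revoke")
```

A successful call issues `start` then `end`; a failed transfer issues `start` then `revoke` and re-raises, letting the caller retry from source.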
Client Service API (C++)
```cpp
// Single-key: orchestrates UpsertStart → TransferWrite → UpsertEnd/Revoke
tl::expected<void, ErrorCode> Upsert(const ObjectKey& key,
                                     std::vector<Slice>& slices,
                                     const ReplicateConfig& config);

// Batch: pipeline — BatchUpsertStart → parallel RDMA → BatchUpsertEnd/Revoke
std::vector<tl::expected<void, ErrorCode>> BatchUpsert(
    const std::vector<ObjectKey>& keys,
    std::vector<std::vector<Slice>>& batched_slices,
    const ReplicateConfig& config);
```

PyClient / RealClient / DummyClient API
The upsert API surface mirrors the existing put family. Each put variant has an upsert counterpart:
| Put variant | Upsert counterpart | Description |
|---|---|---|
| `put(key, span)` | `upsert(key, span)` | Single key, copy semantics |
| `put_from(key, buffer, size)` | `upsert_from(key, buffer, size)` | Single key, zero-copy from registered buffer |
| `put_parts(key, spans)` | `upsert_parts(key, spans)` | Single key, multi-part concatenation |
| `put_batch(keys, spans)` | `upsert_batch(keys, spans)` | Batch, copy semantics |
| `batch_put_from(keys, buffers, sizes)` | `batch_upsert_from(keys, buffers, sizes)` | Batch, zero-copy |
All variants accept an optional ReplicateConfig parameter.
Python Binding API
Raw bytes — mirrors put / put_from / put_batch / batch_put_from / put_parts:
```python
store.upsert(key, value, config=None)
store.upsert_from(key, buffer_ptr, size, config=None)
store.upsert_parts(key, *parts, config=None)
store.upsert_batch(keys, values, config=None)
store.batch_upsert_from(keys, buffer_ptrs, sizes, config=None)
```

Tensor — mirrors put_tensor / put_tensor_from / batch_put_tensor / batch_put_tensor_from:
```python
store.upsert_tensor(key, tensor)
store.upsert_tensor_from(key, buffer_ptr, size)
store.batch_upsert_tensor(keys, tensors_list)
store.batch_upsert_tensor_from(keys, buffer_ptrs, sizes)
```

Tensor with configurable replication (upsert-only, no put counterpart):
```python
store.upsert_pub_tensor(key, tensor, config)
store.batch_upsert_pub_tensor(keys, tensors_list, config)
```

Note: Tensor Parallelism (TP) variants (`upsert_tensor_with_tp`, etc.) are not included in V1. They can be added later following the same pattern as `put_tensor_with_tp`.
2.3.3 Client Call Path
The upsert call path mirrors Put, with 6 layers from Python down to the master service:
```
Python Binding (store_py.cpp)
│  upsert_tensor() / upsert_from() / batch_upsert_from() / ...
│  Tensor methods: extract metadata + data → call upsert_parts / upsert_from
│  Raw bytes methods: extract buffer info → call PyClient interface
▼
PyClient interface (pyclient.h) ── pure virtual base class
├─ RealClient (real_client.cpp) ── production path
│    upsert_internal() / upsert_from_internal() / ...
│    Allocates RDMA-registered buffer, copies data, splits into Slices
│    Calls Client::Upsert(key, slices, config)
│
└─ DummyClient (dummy_client.cpp) ── test/single-node path
     IPC proxy → calls RealClient::upsert_dummy_helper() on server side
▼
Client Service (client_service.cpp)
│  Client::Upsert() ── orchestrates the three-phase protocol:
│    1. MasterClient::UpsertStart()
│    2. TransferWrite() ── RDMA write to allocated buffers
│    3. MasterClient::UpsertEnd() or UpsertRevoke() on failure
│
│  Client::BatchUpsert() ── pipeline (see §2.3.4):
│    1. StartBatchUpsert() ── one BatchUpsertStart RPC
│    2. SubmitTransfers() ── async RDMA submit for all keys
│    3. WaitForTransfers() ── wait for all RDMA to complete (parallel I/O)
│    4. FinalizeBatchUpsert() ── BatchUpsertEnd / BatchUpsertRevoke RPCs
▼
Master Client (master_client.cpp) ── RPC client stub
│  Serializes parameters, sends RPC to master service
│  UpsertStart: sums slice_lengths into a single total_length before sending
▼
RPC Service (rpc_service.cpp) ── RPC server handler
│  Deserializes, delegates to MasterService, records metrics
▼
Master Service (master_service.cpp) ── core metadata logic
   UpsertStart: three-way dispatch (Case A / B / C)
   UpsertEnd: delegates to PutEnd
   UpsertRevoke: delegates to PutRevoke
```
The middle layers (Client Service transfer phase, Master Client, RPC Service) are structurally identical to the Put path. The only new logic is in MasterService::UpsertStart (§2.3.5) and the BatchUpsert pipeline (§2.3.4).
2.3.4 BatchUpsert Pipeline
BatchUpsert uses the same pipeline architecture as BatchPut, reusing its transfer infrastructure. The pipeline replaces a serial loop (3N RPCs + N serial RDMA writes) with batched RPCs and parallel RDMA transfers:
```
Serial (old):                           Pipeline (new):
  for each key:                           1. BatchUpsertStart    ── 1 RPC
    UpsertStart   ── 1 RPC                2. SubmitTransfers     ── N async RDMA
    TransferWrite ── 1 RDMA (blocking)    3. WaitForTransfers    ── parallel wait
    UpsertEnd     ── 1 RPC                4. FinalizeBatchUpsert ── 1-2 RPCs
  Total: 3N RPCs, N serial RDMA           Total: 2-3 RPCs, N parallel RDMA
```
The implementation adds two thin wrapper methods (StartBatchUpsert, FinalizeBatchUpsert) that call the upsert-specific RPCs. The data transfer phase (SubmitTransfers, WaitForTransfers) and operation state tracking (PutOperation class) are reused from BatchPut without modification — they are operation-agnostic.
Note: `BatchUpsert` does not support `prefer_alloc_in_same_node` (returns `INVALID_PARAMS`), so the `BatchPutWhenPreferSameNode` code path is not needed.
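The pipeline structure can be sketched in a few lines of Python — a thread pool stands in for async RDMA submission, and `FakeBatchMaster` is a hypothetical stub (the real code reuses `PutOperation` and the transfer engine):

```python
from concurrent.futures import ThreadPoolExecutor

class FakeBatchMaster:
    """Hypothetical stub recording which batched RPCs were issued."""
    def __init__(self):
        self.rpcs = []
    def batch_upsert_start(self, keys, sizes):
        self.rpcs.append("start")
        return [f"desc-{k}" for k in keys]
    def batch_upsert_end(self, keys):
        self.rpcs.append("end")
    def batch_upsert_revoke(self, keys):
        self.rpcs.append("revoke")

def batch_upsert(master, keys, payloads, transfer):
    """Pipeline: 1 batched Start RPC -> N parallel transfers -> 1-2 batched RPCs."""
    descs = master.batch_upsert_start(keys, [len(p) for p in payloads])  # 1 RPC
    with ThreadPoolExecutor() as pool:                          # N parallel writes
        futures = [pool.submit(transfer, d, p) for d, p in zip(descs, payloads)]
        ok = [k for k, f in zip(keys, futures) if f.exception() is None]
    failed = [k for k in keys if k not in ok]
    if ok:
        master.batch_upsert_end(ok)          # one RPC commits all successes
    if failed:
        master.batch_upsert_revoke(failed)   # one RPC revokes all failures
    return ok, failed
```

For N keys this issues two or three RPCs total (Start, End, and Revoke only if some transfers failed), with all data transfers in flight concurrently — matching the 2-3 RPC count in the comparison above.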
2.3.5 Detailed Flow
```mermaid
flowchart TD
    Start["UpsertStart(client_id, key, slice_length, config)"]
    Start --> Validate{"Validate params:<br/>key non-empty, slice_length > 0,<br/>replica_num > 0, cachelib slice limit"}
    Validate -- "invalid" --> ErrParams["INVALID_PARAMS"]
    Validate -- "ok" --> Lock
    Lock["Lock: snapshot_mutex_(shared) + shard(exclusive)<br/>it = shard->metadata.find(key)"]
    Lock --> Found{"key found?"}
    Found -- "no" --> CaseA
    Found -- "yes" --> Stale{"CleanupStaleHandles:<br/>all handles stale?"}
    Stale -- "yes" --> EraseStale["erase metadata & processing_keys"]
    EraseStale --> CaseA
    Stale -- "no" --> ChkRepl{"replication_tasks<br/>contains key?"}
    ChkRepl -- "yes" --> ErrRepl["OBJECT_HAS_REPLICATION_TASK"]
    ChkRepl -- "no" --> ChkOffload{"offloading_tasks<br/>contains key?"}
    ChkOffload -- "yes" --> ErrRepl
    ChkOffload -- "no" --> ChkProc{"key in<br/>processing_keys?"}
    ChkProc -- "no" --> ChkRefcnt
    ChkProc -- "yes" --> Preempt["Preempt ongoing Put/Upsert:<br/>PopReplicas(PROCESSING) → discarded<br/>(delayed free after timeout)<br/>erase from processing_keys"]
    Preempt --> HasComplete{"COMPLETE replicas<br/>remain?"}
    HasComplete -- "no" --> EraseMeta["erase metadata"]
    EraseMeta --> CaseA
    HasComplete -- "yes" --> ChkRefcnt
    ChkRefcnt{"any replica<br/>refcnt > 0?"}
    ChkRefcnt -- "yes" --> ErrBusy["OBJECT_REPLICA_BUSY"]
    ChkRefcnt -- "no" --> SizeEq{"metadata.size<br/>== total_length?"}
    SizeEq -- "equal" --> CaseB
    SizeEq -- "different" --> CaseC
    CaseA["Case A: key does not exist<br/>AllocateAndInsertMetadata()<br/>— allocate new buffers, create metadata"]
    CaseB["Case B: same size — in-place update<br/>1. update client_id & put_start_time<br/>2. reconcile soft_pin<br/>3. mark_processing() all COMPLETE replicas<br/>4. return existing descriptors (same addrs)"]
    CaseC["Case C: different size — reallocate<br/>1. PopReplicas() → discarded (delayed free)<br/>2. erase old metadata<br/>3. AllocateAndInsertMetadata() — new buffers"]
    CaseA --> OK
    CaseB --> OK
    CaseC --> OK
    OK["return vector of Replica::Descriptor<br/>→ client performs RDMA write"]
    style ErrParams fill:#ddd,stroke:#888
    style ErrRepl fill:#ddd,stroke:#888
    style ErrBusy fill:#ddd,stroke:#888
```
```
UpsertStart(client_id, key, slice_length, config):

Step 0: Cleanup stale handles
  IF key exists AND all handles are stale (CleanupStaleHandles returns true):
    → Erase processing_keys entry and metadata
    → Treat as non-existent (fall through to Case A)

Step 1: Safety checks and preemption (only if key exists)
  1a. IF key has active replication_tasks (Copy/Move in progress):
      → Return OBJECT_HAS_REPLICATION_TASK
  1b. IF key has active offloading_tasks:
      → Return OBJECT_HAS_REPLICATION_TASK
  1c. IF key is in processing_keys (another Put/Upsert in progress):
      → Pop all PROCESSING replicas → DiscardedReplicas (TTL for delayed release)
      → Remove key from processing_keys
      → If no COMPLETE replicas remain, erase metadata entirely
        and fall through to Case A

Step 2: Dispatch by key existence and size
  Case A — Key does not exist (or erased by Step 0/1c):
      → Equivalent to PutStart (allocate replicas, create ObjectMetadata)
      → Return new replica descriptors

  (Safety gate before Case B/C)
  IF any replica has non-zero refcnt (is_busy):
      → Return OBJECT_REPLICA_BUSY

  Case B — Key exists, size unchanged (in-place update):
      → Mark all COMPLETE replicas → PROCESSING (via mark_processing())
      → Refresh put_start_time = now
      → Update client_id = new client_id
      → Insert key into processing_keys
      → Return existing replica descriptors (same memory addresses)

  Case C — Key exists, size changed (delete-and-reallocate):
      → Pop all replicas → DiscardedReplicas (TTL for delayed release,
        same as MoveEnd — needed because GetReplicaList doesn't use refcnt,
        so readers may still hold descriptors without incrementing refcnt)
      → Erase metadata entry
      → Allocate new replicas + create new ObjectMetadata (same as PutStart)
      → Return new replica descriptors

Client transfers data (identical to Put — TransferWrite via RDMA/TCP)
```
UpsertEnd(client_id, key, replica_type):
- Equivalent to `PutEnd` — marks PROCESSING → COMPLETE, grants the lease, removes the key from `processing_keys`.

UpsertRevoke(client_id, key, replica_type):
- Delegates to `PutRevoke` — erases all PROCESSING replicas of the given type, cleans up the metadata entry if empty.
- Known limitation for Case B (in-place update): In Case B, `UpsertStart` reuses existing buffers by marking COMPLETE replicas as PROCESSING. If the client's transfer fails and it calls `UpsertRevoke`, `PutRevoke` erases these replicas — the old data is lost. This is a deliberate trade-off: in-place update avoids 2x memory overhead but sacrifices rollback capability. For model weight workloads this is acceptable, because the source data (in GPU/CPU memory) is still available and the client can retry the upsert.
2.3.6 Swimlane Diagrams
In-Place Update Path (key exists, size unchanged)
```mermaid
sequenceDiagram
    participant C as Client
    participant M as Master Service
    participant TE as Transfer Engine
    participant R as Server
    C->>M: UpsertStart(client_id, key, same_size, config)
    Note over M: Acquire shard write lock
    M->>M: Find existing ObjectMetadata
    M->>M: Safety checks (replication/offloading tasks, refcnt)
    M->>M: size unchanged → in-place path
    M->>M: mark_processing() on all COMPLETE replicas
    M->>M: Refresh put_start_time, update client_id
    M->>M: Insert key into processing_keys
    Note over M: Release shard write lock
    M-->>C: Return existing replica descriptors (same addresses)
    C->>TE: TransferWrite(descriptors, new_data)
    TE->>R: RDMA Write to same buffer
    R-->>TE: Write complete
    TE-->>C: Transfer success
    C->>M: UpsertEnd(client_id, key, MEMORY)
    Note over M: Same as PutEnd
    M->>M: mark_complete() on PROCESSING replicas
    M->>M: Remove from processing_keys
    M->>M: GrantLease (preserve hard_pin/soft_pin)
    M-->>C: Success
```
Delete-and-Reallocate Path (key exists, size changed)
```mermaid
sequenceDiagram
    participant C as Client
    participant M as Master Service
    participant TE as Transfer Engine
    participant R as Server
    C->>M: UpsertStart(client_id, key, new_size, config)
    Note over M: Acquire shard write lock
    M->>M: Find existing ObjectMetadata
    M->>M: size differs → delete-and-reallocate path
    M->>M: Check refcnt → reject if busy (OBJECT_REPLICA_BUSY)
    M->>M: Pop all replicas → DiscardedReplicas (TTL delayed release)
    M->>M: Erase old metadata
    M->>M: Allocate new replicas (same as PutStart)
    M->>M: Create new ObjectMetadata
    Note over M: Release shard write lock
    M-->>C: Return new replica descriptors
    C->>TE: TransferWrite(descriptors, data)
    TE->>R: RDMA Write to new buffer
    R-->>TE: Write complete
    TE-->>C: Transfer success
    C->>M: UpsertEnd(client_id, key, MEMORY)
    M->>M: Same as PutEnd
    M-->>C: Success
```
Preemption Path (key has in-progress write)
```mermaid
sequenceDiagram
    participant A as Client A
    participant B as Client B
    participant M as Master Service
    A->>M: PutStart(key) or UpsertStart(key)
    M-->>A: Return descriptors
    Note over A: Transfer in progress...
    B->>M: UpsertStart(key, ...)
    Note over M: key is in processing_keys
    M->>M: Pop A's PROCESSING replicas → DiscardedReplicas (with TTL)
    M->>M: Remove from processing_keys
    M->>M: Continue with normal Upsert flow (Case A/B/C)
    M-->>B: Return descriptors
    Note over M: A's replicas released after TTL<br/>by ReleaseExpiredDiscardedReplicas()
    A->>M: PutEnd(key) or UpsertEnd(key)
    M-->>A: ILLEGAL_CLIENT or OBJECT_NOT_FOUND
```
2.3.7 Failure Handling
- Client crash after UpsertStart (Case A/C): `DiscardExpiredProcessingReplicas` will clean up after `put_start_discard_timeout_sec_`, same as for `PutStart`. The newly allocated replicas are freed, and the metadata entry is removed.
- Client crash after UpsertStart (Case B — in-place): All replicas are PROCESSING, with no COMPLETE replicas remaining. `DiscardExpiredProcessingReplicas` removes the entire metadata entry — the old data is lost. This is the same trade-off described for UpsertRevoke above: no rollback for in-place updates.
- Transfer failure and UpsertRevoke (Case A/C): `PutRevoke` erases the newly allocated PROCESSING replicas. For Case C, the old replicas were already moved to `DiscardedReplicas` and will be freed after the TTL. The key effectively disappears until the client retries.
- Transfer failure and UpsertRevoke (Case B — in-place): `PutRevoke` erases the reused PROCESSING replicas — the old data is lost. The client must retry from source. See "Known Limitations" below.
- Allocation failure in delete-and-reallocate (Case C): The old replicas have been moved to `DiscardedReplicas`, but the new allocation fails. The key's data is lost. The client can retry.
- Preempted client calls UpsertEnd/PutEnd: Returns `ILLEGAL_CLIENT` (if the metadata was reused with a new client_id) or `OBJECT_NOT_FOUND` (if the metadata was erased and recreated). The preempted client should treat this as a no-op — the new client has taken over the key.
- Concurrent Copy/Move/Offload: Returns `OBJECT_HAS_REPLICATION_TASK`. The client should retry after the replication task completes.
- Active readers (non-zero refcnt): Returns `OBJECT_REPLICA_BUSY`. The client should retry after readers release their references. Note: this only blocks Case B/C; if the key is mid-write (in `processing_keys`), preemption proceeds regardless of the refcnt on the PROCESSING replicas.
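From the client's point of view, the failure modes above reduce to a small retry policy. An illustrative sketch — error-code names are those used in this RFC, but the policy mapping itself is a suggestion, not part of the implementation:

```python
# Illustrative client-side policy for UpsertStart/UpsertEnd error codes.
RETRYABLE = {"OBJECT_REPLICA_BUSY", "OBJECT_HAS_REPLICATION_TASK"}
PREEMPTED = {"ILLEGAL_CLIENT", "OBJECT_NOT_FOUND"}   # seen by a preempted writer

def on_upsert_error(code: str) -> str:
    if code in RETRYABLE:
        return "retry"      # wait for readers / replication task, then retry
    if code in PREEMPTED:
        return "no-op"      # a newer writer took over the key; do nothing
    return "fail"           # e.g. INVALID_PARAMS — surface to the caller
```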
2.3.8 Read Visibility During Upsert
During an in-place update (Case B), all replicas are marked PROCESSING. `GetReplicaList` only returns COMPLETE replicas, so the key is temporarily unreadable — readers receive `REPLICA_IS_NOT_READY` until `UpsertEnd` marks them COMPLETE again.
This is a deliberate design choice to avoid read-write data races on RDMA buffers. The alternatives and their trade-offs:
| Approach | Read during write | Memory overhead | Complexity |
|---|---|---|---|
| Current (mark_processing) | Not readable | 1x | Low |
| COW (allocate new, swap on complete) | Reads see old data | 2x peak | Medium |
| Shadow key (write to temp key, rename) | Reads see old data | 2x peak | High (namespace pollution) |
For model weight workloads, the current approach is acceptable because:
- Weight updates are coordinated operations — inference is typically paused during checkpoint loading.
- The unreadable window is short (duration of RDMA transfer, typically milliseconds to seconds for GB-scale data).
- Avoiding 2x memory overhead is critical — model weights can consume the majority of available memory.
For workloads that require read availability during updates, COW semantics should be considered in a future RFC.
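A coordinated reader that tolerates the Case B window can simply poll until replicas return to COMPLETE. A minimal in-memory sketch — the store dict and the polling helper are hypothetical, not the real `GetReplicaList`:

```python
import time

def get_replica_list(store, key):
    # Only COMPLETE replicas are returned, mirroring the behavior described
    # above; a key mid-upsert yields REPLICA_IS_NOT_READY.
    entry = store.get(key)
    if entry is None or entry["status"] != "COMPLETE":
        return "REPLICA_IS_NOT_READY"
    return entry["descriptors"]

def get_with_retry(store, key, attempts=3, delay=0.0):
    # Coordinated readers retry through the short Case B window.
    for _ in range(attempts):
        result = get_replica_list(store, key)
        if result != "REPLICA_IS_NOT_READY":
            return result
        time.sleep(delay)
    return "REPLICA_IS_NOT_READY"
```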
2.3.9 Known Limitations
| Limitation | Description | Impact |
|---|---|---|
| In-place revoke loses old data | Case B UpsertRevoke erases reused replicas. No rollback to previous value. | Client must retry from source (GPU/CPU). Acceptable for model weights. |
| Key unreadable during upsert | Case B marks all replicas PROCESSING; Get returns REPLICA_IS_NOT_READY. | Short window. Acceptable for coordinated weight updates. |
| replica_num changes ignored in Case B | If an upsert requests a different replica_num than the existing object, Case B only checks size equality and reuses the existing replicas. The new replica count is silently ignored. | Client gets fewer (or more) replicas than requested. |
| Metrics reuse put counters | The RPC layer increments put_start_requests / put_end_requests for upsert operations. | Cannot distinguish put vs. upsert request volume in monitoring. |
4. Client-Side API
The full API surface is defined in §2.3.2. Below are usage examples.
C++
```cpp
// Single upsert (zero-copy from registered buffer)
ReplicateConfig config;
config.with_hard_pin = true;
client->upsert_from("model/policy/weights", buffer, size, config);

// Single upsert (copy semantics)
client->upsert("model/policy/weights", data_span, config);

// Batch upsert (zero-copy, pipeline — parallel RDMA)
auto results = client->batch_upsert_from(keys, buffers, sizes, config);
```

Python
```python
# Raw bytes
config = ReplicateConfig(with_hard_pin=True)
store.upsert("model/weights", weight_bytes, config)
store.upsert_from("model/weights", buffer_ptr, size, config)
store.batch_upsert_from(keys, buffer_ptrs, sizes, config)

# Tensor (default ReplicateConfig)
store.upsert_tensor("model/layer0/weights", tensor)
store.batch_upsert_tensor(keys, tensors_list)

# Tensor with custom replication
config = ReplicateConfig()
config.replica_num = 2
store.upsert_pub_tensor("model/layer0/weights", tensor, config)

# Zero-copy tensor (buffer layout: [TensorMetadata][tensor data])
store.upsert_tensor_from("model/layer0/weights", buffer_ptr, size)
store.batch_upsert_tensor_from(keys, buffer_ptrs, sizes)
```

5. Configuration
No new configuration parameters are required. Hard pin is a per-object attribute set via ReplicateConfig.with_hard_pin at creation time. All existing configuration parameters retain their current behavior.
6. Migration and Backward Compatibility
- Wire format: `ReplicateConfig` gains one new boolean field (`with_hard_pin`, default `false`). Old clients sending the old format will default to `false` — no hard pin, existing behavior preserved.
- Snapshot format: A new `hard_pinned` field is appended to the serialized `ObjectMetadata`. On deserialization, if the field is absent (old snapshot), it defaults to `false`.
- Existing APIs: `Put`, `Get`, `Remove`, and `PutStart`/`PutEnd`/`PutRevoke` are unchanged. No existing behavior is modified.