Split robj into refcString and dbEntry #3494

@zuiderkwast

Background

The robj (struct serverObject, also called "reference-counted object") is
currently used for two distinct purposes:

  1. Reference-counted string — client arguments (c->argv), shared objects
    (shared.ok, shared.integers[...]), module strings (ValkeyModuleString),
    reply objects. These never have an embedded key or expire.

  2. Database entry — stored in the main keyspace hashtable. Has an embedded
    key, optionally an expire and an embedded value. The value can be any type
    (string, list, set, zset, hash, stream, module type).

Overloading one struct for both roles is not type-safe and requires runtime
checks (hasembkey, encoding checks) to determine which role an robj is
playing. This proposal splits robj into two distinct types.

Proposed types

refcString

A reference-counted string. Always holds an sds value. Never has an embedded
key or expire. Never uses OBJ_ENCODING_INT (the value is always a real sds).

typedef struct refcString {
    unsigned embedded : 1;   /* 1 = sds is embedded after the header */
    unsigned borrowed : 1;   /* 1 = ptr is owned by a dbEntry, don't free */
    unsigned refcount : 6;   /* real counts up to 61; sentinels: 62 = STATIC, 63 = SHARED */
    /* If embedded == 0: a pointer follows (with alignment padding) */
    /* If embedded == 1: sds data follows immediately at offset 1 */
} refcString;

The header is 1 byte. For embedded strings, the sds header and string data
follow immediately, making short strings very compact (e.g. "OK" = 1 byte of
refcString header + 3 bytes of sds header + 3 bytes for "OK" and its NUL
terminator = 7 bytes). For non-embedded strings, a pointer to an external sds
follows (with alignment padding, total 16 bytes on 64-bit).
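The size arithmetic above can be checked with a minimal sketch. The names refcStringHdr, refcStringPtrAllocSize and refcStringEmbAllocSize are hypothetical, and the 3-byte sds header is an assumption (len, alloc and flags, one byte each). One caveat: with plain unsigned bit-fields, as in the struct above, common ABIs typically size the struct to 4 bytes, so this sketch uses unsigned char bit-fields, which GCC and Clang accept as an implementation-defined extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-byte header. Bit-fields of type unsigned char are
 * implementation-defined in C11; GCC and Clang accept them. */
typedef struct refcStringHdr {
    unsigned char embedded : 1;
    unsigned char borrowed : 1;
    unsigned char refcount : 6;
} refcStringHdr;

/* Allocation size for a non-embedded string: the header, padding up to
 * pointer alignment, then the pointer to the external sds. */
static inline size_t refcStringPtrAllocSize(void) {
    size_t align = _Alignof(void *);
    size_t ptr_offset = (sizeof(refcStringHdr) + align - 1) / align * align;
    return ptr_offset + sizeof(void *);
}

/* Allocation size for an embedded string, assuming a 3-byte sds header
 * and a terminating NUL byte after the string data. */
static inline size_t refcStringEmbAllocSize(size_t len) {
    const size_t sds_hdr_size = 3; /* assumption, see lead-in */
    return sizeof(refcStringHdr) + sds_hdr_size + len + 1;
}
```

Under these assumptions, refcStringEmbAllocSize(2) gives the 7 bytes quoted for "OK", and refcStringPtrAllocSize() gives 16 on a typical 64-bit target.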

6 bits for refcount (max real value 61, with 62 = STATIC and 63 = SHARED) is
probably sufficient. In practice, a refcString typically has at most ~5
simultaneous references (base + argv + replication + reply + module retain). An
assert in incrRefCount would guard against overflow. If 6 bits turns out to be
too few, the header can be extended to 2 bytes (giving up to 14 bits for
refcount) at the cost of slightly larger embedded strings.
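A minimal sketch of how the sentinels and the overflow assert could fit together. The REFC_* macros and the refcStringHdr type are hypothetical names, and treating both sentinels as no-ops is a simplifying assumption rather than a description of the current incrRefCount.

```c
#include <assert.h>

#define REFC_MAX_REAL 61u /* largest ordinary reference count */
#define REFC_STATIC 62u   /* sentinel: stack-allocated, never counted */
#define REFC_SHARED 63u   /* sentinel: shared object, never counted */

/* Hypothetical 1-byte header (unsigned char bit-fields are an
 * implementation-defined extension accepted by GCC and Clang). */
typedef struct refcStringHdr {
    unsigned char embedded : 1;
    unsigned char borrowed : 1;
    unsigned char refcount : 6;
} refcStringHdr;

static inline void refcStringIncrRef(refcStringHdr *s) {
    if (s->refcount >= REFC_STATIC) return; /* sentinels are left untouched */
    assert(s->refcount < REFC_MAX_REAL);    /* next value would be a sentinel */
    s->refcount++;
}

/* Returns 1 when the last reference was dropped and the caller should free. */
static inline int refcStringDecrRef(refcStringHdr *s) {
    if (s->refcount >= REFC_STATIC) return 0;
    assert(s->refcount > 0);
    s->refcount--;
    return s->refcount == 0;
}
```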

Fields removed compared to robj:

  • type — always a string.
  • encoding — replaced by the single embedded bit.
  • lru — LRU/LFU tracking is only for database entries.
  • hasexpire, hasembkey, hasembval — not applicable.

Fields added:

  • borrowed — when set, the sds pointer is owned by a dbEntry (used for
    zero-copy SET, see below). The sds must not be freed when the refcString is
    freed.

Used for:

  • Client arguments (c->argv)
  • Shared objects (shared.ok, shared.integers[...], etc.)
  • Module strings (ValkeyModuleString)
  • Reply protocol strings
  • Command rewriting for replication

Notable simplification: since OBJ_ENCODING_INT is not used in refcString, all
code that consumes a refcString can assume the value is a valid sds. This
eliminates INT-encoding branches in addReply, getStringObjectLen,
compareStringObjects, getDecodedObject, feedReplicationBufferWithObject,
and others.

dbEntry

A database entry. Always has an embedded key. Optionally has an expire and/or
embedded value. Retains the current robj layout, minus the hasembkey bit.

typedef struct dbEntry {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : LRULFU_BITS;
    unsigned hasexpire : 1;
    unsigned hasembval : 1;
    unsigned refcount : OBJ_REFCOUNT_BITS;
    void *val_ptr;
    /* Embedded data follows: expire, key sds, optionally value */
} dbEntry;

hasembkey is removed since it is always 1 for a dbEntry.

Refcount is retained for:

  • Zero-copy reply I/O (the reply buffer holds a reference until the I/O thread
    writes the data).
  • Module event callbacks during dbSetValue — a module's key-unlink handler
    could call VM_StringDMA with write mode, which triggers
    dbUnshareStringValue and may replace the value being overwritten. The
    refcount prevents premature freeing during this window.

Note: MOVE and RENAME currently use refcount to protect the dbEntry during a
delete-then-reinsert sequence (bump to 2, delete, add, back to 1). This could
be replaced by a dbPop + dbAdd pattern that removes the entry from the
hashtable without freeing it, then reinserts it under the new key or in the new
database. This would eliminate the need for refcount in those paths.
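The pop-then-add idea can be illustrated with a toy keyspace. Everything here is hypothetical: toyDb is a small fixed array rather than the real hashtable, the key is a plain pointer rather than an embedded sds (so rekeying would likely need a reallocation in the real code), and handling of an existing destination key is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DB_SLOTS 8

typedef struct toyDbEntry {
    const char *key;
    int val;
} toyDbEntry;

typedef struct toyDb {
    toyDbEntry *slots[TOY_DB_SLOTS];
} toyDb;

/* Hypothetical dbPop: unlink the entry for key and return it without freeing. */
static inline toyDbEntry *toyDbPop(toyDb *db, const char *key) {
    for (size_t i = 0; i < TOY_DB_SLOTS; i++) {
        toyDbEntry *e = db->slots[i];
        if (e != NULL && strcmp(e->key, key) == 0) {
            db->slots[i] = NULL;
            return e;
        }
    }
    return NULL;
}

/* Insert an entry owned by the caller; returns 0 on success, -1 if full. */
static inline int toyDbAdd(toyDb *db, toyDbEntry *e) {
    for (size_t i = 0; i < TOY_DB_SLOTS; i++) {
        if (db->slots[i] == NULL) {
            db->slots[i] = e;
            return 0;
        }
    }
    return -1;
}

/* RENAME without touching a refcount: pop, rekey, add. */
static inline int toyRename(toyDb *db, const char *from, const char *to) {
    toyDbEntry *e = toyDbPop(db, from);
    if (e == NULL) return -1;
    e->key = to;
    return toyDbAdd(db, e);
}
```

Because the entry is never freed between the pop and the add, no temporary reference is needed to keep it alive.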

OBJ_ENCODING_INT is allowed in dbEntry for memory-efficient storage of
integer string values.
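As a rough illustration of integer encoding, the sketch below stores a parsed integer directly in the pointer field instead of allocating a string. The toy types and function names are hypothetical, and strtoll is a stand-in parser that is more permissive than a strict canonical-integer check would be (it accepts leading whitespace and a plus sign).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_ENC_RAW, TOY_ENC_INT };

typedef struct toyDbEntry {
    unsigned encoding;
    void *val_ptr; /* a string pointer, or the integer itself when TOY_ENC_INT */
} toyDbEntry;

/* If s is a decimal integer that fits in a pointer-sized integer, store it
 * in val_ptr and return 1. Otherwise leave the entry untouched and return 0. */
static inline int toyTryIntEncode(toyDbEntry *e, const char *s) {
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return 0;
    if (v < INTPTR_MIN || v > INTPTR_MAX) return 0;
    e->encoding = TOY_ENC_INT;
    e->val_ptr = (void *)(intptr_t)v; /* implementation-defined round trip */
    return 1;
}

static inline long long toyIntValue(const toyDbEntry *e) {
    assert(e->encoding == TOY_ENC_INT);
    return (long long)(intptr_t)e->val_ptr;
}
```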

Boundary between the two types

refcString → dbEntry (SET path)

When a value is inserted into the database, a dbEntry is created from the
refcString value. For large string values (where the sds is not embedded in the
refcString), the sds pointer is moved (stolen) from the refcString to the
dbEntry to avoid copying. The refcString in argv has its borrowed bit set,
indicating it now references the dbEntry's sds and must not free it.

The borrowed sds is safe because:

  • Command execution is synchronous.
  • Replication (feedReplicationBuffer) copies the sds content into the backlog
    via memcpy during propagateNow.
  • The refcString in argv is freed after propagation completes.
  • The dbEntry in the database outlives the command execution.

For small string values (embedded in the refcString), the data is copied into
the dbEntry's embedded region. No borrowing is needed.
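The pointer-stealing step for large values might look like the following toy model. A plain char * stands in for the sds, and the toy structs carry only the fields the sketch needs. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toyRefcString {
    unsigned borrowed : 1; /* 1 = ptr is owned by a toyDbEntry */
    char *ptr;
} toyRefcString;

typedef struct toyDbEntry {
    char *val_ptr;
} toyDbEntry;

/* Create an entry that takes over the argument's buffer without copying. */
static inline toyDbEntry *toyDbEntryFromString(toyRefcString *arg) {
    toyDbEntry *e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->val_ptr = arg->ptr; /* the entry now owns the buffer */
    arg->borrowed = 1;     /* argv still points at it but must not free it */
    return e;
}

static inline void toyRefcStringFree(toyRefcString *s) {
    if (!s->borrowed) free(s->ptr);
    free(s);
}

static inline void toyDbEntryFree(toyDbEntry *e) {
    free(e->val_ptr);
    free(e);
}
```

Freeing the argv object after the command leaves the buffer intact, since only the entry owns it at that point.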

dbEntry → reply (GET path)

For GET and similar read commands, the current zero-copy reply path stores a
reference to the dbEntry (via incrRefCount) in the reply buffer. The I/O
thread writes the sds directly to the socket and then calls decrRefCount.
This mechanism is unchanged since dbEntry retains refcounting.
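A single-threaded toy model of why the reply path needs the refcount: the reply keeps the entry alive even if the key is deleted before the write happens. The types and function names are hypothetical, and the sketch ignores the synchronization a real I/O thread would need.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toyDbEntry {
    unsigned refcount;
    const char *val;
} toyDbEntry;

typedef struct toyReplyRef {
    toyDbEntry *entry; /* entry referenced by a pending reply, or NULL */
} toyReplyRef;

/* Drop one reference; returns 1 when the caller should free the entry. */
static inline int toyDbEntryDecrRef(toyDbEntry *e) {
    assert(e->refcount > 0);
    return --e->refcount == 0;
}

/* Queue a zero-copy reply: the reply holds its own reference. */
static inline void toyReplyRetain(toyReplyRef *r, toyDbEntry *e) {
    e->refcount++;
    r->entry = e;
}

/* Called after the data has been written to the socket. */
static inline int toyReplyRelease(toyReplyRef *r) {
    toyDbEntry *e = r->entry;
    r->entry = NULL;
    return toyDbEntryDecrRef(e);
}
```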

addReplyBulk would need to accept a dbEntry. Alternatively, a common internal
helper could extract the sds and length from either type.
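One way the common helper could look, sketched with toy types: a small view struct carries the pointer and length, and a C11 _Generic macro picks the extractor by pointer type. The names strView, stringView and toyFormatBulk are hypothetical, and toyFormatBulk only stands in for addReplyBulk by formatting a bulk string into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct toyRefcString {
    const char *ptr;
    size_t len;
} toyRefcString;

typedef struct toyDbEntry {
    unsigned type;
    const char *val_ptr;
    size_t val_len;
} toyDbEntry;

/* Pointer and length, independent of where the string lives. */
typedef struct strView {
    const char *ptr;
    size_t len;
} strView;

static inline strView refcStringView(toyRefcString *s) {
    return (strView){s->ptr, s->len};
}

static inline strView dbEntryView(toyDbEntry *e) {
    return (strView){e->val_ptr, e->val_len};
}

/* Select the extractor at compile time based on the argument's type. */
#define stringView(o) _Generic((o), \
    toyRefcString *: refcStringView, \
    toyDbEntry *: dbEntryView)(o)

/* Stand-in for addReplyBulk: writes "$<len>\r\n<data>\r\n" into out. */
static inline int toyFormatBulk(char *out, size_t cap, strView v) {
    return snprintf(out, cap, "$%zu\r\n%.*s\r\n", v.len, (int)v.len, v.ptr);
}
```

With this shape, the reply code takes a strView and does not need to know which of the two types the string came from.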

lookupKey returns dbEntry *

lookupKey, lookupKeyRead, lookupKeyWrite and variants return dbEntry *
instead of robj *. Command implementations that read from the database receive
a dbEntry *.

Function changes

Many functions in object.c will need more than a type annotation change. Some
need redesigning (e.g. tryObjectEncoding — INT encoding doesn't apply to
refcString), some need type-specific variants (e.g. dupStringObject,
compareStringObjects, getLongLongFromObject — called on both types, but
handle OBJ_ENCODING_INT which only exists in dbEntry), and some become trivial
for refcString (e.g. sdsEncodedObject is always true, getDecodedObject is
the identity). A full audit of all functions and their callers is needed.

Functions that need type-specific implementations can use C11 _Generic macros
to dispatch based on pointer type, keeping call sites unchanged. For example:

void refcStringIncrRef(refcString *s);
void dbEntryIncrRef(dbEntry *e);

#define incrRefCount(o) _Generic((o), \
    refcString *: refcStringIncrRef, \
    dbEntry *: dbEntryIncrRef)(o)

This provides compile-time type checking without changing any call sites.

Shared objects

All shared objects (shared.*) become refcString. They are never stored in the
database (there is an existing assert for this). They are used as:

  • Reply protocol strings passed to addReply.
  • Synthetic argv entries for command rewriting/propagation.

Module API

ValkeyModuleString maps to refcString. Modules never see dbEntry directly:

  • Module type callbacks (unlink, free_effort, copy, rewrite) receive the
    key name as a refcString (from client argv), not a dbEntry.
  • VM_StringDMA operates on the dbEntry internally but returns a raw char *.
  • VM_StringPtrLen, VM_CreateString, etc. all operate on refcString.
  • VM_DefragValkeyModuleString operates on module-retained strings (refcString).

Known existing bug: the defrag path passes an sds cast to robj * as the key
parameter to module defrag callbacks. This should be fixed separately.

Hashtable API

The hashtable API (hashtableType callbacks, entryGetKey, etc.) is
unaffected. The kvstoreKeysHashtableType callbacks would operate on
dbEntry * instead of robj *.

Migration strategy

Phase 1: Type aliases (low risk)

Introduce refcString and dbEntry as typedefs for robj:

typedef robj refcString;
typedef robj dbEntry;

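Migrate function signatures file by file to use the correct alias. This is a
mechanical change with no runtime effect. Because both aliases name the same
type, the compiler cannot flag a misclassified use in this phase, so the
classification relies on review. It produces a complete map of which code
touches which type.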
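A small sketch of what the aliases do and do not provide. The struct body is a toy stand-in and IS_DBENTRY_PTR is a hypothetical macro. Since both typedefs name the same type, a _Generic selection cannot tell them apart, and a refcString * is accepted wherever a dbEntry * is expected.

```c
#include <assert.h>

typedef struct serverObject {
    unsigned refcount; /* toy stand-in for the real fields */
} robj;

typedef robj refcString;
typedef robj dbEntry;

/* Evaluates to 1 for any robj *, whichever alias was used to declare it. */
#define IS_DBENTRY_PTR(p) _Generic((p), dbEntry *: 1, default: 0)

_Static_assert(IS_DBENTRY_PTR((refcString *)0),
               "phase 1 aliases are the same type");

/* Annotated as taking a dbEntry, but a refcString * compiles too. */
static inline unsigned dbEntryRefcount(dbEntry *e) {
    return e->refcount;
}
```

This is why type-based dispatch and compile-time checking only become available once the struct is actually split.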

Phase 2: Split the struct

Once all uses are classified, split struct serverObject into two distinct
structs. Fix compiler errors. The phase 1 classification makes this tractable
since every use site is already annotated with the intended type.

Since refcString has no encoding field, all OBJ_ENCODING_INT handling in
refcString code paths (e.g. addReply, getStringObjectLen,
feedReplicationBufferWithObject, getDecodedObject) must be resolved in this
phase — the compiler will enforce it.

Phase 3: Cleanup

Remove any remaining dead code and simplify patterns that were needed to handle
both types in a single struct (e.g. leftover type checks, hasembkey guards,
lru accesses in code that now only handles refcString).

File organization

The functions in object.c split naturally into refcString and dbEntry
categories:

  • refcString: createStringObject, createRawStringObject,
    createEmbeddedStringObject, createStringObjectFromLongLong,
    dupStringObject, makeObjectShared, incrRefCount, decrRefCount,
    freeStringObject, tryObjectEncoding, getDecodedObject,
    compareStringObjects, stringObjectLen, parsing helpers
    (getLongLongFromObject, getDoubleFromObject, etc.).

  • dbEntry: objectSetKeyAndExpire, objectGetKey, objectGetVal,
    objectSetVal, objectGetExpire, objectSetExpire, objectUnembedVal,
    initObjectLRUOrLFU, LRU/LFU accessors, createQuicklistObject,
    createSetObject, createHashObject, etc., freeListObject,
    freeSetObject, etc., dismissObject and variants, objectComputeSize.

Options for file organization:

  1. Split into two files: refcstring.c for refcString functions and
    dbentry.c (or keep the name object.c) for dbEntry functions. Clean
    separation but a larger diff.
  2. Keep object.c, extract refcString: Move refcString functions to a new
    refcstring.c, keep dbEntry functions in object.c. Smaller diff since
    object.c is the larger half.
  3. Keep everything in object.c: Just change the types in place. Smallest
    diff, easiest to review, but no file-level separation.

Scope

  • robj appears ~1050 times across 44 source files.
  • Phase 1 is safe and incremental (just type aliases).
  • Phase 2 is where the struct split and OBJ_ENCODING_INT removal happen;
    bugs could hide here since C converts to and from void * implicitly and
    some compilers only warn about other incompatible pointer conversions.
  • Phase 3 is cleanup of remaining dead code.
