Split robj into refcString and dbEntry

## Background

The `robj` (`struct serverObject`, also called "reference-counted object") is
currently used for two distinct purposes:

1. **Reference-counted string** — client arguments (`c->argv`), shared objects
   (`shared.ok`, `shared.integers[...]`), module strings (`ValkeyModuleString`),
   reply objects. These never have an embedded key or expire.

2. **Database entry** — stored in the main keyspace hashtable. Has an embedded
   key, optionally an expire and an embedded value. The value can be any type
   (string, list, set, zset, hash, stream, module type).

Overloading one struct for both roles is not type-safe and requires runtime
checks (`hasembkey`, encoding checks) to determine which role an robj is
playing. This proposal splits `robj` into two distinct types.

## Proposed types

### refcString

A reference-counted string. Always holds an sds value. Never has an embedded
key or expire. Never uses `OBJ_ENCODING_INT` (the value is always a real sds).

```c
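typedef struct refcString {
    /* unsigned char bit-fields (implementation-defined, accepted by GCC and
     * Clang) keep the header at 1 byte; with plain unsigned, common ABIs
     * round the struct up to sizeof(unsigned). */
    unsigned char embedded : 1; /* 1 = sds is embedded after the header */
    unsigned char borrowed : 1; /* 1 = ptr is owned by a dbEntry, don't free */
    unsigned char refcount : 6; /* real counts 0..61; 62 = STATIC, 63 = SHARED */
    /* If embedded == 0: a pointer follows (with alignment padding) */
    /* If embedded == 1: sds data follows immediately at offset 1 */
} refcString;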
```

The header is 1 byte. For embedded strings, the sds header and string data
follow immediately, making short strings very compact (e.g. "OK" = 1 byte of
header + 3 bytes of sds header + 3 bytes of data including the terminator =
7 bytes). For non-embedded strings, a pointer to an external sds follows the
header after alignment padding (1 + 7 + 8 = 16 bytes on 64-bit).
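
The two layouts can be sketched as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: `SKETCH_SDS_HDR` is a 3-byte stand-in for the real sds header, and `refcStringEmbeddedSize`, `refcStringCreateEmbedded`, `refcStringCreateExternal` and `refcStringData` are hypothetical helper names. It also assumes `unsigned char` bit-fields, which GCC and Clang accept, so that the header occupies one byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 1-byte header (assumes the compiler accepts unsigned char bit-fields). */
typedef struct refcString {
    unsigned char embedded : 1;
    unsigned char borrowed : 1;
    unsigned char refcount : 6;
} refcString;

/* Hypothetical stand-in for a 3-byte sds header: len, alloc, flags. */
#define SKETCH_SDS_HDR 3

/* Allocation size of an embedded string: header + sds header + data + NUL. */
static size_t refcStringEmbeddedSize(size_t len) {
    return sizeof(refcString) + SKETCH_SDS_HDR + len + 1;
}

/* Embedded: the string bytes live in the same allocation as the header. */
static refcString *refcStringCreateEmbedded(const char *str, size_t len) {
    assert(len <= 255); /* the stand-in header stores len in one byte */
    unsigned char *p = malloc(refcStringEmbeddedSize(len));
    if (p == NULL) return NULL;
    refcString *s = (refcString *)p;
    s->embedded = 1;
    s->borrowed = 0;
    s->refcount = 1;
    unsigned char *hdr = p + sizeof(refcString);
    hdr[0] = (unsigned char)len; /* len */
    hdr[1] = (unsigned char)len; /* alloc */
    hdr[2] = 0;                  /* flags */
    memcpy(hdr + SKETCH_SDS_HDR, str, len);
    hdr[SKETCH_SDS_HDR + len] = '\0';
    return s;
}

/* Non-embedded: a pointer to the external string is stored at the first
 * pointer-aligned offset after the header. */
static refcString *refcStringCreateExternal(char *external) {
    unsigned char *p = malloc(2 * sizeof(char *));
    if (p == NULL) return NULL;
    refcString *s = (refcString *)p;
    s->embedded = 0;
    s->borrowed = 0;
    s->refcount = 1;
    memcpy(p + sizeof(char *), &external, sizeof external);
    return s;
}

/* Return the string bytes for either layout. */
static char *refcStringData(refcString *s) {
    unsigned char *p = (unsigned char *)s;
    if (s->embedded) return (char *)(p + sizeof(refcString) + SKETCH_SDS_HDR);
    char *external;
    memcpy(&external, p + sizeof(char *), sizeof external);
    return external;
}
```

With these helpers, `refcStringEmbeddedSize(2)` gives the 7 bytes quoted above for "OK", and the non-embedded allocation is two pointers wide.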

Six bits of refcount leave 0..61 for real counts, with 62 = STATIC and
63 = SHARED, which is probably sufficient. In practice, a refcString typically
has at most ~5 simultaneous references (base + argv + replication + reply +
module retain). An assert in `incrRefCount` would guard against overflow. If 6
bits turn out to be too few, the header can be extended to 2 bytes (giving up
to 14 bits for refcount) at the cost of slightly larger embedded strings.
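
A minimal sketch of the refcount handling described above, assuming the sentinel values from the struct comment. The constant names `REFC_MAX_REAL`, `REFC_STATIC` and `REFC_SHARED` and the function `refcStringDecrRef` are hypothetical. As a simplification, both sentinels are treated as "never counted"; the real code may prefer to panic when a STATIC object is retained.

```c
#include <assert.h>

enum { REFC_MAX_REAL = 61, REFC_STATIC = 62, REFC_SHARED = 63 };

/* 1-byte header (assumes the compiler accepts unsigned char bit-fields). */
typedef struct refcString {
    unsigned char embedded : 1;
    unsigned char borrowed : 1;
    unsigned char refcount : 6;
} refcString;

/* Take a reference. Sentinel values are left untouched; a real count is
 * guarded against running into the sentinel range. */
static void refcStringIncrRef(refcString *s) {
    if (s->refcount >= REFC_STATIC) return;
    assert(s->refcount < REFC_MAX_REAL);
    s->refcount++;
}

/* Drop a reference. Returns 1 when the last reference is gone and the
 * caller should free the string, 0 otherwise. */
static int refcStringDecrRef(refcString *s) {
    if (s->refcount >= REFC_STATIC) return 0;
    assert(s->refcount > 0);
    return --s->refcount == 0;
}
```

Returning a flag from the decrement keeps the sketch free of allocator details; the real `decrRefCount` would free the object itself.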

Fields removed compared to robj:
- `type` — always a string.
- `encoding` — replaced by the single `embedded` bit.
- `lru` — LRU/LFU tracking is only for database entries.
- `hasexpire`, `hasembkey`, `hasembval` — not applicable.

Fields added:
- `borrowed` — when set, the sds pointer is owned by a dbEntry (used for
  zero-copy SET, see below). The sds must not be freed when the refcString is
  freed.

Used for:
- Client arguments (`c->argv`)
- Shared objects (`shared.ok`, `shared.integers[...]`, etc.)
- Module strings (`ValkeyModuleString`)
- Reply protocol strings
- Command rewriting for replication

Notable simplification: since `OBJ_ENCODING_INT` is not used in refcString, all
code that consumes a refcString can assume the value is a valid sds. This
eliminates INT-encoding branches in `addReply`, `getStringObjectLen`,
`compareStringObjects`, `getDecodedObject`, `feedReplicationBufferWithObject`,
and others.
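
To illustrate the kind of branch that disappears, here is a sketch comparing a length helper for today's dual-encoding string with one for a string that is always backed by real bytes. The types `robjSketch` and `refcStringSketch` and both functions are hypothetical stand-ins that use plain C strings instead of sds; they are not the existing `getStringObjectLen`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a string robj: either real bytes or an integer. */
typedef struct robjSketch {
    int is_int_encoded;
    const char *str; /* valid when is_int_encoded == 0 */
    long long ival;  /* valid when is_int_encoded == 1 */
} robjSketch;

/* Today: the length of an INT-encoded value must be computed by
 * formatting the number. */
static size_t robjSketchLen(const robjSketch *o) {
    if (o->is_int_encoded) {
        char buf[32];
        return (size_t)snprintf(buf, sizeof buf, "%lld", o->ival);
    }
    return strlen(o->str);
}

/* Simplified stand-in for refcString: always real bytes plus a length. */
typedef struct refcStringSketch {
    const char *str;
    size_t len;
} refcStringSketch;

/* After the split: no encoding branch is needed. */
static size_t refcStringSketchLen(const refcStringSketch *s) {
    return s->len;
}
```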

### dbEntry

A database entry. Always has an embedded key. Optionally has an expire and/or
embedded value. Retains the current robj layout.

```c
typedef struct dbEntry {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : LRULFU_BITS;
    unsigned hasexpire : 1;
    unsigned hasembval : 1;
    unsigned refcount : OBJ_REFCOUNT_BITS;
    void *val_ptr;
    /* Embedded data follows: expire, key sds, optionally value */
} dbEntry;
```

`hasembkey` is removed since it is always 1 for a dbEntry.

Refcount is retained for:
- Zero-copy reply I/O (the reply buffer holds a reference until the I/O thread
  writes the data).
- Module event callbacks during `dbSetValue` — a module's key-unlink handler
  could call `VM_StringDMA` with write mode, which triggers
  `dbUnshareStringValue` and may replace the value being overwritten. The
  refcount prevents premature freeing during this window.

Note: MOVE and RENAME currently use refcount to protect the dbEntry during a
delete-then-reinsert sequence (bump to 2, delete, add, back to 1). This could
be replaced by a `dbPop` + `dbAdd` pattern that removes the entry from the
hashtable without freeing it, then reinserts it under the new key or in the new
database. This would eliminate the need for refcount in those paths.
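
A rough sketch of the pop-then-add idea for MOVE, assuming a toy fixed-size table in place of the real keyspace hashtable. The types `dbSketch` and `dbEntrySketch`, the function `moveKey`, and the signatures given to `dbPop` and `dbAdd` here are all hypothetical; the point is only that the entry pointer survives the move without any refcount bump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DB_SLOTS 8

typedef struct dbEntrySketch {
    const char *key;
    const char *val;
} dbEntrySketch;

/* Toy database: a small array of entry pointers. */
typedef struct dbSketch {
    dbEntrySketch *slots[SKETCH_DB_SLOTS];
} dbSketch;

/* Insert an entry. Returns 1 on success, 0 if the table is full. */
static int dbAdd(dbSketch *db, dbEntrySketch *e) {
    for (int i = 0; i < SKETCH_DB_SLOTS; i++) {
        if (db->slots[i] == NULL) {
            db->slots[i] = e;
            return 1;
        }
    }
    return 0;
}

/* Remove an entry from the table WITHOUT freeing it and hand ownership
 * to the caller. Returns NULL if the key is absent. */
static dbEntrySketch *dbPop(dbSketch *db, const char *key) {
    for (int i = 0; i < SKETCH_DB_SLOTS; i++) {
        dbEntrySketch *e = db->slots[i];
        if (e != NULL && strcmp(e->key, key) == 0) {
            db->slots[i] = NULL;
            return e;
        }
    }
    return NULL;
}

/* MOVE as pop + add: the entry is never freed, so no refcount is needed. */
static int moveKey(dbSketch *src, dbSketch *dst, const char *key) {
    dbEntrySketch *e = dbPop(src, key);
    if (e == NULL) return 0;
    if (!dbAdd(dst, e)) {
        dbAdd(src, e); /* put it back; the slot just vacated is free */
        return 0;
    }
    return 1;
}
```

The sketch omits the check that the key does not already exist in the destination database.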

`OBJ_ENCODING_INT` is allowed in dbEntry for memory-efficient storage of
integer string values.
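
As a reminder of what INT encoding buys, the sketch below stores a canonical decimal string directly in the pointer slot instead of allocating a string. It is an illustration only: `dbEntrySketch`, `dbEntryTryEncodeInt` and `dbEntryIntValue` are hypothetical, and the parsing uses `strtol` rather than the server's own helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct dbEntrySketch {
    int int_encoded; /* 1 = val_ptr holds an integer, not a pointer */
    void *val_ptr;
} dbEntrySketch;

/* If str is the canonical decimal form of a long, store the number in the
 * pointer slot and return 1. Otherwise leave the entry alone, return 0. */
static int dbEntryTryEncodeInt(dbEntrySketch *e, const char *str) {
    char *end;
    errno = 0;
    long v = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0') return 0;

    /* Round-trip check rejects forms such as "007", "+5" or " 5" that
     * would not be reproduced when the integer is printed back. */
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", v);
    if (strcmp(buf, str) != 0) return 0;

    e->int_encoded = 1;
    e->val_ptr = (void *)(intptr_t)v;
    return 1;
}

/* Read back an INT-encoded value. */
static long dbEntryIntValue(const dbEntrySketch *e) {
    return (long)(intptr_t)e->val_ptr;
}
```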

## Boundary between the two types

### refcString → dbEntry (SET path)

When a value is inserted into the database, a dbEntry is created from the
refcString value. For large string values (where the sds is not embedded in the
refcString), the sds pointer is moved (stolen) from the refcString to the
dbEntry to avoid copying. The refcString in argv has its `borrowed` bit set,
indicating it now references the dbEntry's sds and must not free it.

The borrowed sds is safe because:
- Command execution is synchronous.
- Replication (`feedReplicationBuffer`) copies the sds content into the backlog
  via memcpy during `propagateNow`.
- The refcString in argv is freed after propagation completes.
- The dbEntry in the database outlives the command execution.

For small string values (embedded in the refcString), the data is copied into
the dbEntry's embedded region. No borrowing is needed.
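
The steal-or-copy rule can be sketched as follows. This is a simplified model, not the proposed code: `refcStringSketch` keeps an explicit `ptr` member instead of the packed layout, plain C strings stand in for sds, the copy for small values goes to a separate allocation rather than an embedded region, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct refcStringSketch {
    unsigned embedded : 1;
    unsigned borrowed : 1;
    unsigned refcount : 6;
    char *ptr; /* string bytes; points just past the struct when embedded */
} refcStringSketch;

typedef struct dbEntrySketch {
    char *val_ptr;
} dbEntrySketch;

static refcStringSketch *refcStringSketchCreate(const char *str, int embed) {
    size_t n = strlen(str) + 1;
    refcStringSketch *s = malloc(sizeof *s + (embed ? n : 0));
    if (s == NULL) return NULL;
    s->embedded = embed ? 1 : 0;
    s->borrowed = 0;
    s->refcount = 1;
    s->ptr = embed ? (char *)(s + 1) : malloc(n);
    if (s->ptr == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->ptr, str, n);
    return s;
}

/* SET path: steal the buffer of a non-embedded string and mark the
 * refcString as borrowing it; copy the bytes of an embedded string. */
static int dbEntrySketchSetValue(dbEntrySketch *e, refcStringSketch *s) {
    if (s->embedded) {
        size_t n = strlen(s->ptr) + 1;
        e->val_ptr = malloc(n);
        if (e->val_ptr == NULL) return 0;
        memcpy(e->val_ptr, s->ptr, n);
    } else {
        e->val_ptr = s->ptr;
        s->borrowed = 1;
    }
    return 1;
}

/* Freeing the argv string must not free a buffer the dbEntry now owns. */
static void refcStringSketchFree(refcStringSketch *s) {
    if (!s->embedded && !s->borrowed) free(s->ptr);
    free(s);
}
```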

### dbEntry → reply (GET path)

For GET and similar read commands, the current zero-copy reply path stores a
reference to the dbEntry (via `incrRefCount`) in the reply buffer. The I/O
thread writes the sds directly to the socket and then calls `decrRefCount`.
This mechanism is unchanged since dbEntry retains refcounting.
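
The lifetime guarantee this relies on can be shown with a small single-threaded model. Everything here is a hypothetical stand-in: `dbEntrySketch`, `replySketch`, `replyHold` and `replyFlush` are illustrative names, the "socket write" is a copy into a caller buffer, and `sketch_freed` exists only so the example can observe when the entry is released.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int sketch_freed = 0; /* counts entries released, for observation */

typedef struct dbEntrySketch {
    unsigned refcount;
    char val[16];
} dbEntrySketch;

static dbEntrySketch *dbEntrySketchCreate(const char *val) {
    dbEntrySketch *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->refcount = 1; /* the reference held by the keyspace */
    snprintf(e->val, sizeof e->val, "%s", val);
    return e;
}

static void dbEntrySketchIncrRef(dbEntrySketch *e) {
    e->refcount++;
}

static void dbEntrySketchDecrRef(dbEntrySketch *e) {
    if (--e->refcount == 0) {
        free(e);
        sketch_freed++;
    }
}

/* Reply buffer slot that points at the entry instead of copying it. */
typedef struct replySketch {
    dbEntrySketch *held;
} replySketch;

/* Main thread: queue the reply and take a reference. */
static void replyHold(replySketch *r, dbEntrySketch *e) {
    dbEntrySketchIncrRef(e);
    r->held = e;
}

/* I/O side: write the value out, then drop the reference. */
static void replyFlush(replySketch *r, char *out, size_t outlen) {
    snprintf(out, outlen, "%s", r->held->val);
    dbEntrySketchDecrRef(r->held);
    r->held = NULL;
}
```

If the key is deleted between `replyHold` and `replyFlush`, the entry stays alive until the flush drops the last reference.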

`addReplyBulk` would need to accept a dbEntry, or a common internal helper
could extract the sds and length from either type.

### lookupKey returns dbEntry *

`lookupKey`, `lookupKeyRead`, `lookupKeyWrite` and variants return `dbEntry *`
instead of `robj *`. Command implementations that read from the database receive
a `dbEntry *`.

## Function changes

Many functions in `object.c` will need more than a type annotation change. Some
need redesigning (e.g. `tryObjectEncoding` — INT encoding doesn't apply to
refcString), some need type-specific variants (e.g. `dupStringObject`,
`compareStringObjects`, `getLongLongFromObject` — called on both types, but
handle `OBJ_ENCODING_INT` which only exists in dbEntry), and some become trivial
for refcString (e.g. `sdsEncodedObject` is always true, `getDecodedObject` is
the identity). A full audit of all functions and their callers is needed.

Functions that need type-specific implementations can use C11 `_Generic` macros
to dispatch based on pointer type, keeping call sites unchanged. For example:

```c
void refcStringIncrRef(refcString *s);
void dbEntryIncrRef(dbEntry *e);

#define incrRefCount(o) _Generic((o), \
    refcString *: refcStringIncrRef, \
    dbEntry *: dbEntryIncrRef)(o)
```

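This provides compile-time type checking without changing any call sites. It
only works once the two types are distinct (after the struct split): `_Generic`
rejects an association list that names two compatible types, so the macro
cannot be introduced while both names are still typedefs of `robj`.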
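
For a version that can be compiled and run, the sketch below fills in trivial bodies and uses two distinct placeholder structs in place of the post-split definitions. The struct contents and the `refc_calls` / `entry_calls` counters are assumptions made only to show which function the macro selects.

```c
#include <assert.h>

/* Placeholder definitions standing in for the two post-split types. */
typedef struct refcString {
    unsigned refcount;
} refcString;

typedef struct dbEntry {
    unsigned refcount;
} dbEntry;

static int refc_calls = 0;  /* how often the refcString variant ran */
static int entry_calls = 0; /* how often the dbEntry variant ran */

static void refcStringIncrRef(refcString *s) {
    s->refcount++;
    refc_calls++;
}

static void dbEntryIncrRef(dbEntry *e) {
    e->refcount++;
    entry_calls++;
}

/* Selected at compile time from the static type of the argument. With no
 * default association, any other pointer type is a compile error. */
#define incrRefCount(o) _Generic((o), \
    refcString *: refcStringIncrRef,  \
    dbEntry *: dbEntryIncrRef)(o)
```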

## Shared objects

All shared objects (`shared.*`) become refcString. They are never stored in the
database (there is an existing assert for this). They are used as:
- Reply protocol strings passed to `addReply`.
- Synthetic argv entries for command rewriting/propagation.

## Module API

`ValkeyModuleString` maps to refcString. Modules never see dbEntry directly:

- Module type callbacks (`unlink`, `free_effort`, `copy`, `rewrite`) receive the
  key name as a refcString (from client argv), not a dbEntry.
- `VM_StringDMA` operates on the dbEntry internally but returns a raw `char *`.
- `VM_StringPtrLen`, `VM_CreateString`, etc. all operate on refcString.
- `VM_DefragValkeyModuleString` operates on module-retained strings (refcString).

Known existing bug: the defrag path passes an sds cast to `robj *` as the key
parameter to module defrag callbacks. This should be fixed separately.

## Hashtable API

The hashtable API (`hashtableType` callbacks, `entryGetKey`, etc.) is
unaffected. The `kvstoreKeysHashtableType` callbacks would operate on
`dbEntry *` instead of `robj *`.

## Migration strategy

### Phase 1: Type aliases (low risk)

Introduce refcString and dbEntry as typedefs for robj:

```c
typedef robj refcString;
typedef robj dbEntry;
```

Migrate function signatures file by file to use the correct alias. This is a
mechanical change with no runtime effect. It produces a complete map of which
code touches which type.
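
A small sketch of what a migrated signature might look like, assuming a cut-down `struct serverObject` and a hypothetical function `setValueSketch`. It also shows the limit of this phase: because both names alias the same type, the compiler accepts any `robj *` in either position, so the annotations document intent but are not yet enforced.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the real struct serverObject. */
typedef struct serverObject {
    unsigned refcount;
    void *ptr;
} robj;

typedef robj refcString;
typedef robj dbEntry;

/* Before: static void setValueSketch(robj *entry, robj *value);
 * After: the signature records which role each parameter plays. */
static void setValueSketch(dbEntry *entry, refcString *value) {
    entry->ptr = value->ptr;
}
```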

### Phase 2: Split the struct

Once all uses are classified, split `struct serverObject` into two distinct
structs. Fix compiler errors. The phase 1 classification makes this tractable
since every use site is already annotated with the intended type.

Since refcString has no `encoding` field, all `OBJ_ENCODING_INT` handling in
refcString code paths (e.g. `addReply`, `getStringObjectLen`,
`feedReplicationBufferWithObject`, `getDecodedObject`) must be resolved in this
phase — the compiler will enforce it.

### Phase 3: Cleanup

Remove any remaining dead code and simplify patterns that were needed to handle
both types in a single struct (e.g. leftover `type` checks, `hasembkey` guards,
`lru` accesses in code that now only handles refcString).

## File organization

The functions in `object.c` split naturally into refcString and dbEntry
categories:

- **refcString:** `createStringObject`, `createRawStringObject`,
  `createEmbeddedStringObject`, `createStringObjectFromLongLong`,
  `dupStringObject`, `makeObjectShared`, `incrRefCount`, `decrRefCount`,
  `freeStringObject`, `tryObjectEncoding`, `getDecodedObject`,
  `compareStringObjects`, `stringObjectLen`, parsing helpers
  (`getLongLongFromObject`, `getDoubleFromObject`, etc.).

- **dbEntry:** `objectSetKeyAndExpire`, `objectGetKey`, `objectGetVal`,
  `objectSetVal`, `objectGetExpire`, `objectSetExpire`, `objectUnembedVal`,
  `initObjectLRUOrLFU`, LRU/LFU accessors, `createQuicklistObject`,
  `createSetObject`, `createHashObject`, etc., `freeListObject`,
  `freeSetObject`, etc., `dismissObject` and variants, `objectComputeSize`.

Options for file organization:
1. **Split into two files:** `refcstring.c` for refcString functions and
   `dbentry.c` (or keep the name `object.c`) for dbEntry functions. Clean
   separation but a larger diff.
2. **Keep `object.c`, extract refcString:** Move refcString functions to a new
   `refcstring.c`, keep dbEntry functions in `object.c`. Smaller diff since
   `object.c` is the larger half.
3. **Keep everything in `object.c`:** Just change the types in place. Smallest
   diff, easiest to review, but no file-level separation.

## Scope

- `robj` appears ~1050 times across 44 source files.
- Phase 1 is safe and incremental (just type aliases).
- Phase 2 is where the struct split and `OBJ_ENCODING_INT` removal happen;
  bugs could hide here because C converts to and from `void *` implicitly and
  explicit casts bypass type checking, so the compiler will not catch every
  mix-up between the two types.
- Phase 3 is cleanup of remaining dead code.

@JimB123

Split robj into refcString and dbEntry #3494
