[NEW] Configuration to retain encodings during RDB reload

**The problem/use-case that the feature addresses**

If you're using RDB, consider this pseudocode:

```
> CONFIG SET hash-max-listpack-value 100
for i = 0 to 1000000
	> HSET key-i foo <value of 99 length>
end

> CONFIG SET hash-max-listpack-value 64
> HSET key-x foo <value of 99 length>
```

If you run `OBJECT ENCODING` on `key-0` through `key-1000000`, you'll get `listpack`, while `key-x` returns `hashtable`. Let's say this takes you to a very high percentage of your maximum memory, but not all of it. You can still accept reads and writes, and all keys are available.

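Now you stop Valkey and restart it, triggering a load of the data from the RDB file. Every value is 99 bytes long, which exceeds the current threshold of 64, so the keys that were stored as `listpack` get converted to `hashtable` during the load. You'll OOM or evict lots of data, since `hashtable` is much less memory efficient than `listpack`. I'm not sure this would be truly expected behaviour: it fit before and now it doesn't!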

*Why does this happen?*

In `rdb.c` there are conversions based on the current encoding config thresholds (example: https://github.com/valkey-io/valkey/blob/unstable/src/rdb.c#L2434). In other words, when an RDB file is reloaded, Valkey converts the encodings to match the _current_ configuration, even though the original encodings are stored in the RDB file. There is logic to this, but it isn't always desired.
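
The load-time decision described above can be modelled with a toy sketch. This is not Valkey's code: `Thresholds`, `load_hash`, and the exact set of checks are assumptions made for illustration, and the real `rdb.c` may check different thresholds depending on how the value was stored.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Stand-ins for hash-max-listpack-entries and hash-max-listpack-value.
    max_entries: int = 512
    max_value: int = 64


def load_hash(stored_encoding: str, fields: dict, cfg: Thresholds) -> str:
    """Return the in-memory encoding a toy loader would pick (hypothetical)."""
    if stored_encoding == "listpack":
        too_many = len(fields) > cfg.max_entries
        too_long = any(
            len(k) > cfg.max_value or len(v) > cfg.max_value
            for k, v in fields.items()
        )
        if too_many or too_long:
            # The stored encoding is discarded in favour of the current config.
            return "hashtable"
    return stored_encoding


fields = {"foo": "x" * 99}
written_under = Thresholds(max_value=100)   # config when the key was written
restarted_with = Thresholds(max_value=64)   # config at restart

print(load_hash("listpack", fields, written_under))   # listpack
print(load_hash("listpack", fields, restarted_with))  # hashtable
```

The same stored data ends up in a different encoding purely because the threshold changed between the write and the reload.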

*Where would you encounter this?*

This would primarily manifest in situations where you're altering encoding configurations live via `CONFIG SET`, which is admittedly a little unusual in many use cases (in contrast to config changes via `valkey.conf`). However, the in-development Kubernetes operator uses this method to change configurations, so we can expect `CONFIG SET` to become more mainstream. While the k8s operator could make these encoding settings immutable, that might not be desirable.

You might also encounter this if you wanted to store a single key in a specific encoding: you could, in theory, set the encoding configuration, write the key, then reset the configuration. Practically, you might want to do this to optimize a specific key's memory usage.

Example:

```
> MULTI
> CONFIG SET hash-max-listpack-value 1
> HSET foo a 1234
> CONFIG SET hash-max-listpack-value 64
> EXEC
```

This would *force* the key `foo` into a `hashtable` encoding (or vice versa by swapping around the `CONFIG SET`s). However, this is useless because restoring from RDB will do the conversion based on the config setting. Admittedly, this specific method is impractical, since sensible ACL settings would not allow arbitrary `CONFIG SET`s.

Finally, in the future, I think it might be smart to have mechanisms to more directly force or coerce keys to specific encodings based on something other than the threshold configurations we have today. 

In short, this encoding conversion may not be advantageous in a number of circumstances.

**Description of the feature**

I'd like a configuration that retains the encoding from the RDB file instead of actively converting to the current encoding configuration. Alternatively, you can think of this as ignoring the current encoding config during reload. The config could look something like this for the hash data type:

- `rdb-load-ignore-hash-max-listpack-value` default `no`
- `rdb-load-ignore-hash-max-listpack-entries` default `no`

If `yes`, Valkey would check this config in `rdb.c` and skip the conversion. If left at the default (`no`), the conversion would happen as it does today (no behaviour change).

This would give more control to the user over encodings with low performance impact (actually, I would expect RDB load to be _faster_ with these settings set to `yes` in some circumstances). It would also avoid weird unexpected 'it fit before, now it doesn't after restart' situations.
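
A minimal sketch of how the proposed switch might gate the conversion, written as a toy model rather than real `rdb.c` code. `LoadConfig`, `choose_encoding`, and the field names are hypothetical; only the option name `rdb-load-ignore-hash-max-listpack-value` comes from this proposal.

```python
from dataclasses import dataclass


@dataclass
class LoadConfig:
    hash_max_listpack_value: int = 64
    # Models the proposed rdb-load-ignore-hash-max-listpack-value option.
    rdb_load_ignore_hash_max_listpack_value: bool = False


def choose_encoding(stored_encoding: str, longest_value: int, cfg: LoadConfig) -> str:
    """Pick the encoding for a hash loaded from an RDB file (toy model)."""
    if stored_encoding != "listpack":
        return stored_encoding
    if cfg.rdb_load_ignore_hash_max_listpack_value:
        # Proposed behaviour: keep whatever the RDB file recorded.
        return stored_encoding
    if longest_value > cfg.hash_max_listpack_value:
        # Behaviour described in this issue: convert to match the current config.
        return "hashtable"
    return stored_encoding


default_cfg = LoadConfig()
retain_cfg = LoadConfig(rdb_load_ignore_hash_max_listpack_value=True)

print(choose_encoding("listpack", 99, default_cfg))  # hashtable
print(choose_encoding("listpack", 99, retain_cfg))   # listpack
```

With the flag off, nothing changes for existing users; with it on, the threshold check is skipped entirely, which is also why the load could end up faster in some cases.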


**Alternatives you've considered**

- Warning users about changing these configurations via log files. I'm not sure people read log warnings though.
- Change encodings via a background thread when `CONFIG SET` is run on these configs. This would prevent specific immediate OOM/mass eviction conditions on RDB load and would be observable. However, I would be worried about complexity and performance impact.

**Additional information**

In this issue I talk about hash types, but I would want configs for all type encodings, not just hash.

[NEW] Configuration to retain encodings during RDB reload #3479
