[NEW] Configuration to retain encodings during RDB reload #3479

@stockholmux

The problem/use-case that the feature addresses

If using RDB, consider this pseudo code:

> CONFIG SET hash-max-listpack-value 100
for i = 0 to 1000000
	> HSET key-i foo <value of 99 length>
end

> CONFIG SET hash-max-listpack-value 64
> HSET key-x foo <value of 99 length>

If you run OBJECT ENCODING on key-0 through key-1000000 you’ll get LISTPACK, while key-x will return HASHTABLE. Let’s say that this takes you to a very high percentage of your maximum memory, but not full. You can still accept reads and writes, and all keys are available.

Now you stop Valkey and restart it, triggering a load of the data from RDB. You’ll OOM or evict lots of data, since HASHTABLE is much less memory efficient than LISTPACK. I’m not sure this would be truly expected behaviour: it fit before and now it doesn't!

Why does this happen?

In rdb.c there are conversions based on the current encoding config thresholds (example: https://github.com/valkey-io/valkey/blob/unstable/src/rdb.c#L2434). In other words, when an RDB is reloaded, Valkey converts the encodings to whatever the current configuration dictates, despite the encodings being stored in the RDB file. There is logic to this, but it isn't always desired.
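
To make the mechanism concrete, here is a minimal toy model in Python. It is not Valkey's code: ToyHash, write_hash and reload_hash are hypothetical names, and the model only covers the listpack-to-hashtable direction described in this issue, for a single threshold.

```python
from dataclasses import dataclass

LISTPACK = "listpack"
HASHTABLE = "hashtable"


@dataclass
class ToyHash:
    """A hash key reduced to the two facts that matter here (hypothetical type)."""
    encoding: str       # encoding chosen when the key was written
    max_value_len: int  # longest value stored in the hash


def write_hash(value_len: int, max_listpack_value: int) -> ToyHash:
    """Pick an encoding at write time from the threshold in force right now."""
    encoding = LISTPACK if value_len <= max_listpack_value else HASHTABLE
    return ToyHash(encoding, value_len)


def reload_hash(saved: ToyHash, max_listpack_value: int) -> ToyHash:
    """Model the load-time check: a saved listpack is kept only if it
    still satisfies the threshold configured at load time."""
    if saved.encoding == LISTPACK and saved.max_value_len > max_listpack_value:
        return ToyHash(HASHTABLE, saved.max_value_len)
    return saved


# Written while hash-max-listpack-value was 100, so it is a listpack.
key_0 = write_hash(value_len=99, max_listpack_value=100)
# Reloaded after the threshold was lowered to 64.
key_0_after_restart = reload_hash(key_0, max_listpack_value=64)

print(key_0.encoding, "->", key_0_after_restart.encoding)
```

The same key ends up with a different encoding purely because the threshold changed between the write and the reload, which is the 'it fit before' surprise in miniature.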

Where would you encounter this?

This would primarily manifest in situations where you're altering encoding configurations live via CONFIG SET, which is admittedly a little strange in many use cases (in contrast to config changes via valkey.conf). However, the in-development Kubernetes operator uses this method to change configurations, so we can expect CONFIG SET to become more mainstream. While the k8s operator could make these encoding settings immutable, that might not be desirable.

A different place you might encounter this: if you wanted to store a single key in a specific encoding, you could, in theory, set the encoding configuration, write the key, then reset the configuration. Practically, you might want to do this to optimize a specific key’s memory usage.

Example:

> MULTI
> CONFIG SET hash-max-listpack-value 1
> HSET foo a 1234
> CONFIG SET hash-max-listpack-value 64
> EXEC

This would force the key foo into a hashtable encoding (or vice versa by swapping around the CONFIG SETs). However, this is useless because restoring from RDB will do the conversion based on the config setting. Admittedly, this specific method is impractical because sensible ACL settings do not allow arbitrary CONFIG SETs.

Finally, in the future, I think it might be smart to have mechanisms to more directly force or coerce keys to specific encodings based on something other than the threshold configurations we have today.

In short, this encoding conversion may not be advantageous in a number of circumstances.

Description of the feature

I'd like a configuration that retains the encoding from the RDB instead of actively converting to the current encoding configuration. Alternatively, you can think of this as ignoring the current encoding config during reload. The config could look something like this for the hash data type:

  • rdb-load-ignore-hash-max-listpack-value default no
  • rdb-load-ignore-hash-max-listpack-entries default no

If yes, Valkey would check this config in rdb.c and skip the conversion. If left at the default ('no'), conversion would happen as described above (no behaviour change).
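
As a rough sketch of the intended decision (in Python rather than the C of rdb.c, and not an actual patch), the check could look like the function below. The function name and the rdb_load_ignore parameter are hypothetical stand-ins for the proposed rdb-load-ignore-hash-max-listpack-value setting.

```python
def encoding_after_load(saved_encoding: str,
                        longest_value: int,
                        hash_max_listpack_value: int,
                        rdb_load_ignore: bool = False) -> str:
    """Return the encoding a hash would have after an RDB load.

    rdb_load_ignore is a hypothetical stand-in for the proposed
    rdb-load-ignore-hash-max-listpack-value setting (default 'no').
    """
    if rdb_load_ignore:
        # Proposed behaviour: trust the encoding stored in the RDB file.
        return saved_encoding
    # Existing behaviour: re-check the threshold configured at load time.
    if saved_encoding == "listpack" and longest_value > hash_max_listpack_value:
        return "hashtable"
    return saved_encoding


# Default ('no'): the value of length 99 exceeds the current limit of 64.
print(encoding_after_load("listpack", 99, 64))                        # hashtable
# Proposed 'yes': the listpack encoding from the RDB file is retained.
print(encoding_after_load("listpack", 99, 64, rdb_load_ignore=True))  # listpack
```

With the flag off, the function behaves exactly like the existing conversion, which matches the 'no behaviour change' default.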

This would give more control to the user over encodings with low performance impact (actually, I would expect RDB load to be faster with these settings set to yes in some circumstances). It would also avoid weird unexpected 'it fit before, now it doesn't after restart' situations.

Alternatives you've considered

  • Warning users about changing these configurations via log files. I'm not sure people read log warnings though.
  • Change encodings via background thread when CONFIG SET is run on these configs. This would prevent specific immediate OOM/mass eviction conditions on RDB load and would be observable. However, I would be worried about complexity and performance impact.

Additional information

In this issue I talk about hash types, but I would want configs for all type encodings, not just hash.
