Description
Is your feature request related to a problem? Please describe.
Today, an operator can set the index.codec setting, which selects the compression technique (e.g. zstd, deflate, lz4) for stored fields. Whenever a segment is written, Lucene applies the configured compression algorithm to the stored fields.
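For reference, this is how the setting is applied today. index.codec is a static index setting, so it is set at index creation time (the endpoint and index name below are placeholders):

```shell
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "codec": "zstd"
    }
  }
}'
```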
For use cases where a customer frequently retrieves or updates recently ingested documents, it may be better not to compress at all: serving those requests means reading the stored _source field, and decompressing it costs additional CPU.
Once the segments are merged into larger segments and the documents are accessed less frequently, we can compress them to gain the storage savings and to reduce write amplification thanks to the smaller segment size.
The lever does not have to be purely temporal: it could be based on parameters such as FieldInfo or segment size, and could be incorporated into the merge policy or into a per-field codec configuration.
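To make the segment-size lever concrete, here is a minimal sketch of the decision logic. All names and the threshold are assumptions for illustration, not an existing OpenSearch or Lucene API:

```python
from dataclasses import dataclass


@dataclass
class SegmentInfo:
    """Hypothetical stand-in for the per-segment metadata the codec would see."""
    name: str
    size_bytes: int


# Assumed cutoff: small, recently flushed segments stay uncompressed so that
# _source reads skip decompression; large merged segments get compressed.
COMPRESS_THRESHOLD_BYTES = 64 * 1024 * 1024  # 64 MB (illustrative value)


def stored_fields_mode(segment: SegmentInfo) -> str:
    """Pick a stored-fields compression mode based on segment size."""
    if segment.size_bytes < COMPRESS_THRESHOLD_BYTES:
        return "uncompressed"
    return "zstd"


print(stored_fields_mode(SegmentInfo("fresh_flush", 8 * 1024 * 1024)))   # uncompressed
print(stored_fields_mode(SegmentInfo("merged_big", 512 * 1024 * 1024)))  # zstd
```

The same shape of predicate could instead inspect FieldInfo (e.g. compress only certain fields) or be evaluated inside a merge policy when segments are rewritten.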
Describe the solution you'd like
This is a rough idea; I'm looking for feedback on whether it is worth exploring.
Additional context
I was analyzing the performance of an update-heavy benchmark workload and observed that recently ingested documents were being retrieved for updates, spending CPU cycles on decompression.
