Add LRU to improve the memory usage of the indexer #2050
Lucccyo wants to merge 6 commits into ocaml:main
Conversation
(I did not review the tricky pointer arithmetic of the dllist yet) |
Force-pushed from 5bf10b0 to 4f75584
src/index-format/granular_marshal.ml (Outdated)
| In_memory v -> write_child lnk schema v size ~placeholders ~restore
| On_disk _ ->
    write_child lnk schema (fetch lnk) size ~placeholders ~restore)
| In_memory v | In_memory_c (v, _) ->
I'm curious whether your tests ran into the In_memory_c case here. The current code is correct, but if this case does happen in practice then In_memory_c would benefit from being translated into a pointed-index On_disk_ptr directly (since an In_memory_c is an in-memory value that was read from an existing file).
My test doesn't hit the In_memory_c case here.
src/index-format/granular_marshal.ml (Outdated)
in
List.iter
  (fun (Cached (link, loc, store, schema)) ->
    Cache.remove store.cache loc;
This remove doesn't look entirely correct, since we are losing sharing: we never reintroduce lnk into the cache later, so if that offset turns out to still be useful we'll allocate new links for the same value. Do your benchmarks show that this remove is necessary? :)
Yes, I agree.
On the benchmarks, removing the Cache.remove increases the max memory usage by a factor of about 1.1.
The index links are stored in a Hashtable, which grows without bound as new entries are added. This uses a lot of memory.
This PR replaces the use of a Hashtable with an LRU cache to bound memory usage, keeping only the most recently used indices and evicting the least recently used entry when the capacity is reached.
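To illustrate the idea, here is a minimal LRU cache sketch in OCaml. The module and function names are hypothetical and this is not the PR's implementation: the PR tracks recency with a doubly linked list (the "dllist" mentioned above), while this sketch uses a monotonic use counter, trading O(capacity) eviction for simplicity.

```ocaml
(* Hypothetical minimal LRU cache: keeps at most [capacity] bindings.
   A lookup or insertion marks the key as most recently used; inserting
   a new key into a full cache evicts the least recently used binding. *)
module Lru = struct
  type ('k, 'v) t = {
    table : ('k, 'v * int ref) Hashtbl.t; (* value + last-use tick *)
    capacity : int;
    mutable tick : int;                   (* monotonic use counter *)
  }

  let create capacity =
    { table = Hashtbl.create capacity; capacity; tick = 0 }

  (* Bump the global counter and stamp this binding as freshest. *)
  let touch t stamp =
    t.tick <- t.tick + 1;
    stamp := t.tick

  let find_opt t key =
    match Hashtbl.find_opt t.table key with
    | None -> None
    | Some (v, stamp) -> touch t stamp; Some v

  (* Remove the binding with the smallest (oldest) tick. *)
  let evict_oldest t =
    let oldest =
      Hashtbl.fold
        (fun k (_, stamp) acc ->
          match acc with
          | Some (_, s) when s <= !stamp -> acc
          | _ -> Some (k, !stamp))
        t.table None
    in
    match oldest with
    | Some (k, _) -> Hashtbl.remove t.table k
    | None -> ()

  let add t key value =
    if (not (Hashtbl.mem t.table key))
       && Hashtbl.length t.table >= t.capacity
    then evict_oldest t;
    let stamp = ref 0 in
    touch t stamp;
    Hashtbl.replace t.table key (value, stamp)
end

let () =
  let c = Lru.create 2 in
  Lru.add c "a" 1;
  Lru.add c "b" 2;
  ignore (Lru.find_opt c "a"); (* "a" becomes most recently used *)
  Lru.add c "c" 3              (* evicts "b", the least recently used *)
```

The dllist-based version in the PR makes both lookup and eviction O(1) by splicing the touched node to the front of the list and evicting from the back, at the cost of the trickier pointer manipulation the reviewer mentions.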