LRU Archive Cache Byte-Limit Eviction Is Broken (self.current Never Updated)
Summary
The `LruArchiveBackend` in terminus-store tracks its current byte usage in a `current: usize` field, but this field is initialized to 0 and never updated when entries are added or evicted. As a result, the byte-limit eviction logic in `ensure_enough_cache_space` always sees `current == 0`, believes the cache is empty, and never triggers byte-based eviction. The LRU cache therefore grows without bound: it is created with `LruCache::unbounded()`, so there is no entry-count cap either.
Affected Code
File: terminusdb-store/src/storage/archive.rs
Struct definition
```rust
#[derive(Clone)]
pub struct LruArchiveBackend<M, D> {
    cache: Arc<tokio::sync::Mutex<LruCache<[u32; 5], CacheEntry>>>,
    limit: usize,   // configured limit in MB
    current: usize, // intended to track current usage in bytes — never updated
    metadata_origin: M,
    data_origin: D,
}
```

Initialization — `current` set to 0, never mutated again
```rust
impl<M, D> LruArchiveBackend<M, D> {
    pub fn new(metadata_origin: M, data_origin: D, limit: usize) -> Self {
        let cache = Arc::new(tokio::sync::Mutex::new(LruCache::unbounded()));
        Self {
            cache,
            limit,
            current: 0, // <-- initialized to 0
            metadata_origin,
            data_origin,
        }
    }
}
```

The eviction check — always sees `current == 0`
In get_layer_bytes, when a new entry is resolved and about to be cached:
```rust
if ensure_enough_cache_space(
    &mut *cache,
    self.limit_bytes(),
    self.current, // <-- always 0
    bytes.len(),
) {
    let cached = cache
        .get_mut(&id)
        .expect("layer resolving entry not found in cache");
    *cached = CacheEntry::Resolved(bytes.clone());
    // NOTE: self.current is NOT incremented here
}
```

The eviction function — never evicts because `remaining == limit`
```rust
fn ensure_enough_cache_space(
    cache: &mut LruCache<[u32; 5], CacheEntry>,
    limit: usize,
    current: usize, // always 0
    required: usize,
) -> bool {
    if required > limit {
        return false; // only triggers for single entries > limit
    }
    let remaining = limit - current; // remaining == limit, always
    if remaining < required {        // never true unless required > limit
        ensure_additional_cache_space(cache, required - remaining);
    }
    true
}
```

Because `current` is always 0, `remaining` always equals `limit`, so `remaining < required` is never true (the `required > limit` guard above already returns `false` for the only case that could satisfy it). The eviction path `ensure_additional_cache_space` is dead code in practice.
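The broken arithmetic can be reproduced in isolation. The sketch below (function and variable names are illustrative, not the actual terminus-store code) extracts the space-check logic and confirms that with `current` stuck at 0, no cacheable entry ever triggers eviction, while accurate accounting does:

```rust
// Minimal model of the space check: returns Some(bytes_to_evict) when
// eviction would be triggered, None otherwise. (Illustrative names.)
fn would_evict(limit: usize, current: usize, required: usize) -> Option<usize> {
    if required > limit {
        return None; // single entry can never fit; caller refuses to cache it
    }
    let remaining = limit - current;
    if remaining < required {
        Some(required - remaining) // bytes that must be evicted first
    } else {
        None
    }
}

fn main() {
    let limit = 512 * 1024 * 1024; // 512 MB, the default
    // With current stuck at 0, no required value <= limit ever evicts.
    for required in [1usize, limit / 2, limit] {
        assert_eq!(would_evict(limit, 0, required), None);
    }
    // With accurate accounting, a nearly full cache does trigger eviction.
    assert_eq!(would_evict(100, 90, 20), Some(10));
    println!("ok");
}
```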
Impact
- The LRU cache configured via `TERMINUSDB_LRU_CACHE_SIZE` (default 512 MB) has no effective byte limit. It will cache every resolved layer archive indefinitely, bounded only by available memory.
- In deployments with many distinct layers accessed over time, this leads to unbounded memory growth that is invisible to operators: the configured limit suggests memory is bounded when it is not.
- The `ensure_additional_cache_space` function and its LRU eviction logic are effectively dead code and have never been exercised in production.
Root Cause
`self.current` is never mutated after initialization. It should be:

- Incremented by `bytes.len()` after a `CacheEntry::Resolved(bytes)` is inserted into the cache.
- Decremented by the byte size of each entry evicted by `ensure_additional_cache_space` or `drop_from_cache`.

However, `self.current` is a plain `usize` on a `Clone`-derived struct, and `get_layer_bytes` takes `&self` (not `&mut self`). Updating it requires either:

- Changing `current` to an `Arc<AtomicUsize>` (preferred: lock-free, shared across clones, no contention with the cache mutex).
- Moving `current` inside the `Mutex`-guarded cache state.
Proposed Fix
Replace `current: usize` with `current: Arc<std::sync::atomic::AtomicUsize>` and update it at every cache mutation point:

- After inserting a resolved entry in `get_layer_bytes`:

  ```rust
  *cached = CacheEntry::Resolved(bytes.clone());
  self.current.fetch_add(bytes.len(), Ordering::Relaxed);
  ```

- Inside `ensure_additional_cache_space`, the evicted entry sizes are already computed; pass the `AtomicUsize` in and decrement:

  ```rust
  if let CacheEntry::Resolved(entry) = entry {
      current.fetch_sub(entry.len(), Ordering::Relaxed);
      // ...existing required tracking...
  }
  ```

- In `drop_from_cache`, decrement when an uncacheable or failed entry is removed:

  ```rust
  fn drop_from_cache(
      cache: &mut LruCache<[u32; 5], CacheEntry>,
      id: [u32; 5],
      current: &AtomicUsize,
  ) {
      if let Some((_, CacheEntry::Resolved(bytes))) =
          cache.peek_lru().filter(|(k, _)| **k == id)
      {
          current.fetch_sub(bytes.len(), Ordering::Relaxed);
      }
      cache.demote(&id);
      cache.pop_lru();
  }
  ```

- Update `used_bytes()` to read from the atomic instead of iterating the cache (faster, always available):

  ```rust
  pub fn used_bytes(&self) -> usize {
      self.current.load(Ordering::Relaxed)
  }
  ```
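Taken together, the counter changes can be sketched with a minimal stand-alone model. The `ByteAccounting` struct below is hypothetical (the real backend also holds the cache mutex and origins); it only demonstrates that an `Arc<AtomicUsize>` keeps the byte count consistent across `Clone`d handles, which a plain `usize` field cannot:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for the byte accounting on LruArchiveBackend.
#[derive(Clone)]
struct ByteAccounting {
    current: Arc<AtomicUsize>, // shared by all clones
}

impl ByteAccounting {
    fn new() -> Self {
        Self { current: Arc::new(AtomicUsize::new(0)) }
    }
    fn on_insert(&self, len: usize) {
        // called after a resolved entry is stored in the cache
        self.current.fetch_add(len, Ordering::Relaxed);
    }
    fn on_evict(&self, len: usize) {
        // called for each entry removed by eviction or drop_from_cache
        self.current.fetch_sub(len, Ordering::Relaxed);
    }
    fn used_bytes(&self) -> usize {
        self.current.load(Ordering::Relaxed)
    }
}

fn main() {
    let a = ByteAccounting::new();
    let b = a.clone(); // clones share one counter, unlike a plain usize field
    a.on_insert(1024);
    b.on_insert(512);
    b.on_evict(1024);
    assert_eq!(a.used_bytes(), 512);
    assert_eq!(b.used_bytes(), 512);
    println!("used = {}", a.used_bytes());
}
```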
Testing
- Add a unit test that creates an `LruArchiveBackend` with a small limit (e.g., 1 MB), inserts entries exceeding that limit, and verifies that older entries are evicted.
- Verify `used_bytes()` returns accurate totals after insertions and evictions.
- Verify that entries larger than `limit_bytes()` are correctly rejected (not cached).
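The test logic above can be sketched against a toy LRU model (a `VecDeque` of `(id, size)` pairs standing in for the real `LruCache`; all names here are illustrative, and the real test would exercise `LruArchiveBackend` itself):

```rust
use std::collections::VecDeque;

// Toy LRU with byte accounting; front of the deque = least recently used.
struct ToyLru {
    limit: usize,
    used: usize,
    entries: VecDeque<(u32, usize)>,
}

impl ToyLru {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0, entries: VecDeque::new() }
    }
    // Returns false for entries larger than the limit (never cached);
    // otherwise evicts from the LRU end until the entry fits.
    fn insert(&mut self, id: u32, size: usize) -> bool {
        if size > self.limit {
            return false;
        }
        while self.limit - self.used < size {
            let (_, evicted) = self.entries.pop_front().expect("accounting invariant");
            self.used -= evicted;
        }
        self.entries.push_back((id, size));
        self.used += size;
        true
    }
}

fn main() {
    let mb = 1024 * 1024;
    let mut lru = ToyLru::new(mb); // 1 MB limit, as in the suggested test
    for id in 0..10 {
        assert!(lru.insert(id, 200 * 1024)); // 10 x 200 KiB exceeds 1 MB
    }
    assert!(lru.used <= mb); // byte total stays within the limit
    assert!(!lru.entries.iter().any(|&(id, _)| id == 0)); // oldest entry evicted
    assert!(!lru.insert(99, 2 * mb)); // oversized entry rejected, not cached
    println!("toy eviction test passed");
}
```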
Labels
bug, terminus-store, memory