SenseiDeElite/neopreload

🦀 neopreload

License: GPL v2

A Rust fork of preload, the original C daemon by Behdad Esfahbod, substantially modernised beyond a faithful port.


🤔 How it works

Each cycle, neopreload scans /proc for running processes and the files they have mapped. It maintains:

  • an exe model — each tracked executable accumulates a cumulative runtime and a set of memory-mapped file regions;
  • a Markov model — a 4-state continuous-time Markov chain per exe pair, tracking co-occurrence and transition probability between running states;
  • a map arena — reference-counted file regions (path + offset + length) shared across exes.
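
As a rough illustration of the scan step, the sketch below parses one line of /proc/<pid>/maps into a file region (path + offset + length). The names MappedRegion and parse_maps_line are hypothetical and are not taken from proc_scan.rs; skipping anonymous and pseudo mappings such as [stack] is an assumption about what a prefetcher would want, not a statement about neopreload's filter.

```rust
/// One file-backed region taken from a `/proc/<pid>/maps` line.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
pub struct MappedRegion {
    pub path: String,
    pub offset: u64,
    pub length: u64,
}

/// Parse a single maps line of the form
/// `start-end perms offset dev inode pathname`.
/// Returns None for malformed lines and for mappings without a file path.
pub fn parse_maps_line(line: &str) -> Option<MappedRegion> {
    // The first five fields are separated by single spaces; the sixth piece
    // is the (space-padded) pathname, which may itself contain spaces.
    let mut fields = line.splitn(6, ' ');
    let (start, end) = fields.next()?.split_once('-')?;
    let _perms = fields.next()?;
    let offset = u64::from_str_radix(fields.next()?, 16).ok()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    let path = fields.next().unwrap_or("").trim();

    // Assumption: only absolute file paths are interesting for readahead.
    if !path.starts_with('/') {
        return None;
    }

    let start = u64::from_str_radix(start, 16).ok()?;
    let end = u64::from_str_radix(end, 16).ok()?;
    Some(MappedRegion {
        path: path.to_string(),
        offset,
        length: end.checked_sub(start)?,
    })
}
```

Regions parsed this way could then be deduplicated and reference-counted across executables, which is the role the map arena plays in the description above.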

At prediction time, Markov edges from currently-running exes cast probability bids onto not-yet-running exes. Those exes then bid their mapped regions into a readahead queue, sorted by score. The queue is consumed up to a memory budget derived from MemAvailable and total RAM, scaled by a swap pressure factor.
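
The last step, consuming a score-sorted queue up to a memory budget, can be sketched as follows. Bid and select_within_budget are hypothetical names, the score is treated as "higher is better", and stopping at the first region that does not fit (rather than skipping it and trying smaller ones) is an assumption, not something the description above specifies.

```rust
/// A region competing for readahead. (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct Bid {
    pub path: String,
    pub length: u64, // bytes the region would occupy in the page cache
    pub score: f64,  // assumed: larger means more likely to be needed
}

/// Sort bids by descending score and keep them until the budget runs out.
pub fn select_within_budget(mut bids: Vec<Bid>, budget: u64) -> Vec<Bid> {
    // total_cmp gives a total order over f64, so NaN scores cannot panic.
    bids.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut remaining = budget;
    let mut chosen = Vec::new();
    for bid in bids {
        if bid.length > remaining {
            // Assumption: stop at the first region that does not fit.
            break;
        }
        remaining -= bid.length;
        chosen.push(bid);
    }
    chosen
}
```

With bids of 100, 300 and 50 bytes scored 0.9, 0.5 and 0.7 and a budget of 200, this keeps the first and third regions and drops the 300-byte one.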

Readahead is submitted via posix_fadvise(WILLNEED). On rotational storage, regions are sorted by physical offset using FIEMAP before submission to reduce seek time. On SSDs and NVMe, sorting is skipped. Storage type is detected automatically from /sys/block at startup.
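
One plausible way to detect storage type is the kernel's queue/rotational attribute under /sys/block, which reads 1 for spinning disks and 0 for SSD/NVMe. The sketch below assumes that attribute is what gets consulted; the function names are hypothetical, and mapping a file to its backing block device is left out.

```rust
use std::fs;
use std::path::Path;

/// Interpret the contents of a `queue/rotational` sysfs attribute.
/// Returns Some(true) for rotational media, Some(false) for SSD/NVMe,
/// and None if the contents are not recognised.
pub fn parse_rotational(contents: &str) -> Option<bool> {
    match contents.trim() {
        "1" => Some(true),
        "0" => Some(false),
        _ => None,
    }
}

/// Read `<sys_block>/<device>/queue/rotational`, e.g. with
/// sys_block = "/sys/block" and device = "sda".
/// Returns None if the file is missing or unreadable.
pub fn is_rotational(sys_block: &Path, device: &str) -> Option<bool> {
    let flag = sys_block.join(device).join("queue").join("rotational");
    parse_rotational(&fs::read_to_string(flag).ok()?)
}
```

A caller could then enable the FIEMAP-based sort only when the answer is Some(true), and treat None conservatively as non-rotational.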

File deletions and replacements are tracked via inotify and purged from the model immediately.


🆕 Changes from the original preload

0️⃣1️⃣ Algorithm improvements

| Feature | Detail |
| --- | --- |
| MemAvailable budget | Uses kernel's MemAvailable instead of MemFree + Cached to avoid double-counting reclaimable memory |
| Swap-factor scaling | Budget multiplied by sqrt(SwapFree / SwapTotal); eviction kicks in when swap pressure is high |
| Writeback guard | Readahead skipped entirely when dirty writeback exceeds 2% of RAM |
| POSIX_FADV_DONTNEED eviction | Unpredicted maps are actively evicted under memory pressure |
| FIEMAP physical block sort | On rotational storage, requests are sorted by physical disk block to minimise seek distance; skipped on SSD/NVMe |
| mincore residency check | Checks whether all pages are already resident before issuing fadvise(WILLNEED); skips the syscall if so |
| Hit-rate logging | Each cycle logs regions already resident vs submitted — a direct measure of prediction quality |
| IOPRIO_CLASS_IDLE | Sets I/O scheduling class to IDLE at startup so prefetch I/O never competes with foreground work |
| Exponential weight decay | Markov transition weights decay by 0.999 on each state change (~4 hour half-life at default 20 s cycle) |
| Running-exe exclusion | Running-exe maps are excluded from the prefetch plan; their pages are already loaded |
| Base probability floor | Exes with no Markov edges are scored by historical runtime share so isolated programs remain prefetch candidates |
| Sparse-correlation fix | correlation() = 0.0 (insufficient data) is replaced with f64::MIN_POSITIVE to avoid zeroing newly-observed exe scores |
| Active-set Markov window | Edges are only created to exes seen within the last 6 hours, bounding graph growth to O(N×K) |
| Differential /proc scan | get_maps() is called only for new or exec'd PIDs; unchanged long-lived processes are skipped |
| Startup stale-exe purge | On first run and hourly thereafter, tracked exes whose paths no longer exist are removed with full Markov/map cascade |
| Stale-map purge on load | After loading the state file, maps rejected by the current mapprefix policy are immediately discarded |
| inotify deletion tracking | Watches parent directories of tracked files; removes exes from the model when deleted or moved |
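
The half-life quoted for the exponential weight decay can be checked with a few lines of arithmetic. The sketch below assumes one decay step per cycle, which is how the "~4 hour" figure works out; if decay is applied only on actual state changes, the wall-clock half-life would be longer. The function names are hypothetical.

```rust
/// Number of decay steps after which a weight has halved:
/// decay^n = 0.5  =>  n = ln(0.5) / ln(decay).
pub fn half_life_cycles(decay: f64) -> f64 {
    0.5_f64.ln() / decay.ln()
}

/// Half-life in hours, assuming one decay step every `cycle_secs` seconds.
pub fn half_life_hours(decay: f64, cycle_secs: f64) -> f64 {
    half_life_cycles(decay) * cycle_secs / 3600.0
}
```

For a decay of 0.999 and a 20 s cycle this gives roughly 693 steps, or about 3.85 hours, consistent with the figure in the table.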

🔨 Building

cargo build --release

⬇️ Installation

🐧 Arch Linux

A PKGBUILD is available.

⚙️ Configuration (optional)

See neopreload.toml.example.


💾 Memory budget

Each cycle, the prefetch budget is:

budget = max(0, total × memtotal% + available × memfree%) × swap_factor

where swap_factor = sqrt(swap_free / swap_total), or 1.0 if no swap is present. Readahead is skipped entirely when writeback exceeds 2% of total RAM.
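
A minimal sketch of this formula, with all quantities in kB as reported by /proc/meminfo. The function and parameter names are hypothetical, and the percentages are passed in as plain numbers rather than read from the configuration; a negative memtotal percentage is allowed, which is why the sum is clamped at zero.

```rust
/// sqrt(swap_free / swap_total), or 1.0 when no swap is configured.
pub fn swap_factor(swap_free_kb: u64, swap_total_kb: u64) -> f64 {
    if swap_total_kb == 0 {
        1.0
    } else {
        (swap_free_kb as f64 / swap_total_kb as f64).sqrt()
    }
}

/// budget = max(0, total * memtotal% + available * memfree%) * swap_factor
pub fn prefetch_budget_kb(
    total_kb: u64,
    available_kb: u64,
    memtotal_pct: f64,
    memfree_pct: f64,
    swap_free_kb: u64,
    swap_total_kb: u64,
) -> f64 {
    let base = total_kb as f64 * memtotal_pct / 100.0
        + available_kb as f64 * memfree_pct / 100.0;
    base.max(0.0) * swap_factor(swap_free_kb, swap_total_kb)
}

/// Writeback guard: true when writeback exceeds 2% of total RAM,
/// in which case readahead would be skipped for the cycle.
pub fn writeback_too_high(writeback_kb: u64, total_kb: u64) -> bool {
    writeback_kb as f64 > 0.02 * total_kb as f64
}
```

As an illustration only (these percentages are not claimed to be neopreload's defaults): with 8,000,000 kB total, 4,000,000 kB available, memtotal% = -10, memfree% = 50 and a quarter of swap free, the base is 1,200,000 kB and the swap factor is 0.5, giving a budget of 600,000 kB.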


🧩 Architecture

main.rs        Entry point, async main loop, signal handling
cmdline.rs     CLI argument parsing (clap)
conf.rs        Configuration loading (config + serde)
log_setup.rs   Logging initialisation (file logger + env_logger for stderr)
state.rs       Persistent + runtime data structures (arena graph, FxHashMap)
state_io.rs    State file load/save (tab-delimited)
proc_scan.rs   /proc scanning: processes, maps, memory stats
spy.rs         Data acquisition: scan() + update_model(), differential scanning
prophet.rs     Markov prediction engine, budget formula, readahead dispatch
fadvise.rs     posix_fadvise, FIEMAP, mincore, ioprio_set, eviction
watcher.rs     inotify file-deletion tracker

About

An adaptive readahead daemon.
