Implement LRU eviction policy for LoRA adapters #11041
Fridge003 merged 24 commits into sgl-project:main
Conversation
Summary of Changes (Gemini Code Assist)
This pull request enhances memory management for LoRA adapters in SGLang by introducing a configurable Least Recently Used (LRU) eviction policy. The new policy aims to improve cache efficiency by keeping frequently accessed adapters in memory longer than rarely used ones. The changes add a modular framework for eviction policies, integrate it into the LoRA memory pool, and expose a command-line option for selecting the preferred policy, all while preserving backward compatibility with the existing FIFO behavior.
Code Review
This pull request introduces a configurable LRU eviction policy for LoRA adapters, which is a great enhancement for managing memory more intelligently. The implementation is well-structured, introducing a new eviction policy framework and integrating it cleanly into the existing LoRAMemoryPool and LoRAManager. The changes maintain backward compatibility by defaulting to the existing FIFO policy. My review includes a minor suggestion to improve code conciseness in the eviction logic.
python/sglang/srt/lora/mem_pool.py (outdated)
```python
candidates = set()
pinned_uids = set()

for buffer_id in range(self.max_loras_per_batch):
    uid = self.buffer_id_to_uid[buffer_id]
    if uid not in cur_uids and uid is not None:
        candidates.add(uid)
        lora_ref = lora_refs.get(uid)
        if lora_ref is not None and lora_ref.pinned:
            pinned_uids.add(uid)
```
The logic for collecting eviction candidates can be made more concise and readable. Using a comprehension to build a list of candidate info first, then deriving the `candidates` and `pinned_uids` sets from it, makes the code more declarative and easier to follow:
```python
all_candidates = [
    (uid, lora_refs.get(uid))
    for uid in self.buffer_id_to_uid
    if uid not in cur_uids and uid is not None
]
candidates = {uid for uid, _ in all_candidates}
pinned_uids = {uid for uid, ref in all_candidates if ref and ref.pinned}
```
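For readers comparing the two styles, here is a minimal, self-contained sketch showing that the imperative loop and the suggested comprehension produce identical sets. The buffer layout and the pinned-ref objects below are simplified stand-ins, not SGLang's real `LoRAMemoryPool` state:

```python
from types import SimpleNamespace

buffer_id_to_uid = ["a", "b", None, "c"]   # slot -> adapter uid (None = empty)
cur_uids = {"b"}                           # adapters needed by the current batch
lora_refs = {
    "a": SimpleNamespace(pinned=True),
    "c": SimpleNamespace(pinned=False),
}

# Original imperative style
candidates, pinned_uids = set(), set()
for uid in buffer_id_to_uid:
    if uid not in cur_uids and uid is not None:
        candidates.add(uid)
        ref = lora_refs.get(uid)
        if ref is not None and ref.pinned:
            pinned_uids.add(uid)

# Suggested declarative style
all_candidates = [
    (uid, lora_refs.get(uid))
    for uid in buffer_id_to_uid
    if uid not in cur_uids and uid is not None
]
candidates2 = {uid for uid, _ in all_candidates}
pinned2 = {uid for uid, ref in all_candidates if ref and ref.pinned}

assert candidates == candidates2 == {"a", "c"}
assert pinned_uids == pinned2 == {"a"}
```

Both forms skip adapters that are empty slots or needed by the current batch, so the refactor is purely stylistic.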
Nice, thank you so much for the guidance all the way! Can we add the run-ci label to this PR, or should we just merge it directly?
Motivation
Addresses this feature request: [Feature] (2/2) Support LRU cache for LoRA eviction
This PR implements a configurable LRU (Least Recently Used) eviction policy for LoRA adapters to provide more intelligent memory management. Currently, SGLang only supports FIFO eviction, which may not be optimal for workloads where certain LoRA adapters are accessed more frequently than others. The LRU policy ensures that frequently used adapters remain in memory while less recently used ones are evicted first, potentially improving cache hit rates and overall performance.
Modifications
- Added `eviction_policy.py` module with an abstract `EvictionPolicy` class
- Implemented `LRUEvictionPolicy` using `OrderedDict` for O(1) access tracking
- Added `FIFOEvictionPolicy` for backward compatibility
- Added `--lora-eviction-policy` argument to `ServerArgs` with choices `["fifo", "lru"]`
- Updated `LoRAMemoryPool` to use configurable eviction policies
- Updated `LoRAManager` to pass the eviction policy to the memory pool
- Updated `SRTRunner` to accept an eviction policy parameter

All changes maintain full backward compatibility with the default FIFO behavior.
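The policy split described above can be sketched as follows. The class names mirror those listed in the PR, but the method names and signatures here are illustrative assumptions, not the PR's exact API:

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class EvictionPolicy(ABC):
    """Hypothetical interface: track adapter accesses, pick a victim to evict."""

    @abstractmethod
    def mark_used(self, uid): ...

    @abstractmethod
    def select_victim(self, candidates): ...


class FIFOEvictionPolicy(EvictionPolicy):
    """Evicts in insertion order; later accesses do not change priority."""

    def __init__(self):
        self._order = OrderedDict()

    def mark_used(self, uid):
        self._order.setdefault(uid, None)   # only the first insertion counts

    def select_victim(self, candidates):
        return next((u for u in self._order if u in candidates), None)


class LRUEvictionPolicy(EvictionPolicy):
    """Evicts the least recently used uid; each access refreshes priority."""

    def __init__(self):
        self._order = OrderedDict()

    def mark_used(self, uid):
        self._order[uid] = None
        self._order.move_to_end(uid)        # O(1) recency update

    def select_victim(self, candidates):
        return next((u for u in self._order if u in candidates), None)


# Same access trace, different victims:
fifo, lru = FIFOEvictionPolicy(), LRUEvictionPolicy()
for uid in ["A", "B", "A"]:
    fifo.mark_used(uid)
    lru.mark_used(uid)
assert fifo.select_victim({"A", "B"}) == "A"   # FIFO: "A" was inserted first
assert lru.select_victim({"A", "B"}) == "B"    # LRU: "A" was just reused
```

Keeping both policies behind one abstract class is what lets `LoRAMemoryPool` stay agnostic to which policy the `--lora-eviction-policy` flag selects.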
Accuracy Tests
This PR does not affect model outputs or inference accuracy.
Benchmarking and Profiling
The LRU eviction policy is designed to improve cache efficiency for workloads with non-uniform adapter access patterns, while keeping the performance impact of the bookkeeping itself minimal.
Detailed benchmarking will be conducted with realistic workloads in future testing.
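As a rough illustration of the cache-efficiency claim, the following toy simulation (not the PR's benchmark) compares FIFO and LRU hit rates on a small deterministic trace where one adapter is reused heavily:

```python
from collections import OrderedDict


def hit_rate(policy, trace, capacity=2):
    """Simulate a tiny adapter cache and return the fraction of hits."""
    order, hits = OrderedDict(), 0
    for uid in trace:
        if uid in order:
            hits += 1
            if policy == "lru":
                order.move_to_end(uid)      # refresh recency; FIFO skips this
        else:
            if len(order) >= capacity:
                order.popitem(last=False)   # evict the front of the queue
            order[uid] = None
    return hits / len(trace)


trace = ["A", "B", "A", "C"] * 100          # "A" is reused twice per cycle
fifo = hit_rate("fifo", trace)              # 0.25: "A" keeps getting evicted
lru = hit_rate("lru", trace)                # 0.4975: "A" stays resident
assert lru > fifo
```

Under FIFO the hot adapter "A" is evicted on schedule regardless of reuse, while LRU refreshes it on every access, so only the cold adapters "B" and "C" miss.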
Checklist