[diffusion] feat: support LoRA #13859
Conversation
Summary of Changes

Hello @mickqian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the diffusion model inference capabilities by integrating LoRA support. It provides a robust mechanism for managing LoRA adapters and their weights through new API endpoints and a refactored scheduler, allowing for more flexible and efficient model customization during generation. The changes focus on enabling dynamic LoRA application without impacting the core model architecture.
Code Review
This pull request introduces support for LoRA (Low-Rank Adaptation) for inference, which is a significant feature enhancement. The changes are well-distributed across different components, including the main generator entrypoint, API endpoints, model layers, and the scheduler. A key part of this change is the removal of training-related LoRA code to streamline the implementation for inference. My review focuses on improving code maintainability by reducing duplication in the newly added LoRA control methods and removing some leftover debugging code. Overall, this is a solid contribution towards more flexible model adaptation at runtime.
Resolved review threads (outdated):
- python/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py
- python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py
```python
method = reqs.get("method")
if method == "set_lora_adapter":
    self.worker.set_lora_adapter(
        reqs["lora_nickname"],
        reqs["lora_path"],
    )
    output_batch = {"status": "ok"}
elif method == "unmerge_lora_weights":
    self.worker.unmerge_lora_weights()
    output_batch = {"status": "ok"}
elif method == "merge_lora_weights":
    self.worker.merge_lora_weights()
    output_batch = {"status": "ok"}
else:
    error_msg = f"Unknown method: {method}"
    logger.error(error_msg)
    output_batch = {"status": "error", "message": error_msg}
```
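For reference, the request payloads this handler consumes look like the following; the method and field names come from the handler above, while the nickname and path values are illustrative:

```python
# Register a LoRA adapter under a nickname (values are illustrative).
set_req = {
    "method": "set_lora_adapter",
    "lora_nickname": "wan_style_v1",
    "lora_path": "/path/to/lora",
}

# Merging folds the adapter into the base weights before generation;
# unmerging restores the original weights afterwards. Neither takes
# extra fields.
merge_req = {"method": "merge_lora_weights"}
unmerge_req = {"method": "unmerge_lora_weights"}
```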
The if/elif/else chain for handling different control methods is functional but can become cumbersome to maintain as more methods are added. Using a dispatch dictionary (mapping method names to handler functions) would make this code more scalable and readable, and better align with the open/closed principle.
Suggested change:

```python
method = reqs.get("method")
handlers = {
    "set_lora_adapter": lambda: self.worker.set_lora_adapter(
        reqs["lora_nickname"], reqs["lora_path"]
    ),
    "unmerge_lora_weights": self.worker.unmerge_lora_weights,
    "merge_lora_weights": self.worker.merge_lora_weights,
}
handler = handlers.get(method)
if handler:
    handler()
    output_batch = {"status": "ok"}
else:
    error_msg = f"Unknown method: {method}"
    logger.error(error_msg)
    output_batch = {"status": "error", "message": error_msg}
```
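One caveat with this suggestion: because the set_lora_adapter lambda closes over reqs, the handlers table is rebuilt on every request. That is cheap, but if the method set keeps growing, a variant where every handler takes the request dict lets the table be built once, e.g. in __init__ (a sketch, not part of this PR):

```python
# Built once, e.g. in __init__; each handler receives the request dict.
self._control_handlers = {
    "set_lora_adapter": lambda reqs: self.worker.set_lora_adapter(
        reqs["lora_nickname"], reqs["lora_path"]
    ),
    "unmerge_lora_weights": lambda reqs: self.worker.unmerge_lora_weights(),
    "merge_lora_weights": lambda reqs: self.worker.merge_lora_weights(),
}

# Then, in the request loop:
handler = self._control_handlers.get(reqs.get("method"))
```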
Resolved review thread (outdated):
- python/sglang/multimodal_gen/runtime/utils/hf_diffusers_utils.py
/tag-and-rerun-ci
This PR adds support for LoRA with the Wan series, in:
For Qwen-Image and Flux, LoRA weights can be merged, but the accuracy needs further checking.
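For context on what "merged" means here: LoRA parameterizes a weight update as a low-rank product, and merging folds that product into the frozen base weight so inference pays no extra matmul cost; unmerging subtracts it back out. A minimal sketch of the standard formulation, not this PR's exact code:

```python
import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> torch.Tensor:
    """Return the merged weight W' = W + (alpha / r) * B @ A.

    W: (out_features, in_features) base weight;
    A: (r, in_features) and B: (out_features, r) are the low-rank factors.
    Unmerging subtracts the same term, restoring W exactly up to
    floating-point error.
    """
    return W + (alpha / r) * (B @ A)
```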
TODO
- Test LoRA workflow

Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist