[AMD] Enable all diffusion models and fix encoder loading on MI325 #13760
mickqian merged 27 commits into sgl-project:main
Conversation
Summary of Changes

Hello @zyzshishui, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands SGLang Diffusion's compatibility by integrating full support for AMD's ROCm platform. The changes enable efficient execution of diffusion models on AMD GPUs, primarily by adopting AITer as the default attention backend and implementing robust data type handling. It also streamlines the development environment setup with a new ROCm-specific Dockerfile and improves stability by localizing external dependencies and preventing profiler data conflicts. The overall impact is a more versatile and performant SGLang Diffusion for a broader range of hardware.
Code Review
This pull request introduces ROCm support for SGLang Diffusion, a significant step towards broader hardware compatibility. The changes include a new Dockerfile for ROCm, the integration of the AITer attention backend, and various code modifications to ensure compatibility and remove problematic dependencies on ROCm. My review has identified a couple of critical issues—one in the Dockerfile that would cause build failures and another in the AITer backend implementation that could lead to runtime errors. I have also provided suggestions to improve Dockerfile efficiency and documentation clarity. Overall, this is a valuable contribution that enables SGLang Diffusion on AMD hardware.
python/sglang/multimodal_gen/runtime/layers/attention/backends/aiter.py
Force-pushed from 1a9891b to 586b79b
/tag-and-rerun-ci 11/26
Automatic Data Type Casting for AITer: I suggest falling back to SDPA instead of AITer in CLIP and other models, except the DiT part, to avoid image incorrectness.
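The per-module fallback the reviewer suggests could be sketched roughly as below; `select_attention_backend`, the backend stand-ins, and the `module_kind` strings are all illustrative assumptions, not SGLang's actual API:

```python
# Minimal sketch of the suggested fallback: use AITer only for the DiT
# blocks and SDPA everywhere else (e.g. the CLIP text encoder) on ROCm.
# All names here are hypothetical, not SGLang's real interfaces.

def sdpa_backend(q, k, v):
    """Stand-in for torch.nn.functional.scaled_dot_product_attention."""
    raise NotImplementedError

def aiter_backend(q, k, v):
    """Stand-in for the AITer attention kernel."""
    raise NotImplementedError

def select_attention_backend(module_kind: str, on_rocm: bool):
    """Pick the attention implementation per module type."""
    if on_rocm and module_kind != "dit":
        # Non-DiT modules (CLIP and other encoders) fall back to SDPA
        # to avoid the image incorrectness observed with AITer there.
        return sdpa_backend
    return aiter_backend
```

The dispatch keeps AITer on the DiT hot path, where it matters for performance, while routing the encoders through the safer SDPA path.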
You are the GOAT!
Force-pushed from e3be8e3 to a4e74e7
Force-pushed from 81e4bf1 to 5b4d240
/rerun-failed-ci
Co-authored-by: Sabre Shao <sabre.shao@amd.com>
Co-authored-by: Yusheng (Ethan) Su <yushengsu.thu@gmail.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
- Fix GPU OOM in sequential tests on ROCm/AMD with explicit memory cleanup
- Skip Ring Attention tests on AMD/ROCm (unsupported)
- Fix SGLANG_TEST_OUTPUT_SIZE not applied to actual test requests
- Add MIOpen kernel caching for AMD VAE performance
- Add diagnostics for HF cache and system resources
- Add disk cleanup for non-persistent HF cache between tests
- Enable all diffusion tests including LoRA (except FLUX.2 on 1-GPU)
The Docker image contains pre-compiled AITER kernels at /sgl-workspace/aiter/aiter/jit/ which may be incompatible. Clear them before running tests to force fresh JIT compilation.
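A cleanup step like the one described could be sketched as follows; the default path is the one from the Docker image above, but the function name and the `*.so` pattern are assumptions, not part of this PR:

```python
# Sketch: remove pre-compiled AITER JIT artifacts so the kernels are
# rebuilt by fresh JIT compilation for the local GPU. Deleting only
# *.so files (an assumption) leaves build sources in place.
from pathlib import Path

def clear_aiter_jit_cache(jit_dir: str = "/sgl-workspace/aiter/aiter/jit") -> int:
    """Delete compiled shared objects under jit_dir; return count removed."""
    root = Path(jit_dir)
    if not root.is_dir():
        # Nothing to clear outside the Docker image.
        return 0
    removed = 0
    for so_file in root.rglob("*.so"):
        so_file.unlink()
        removed += 1
    return removed
```

Running this before the test suite forces AITer to JIT-compile against the GPU actually present, rather than reusing kernels baked into the image.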
On the height handling:

    if self.height is None:
        self.height_not_provided = True

On the inference-steps override:

    # Allow env var to override num_inference_steps (for faster CI testing on AMD)

Please fix this in a follow-up PR; this can be passed via sampling params.
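The two approaches could look roughly like this; the env var name `SGLANG_CI_NUM_INFERENCE_STEPS` and the `SamplingParams` dataclass are stand-ins for illustration, not SGLang's real names:

```python
# Sketch contrasting the env-var override under review with the
# sampling-params route the reviewer prefers. Both the env var name
# and SamplingParams are hypothetical stand-ins.
import os
from dataclasses import dataclass

@dataclass
class SamplingParams:
    num_inference_steps: int = 50

def resolve_steps(params: SamplingParams) -> int:
    # Pattern under review: an env var silently overrides the request,
    # which hides the effective step count from callers.
    env_steps = os.environ.get("SGLANG_CI_NUM_INFERENCE_STEPS")
    if env_steps is not None:
        return int(env_steps)
    # Preferred pattern: CI passes the reduced step count explicitly
    # via sampling params, e.g. SamplingParams(num_inference_steps=4).
    return params.num_inference_steps
```

Passing the value through sampling params keeps the request self-describing, so test logs and reproductions don't depend on hidden environment state.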
Could you please provide more detail on this? Under which circumstances would this cause an issue? Thanks!
Done! Updated PR #13760 with the new description.
The CI should be running now with the rebased changes. The key fix for the ~100x slowdown in loading is the should_offload() bug fix in component_loader.py.