[AMD] Enable all diffusion models and fix encoder loading on MI325#13760

Merged
mickqian merged 27 commits into sgl-project:main from zyzshishui:amd_diffusion
Dec 19, 2025

Conversation

@zyzshishui (Contributor) commented Nov 22, 2025

Done! Updated PR #13760 with the new description.

PR: #13760

The CI should be running now with the rebased changes. The key fix for the ~100x slow loading is the should_offload() bug fix in component_loader.py.

@github-actions bot added the documentation, amd, dependencies, and diffusion (SGLang Diffusion) labels on Nov 22, 2025
@gemini-code-assist (Contributor) commented:
Summary of Changes

Hello @zyzshishui, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands SGLang Diffusion's compatibility by integrating full support for AMD's ROCm platform. The changes enable efficient execution of diffusion models on AMD GPUs, primarily by adopting AITer as the default attention backend and implementing robust data type handling. It also streamlines the development environment setup with a new ROCm-specific Dockerfile and improves stability by localizing external dependencies and preventing profiler data conflicts. The overall impact is a more versatile and performant SGLang Diffusion for a broader range of hardware.

Highlights

  • ROCm Support for SGLang Diffusion: This pull request introduces comprehensive support for SGLang Diffusion on ROCm-enabled AMD GPUs, allowing the framework to leverage AMD hardware for diffusion workloads. This includes a new Dockerfile for building ROCm diffusion images and specific environment variable configurations.
  • AITer as Default Attention Backend on ROCm: The AITer backend is now set as the default attention mechanism for SGLang Diffusion when running on ROCm. This is intended to optimize performance and compatibility on AMD platforms.
  • Automatic Data Type Casting for AITer: The AITer backend now transparently casts fp32 Query, Key, and Value (QKV) tensors to bf16/fp16 and casts the outputs back to the original dtype. This resolves issues where CLIP/image encoders would crash due to unsupported data types.
  • Profiler Trace Overwrite Prevention: Profiler traces are now suffixed with rank IDs to prevent overwrites in multi-GPU or distributed environments, ensuring unique and accurate profiling data for each process.
  • Decoupling from Yunchang Library: The dependency on the yunchang library for sequence parallelism (sp_degree/ulysses_degree) has been removed. Relevant functions have been localized within the SGLang codebase to avoid torch not found errors when installing yunchang on ROCm.
  • Documentation Updates for ROCm: Installation documentation has been updated to include a dedicated ROCm quickstart guide, providing clear instructions for AMD Instinct/ROCm users on kernel builds and attention backend settings validated on MI300X GPUs.
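The automatic QKV casting described in the highlights can be sketched as a thin wrapper around the attention call. This is a minimal illustration, assuming a generic `attn_fn`; the names `aiter_safe_attention` and `attn_fn` are hypothetical and are not the PR's actual code:

```python
import torch

def aiter_safe_attention(attn_fn, q, k, v, compute_dtype=torch.bfloat16):
    """Cast fp32 QKV down to a kernel-supported dtype, then cast back.

    attn_fn stands in for the real AITer attention call; this wrapper
    and its name are illustrative only, not the PR's implementation.
    """
    orig_dtype = q.dtype
    if orig_dtype == torch.float32:
        # AITer kernels do not accept fp32 inputs, so downcast QKV
        q, k, v = (t.to(compute_dtype) for t in (q, k, v))
    out = attn_fn(q, k, v)
    # Restore the caller's dtype so CLIP/image encoders see fp32 again
    return out.to(orig_dtype)
```

The key property is transparency: callers keep passing fp32 tensors and keep receiving fp32 outputs, so encoder code paths need no changes.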

@gemini-code-assist (bot) left a comment:


Code Review

This pull request introduces ROCm support for SGLang Diffusion, a significant step towards broader hardware compatibility. The changes include a new Dockerfile for ROCm, the integration of the AITer attention backend, and various code modifications to ensure compatibility and remove problematic dependencies on ROCm. My review has identified a couple of critical issues—one in the Dockerfile that would cause build failures and another in the AITer backend implementation that could lead to runtime errors. I have also provided suggestions to improve Dockerfile efficiency and documentation clarity. Overall, this is a valuable contribution that enables SGLang Diffusion on AMD hardware.

@zyzshishui force-pushed the amd_diffusion branch 2 times, most recently from 1a9891b to 586b79b on November 22, 2025 08:26
@hubertlu-tw (Collaborator) commented Nov 23, 2025

/tag-and-rerun-ci 11/26

@sunxxuns sunxxuns added run-ci and removed run-ci labels Nov 27, 2025
@sabreshao (Contributor) commented:

Regarding Automatic Data Type Casting for AITer: I suggest falling back to SDPA instead of AITer for CLIP and other models (everything except the DiT part) to avoid some image incorrectness.
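The suggested fallback could look roughly like the following sketch, which routes encoder attention through PyTorch's built-in SDPA rather than AITer. The function name and routing are illustrative only, assuming encoders run in fp32, and are not the PR's code:

```python
import torch
import torch.nn.functional as F

def encoder_attention(q, k, v):
    # For CLIP/text encoders, use PyTorch's SDPA, which handles fp32
    # natively and avoids lossy bf16 round-trips; the AITer backend
    # would then be reserved for the DiT blocks only.
    return F.scaled_dot_product_attention(q, k, v)
```

The trade-off is avoiding any precision loss in the encoders at the cost of not using the AITer kernels outside the DiT.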

@zhaochenyang20 (Collaborator) commented:

You are the GOAT!

@guapisolo (Contributor) commented:

/rerun-failed-ci

zyzshishui and others added 27 commits December 19, 2025 03:43
Co-authored-by: Sabre Shao <sabre.shao@amd.com>
Co-authored-by: Yusheng (Ethan) Su <yushengsu.thu@gmail.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
- Fix GPU OOM in sequential tests on ROCm/AMD with explicit memory cleanup
- Skip Ring Attention tests on AMD/ROCm (unsupported)
- Fix SGLANG_TEST_OUTPUT_SIZE not applied to actual test requests
- Add MIOpen kernel caching for AMD VAE performance
- Add diagnostics for HF cache and system resources
- Add disk cleanup for non-persistent HF cache between tests
- Enable all diffusion tests including LoRA (except FLUX.2 on 1-GPU)
The Docker image contains pre-compiled AITER kernels at
/sgl-workspace/aiter/aiter/jit/ which may be incompatible.
Clear them before running tests to force fresh JIT compilation.
if self.height is None:
    self.height_not_provided = True

# Allow env var to override num_inference_steps (for faster CI testing on AMD)
A Collaborator commented on this diff:

please fix it in a follow-up PR, this can be passed via sampling params
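The env-var override pattern under discussion can be sketched as a small helper that falls back to the requested value. The variable name `SGLANG_CI_NUM_INFERENCE_STEPS` is hypothetical, used only for illustration; per the review comment, the preferred long-term route is passing the value through sampling params instead:

```python
import os

def resolve_num_inference_steps(requested: int) -> int:
    """Let an env var override num_inference_steps for faster CI runs.

    SGLANG_CI_NUM_INFERENCE_STEPS is a hypothetical name for
    illustration; it is not necessarily the one the PR introduces.
    """
    override = os.environ.get("SGLANG_CI_NUM_INFERENCE_STEPS")
    if override is not None:
        return int(override)
    return requested
```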

@zyzshishui (Contributor, Author) commented:

> Automatic Data Type Casting for AITer: I suggest falling back to SDPA instead of AITER in CLIP or other model except DIT part to avoid some image incorrectness.

Could you please provide more detail about this: under which circumstances would this cause an issue? Thanks!

@mickqian mickqian mentioned this pull request Jan 8, 2026

Labels

amd, dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), run-ci


7 participants