docker: add clean build and wheel-based install Dockerfiles#278

Open
sunway513 wants to merge 1 commit into ROCm:main from sunway513:docker/clean-build

@sunway513 (Collaborator) commented Mar 8, 2026

Summary

  • Add Dockerfile.wheels: multi-stage builder that produces all dependency wheels (torch, triton, triton_kernels, flydsl, mori, amd_aiter)
  • Add Dockerfile.clean: wheel-only install with zero source compilation, using pre-built wheels from Dockerfile.wheels

No changes to the existing Dockerfile.

Build Time (MI300X node)

| Dockerfile | Time | Notes |
|---|---|---|
| Dockerfile.wheels | ~10 min | MORI compile 3.4 min, AITER 1 min, rest are downloads |
| Dockerfile.clean | ~4 min | Pure pip install, zero compilation |
| Total | ~14 min | vs ~2+ hours for full source build |

Wheels Build Breakdown

| Step | Time | Description |
|---|---|---|
| System packages | 49s | apt-get install |
| PyTorch download | 128s | torch + torchvision + torchaudio (5.8GB) |
| Triton download (PyPI 3.6.0) | 20s | 180MB |
| triton_kernels | 37s | sparse-checkout from ROCm/triton, pure Python |
| FlyDSL download | 11s | pre-built nightly (70MB) |
| torch+triton install | 82s | needed for MORI/AITER builds |
| MORI compile | 202s | slowest component |
| AITER compile (CK-free) | 63s | ENABLE_CK=0 |
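
As a quick cross-check, the per-step timings in the breakdown above add up to just under ten minutes, which is consistent with the ~10 min reported for Dockerfile.wheels. A minimal shell sketch of that arithmetic (the variable names are illustrative only):

```shell
# Per-step timings in seconds, copied from the breakdown table.
total=0
for s in 49 128 20 37 11 82 202 63; do
  total=$((total + s))
done

# Whole minutes plus leftover seconds.
printf 'total: %ds (%dm %ds)\n' "$total" "$((total / 60))" "$((total % 60))"
# prints "total: 592s (9m 52s)"
```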

Usage

Build wheels

docker build -f docker/Dockerfile.wheels -t atom:wheels .

Install from wheels (zero compilation)

DOCKER_BUILDKIT=1 docker build \
  --build-context wheels=docker-image://atom:wheels \
  -f docker/Dockerfile.clean -t atom:clean .

Test plan

  • Build Dockerfile.wheels — all 8 wheels produced
  • Build Dockerfile.clean from wheels — all imports succeed (PyTorch, Triton, AITER, FlyDSL, MORI, ATOM)
  • Sanity check clean image — 8x MI300X detected, GPU matmul OK
  • E2E inference — blocked on AITER CK-free attention fallback (known limitation, being addressed by AITER team)
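
The first test-plan item (8 wheels produced) can be checked mechanically. The sketch below is not part of this PR: `count_wheels` and `check_wheel_count` are hypothetical helper names, and the directory to inspect is an assumption (the review summary mentions /wheels inside the builder image).

```shell
# Hypothetical helper: count *.whl files directly inside a directory.
count_wheels() {
  dir=$1
  n=0
  for f in "$dir"/*.whl; do
    # When nothing matches, the glob stays literal, so test for existence.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Hypothetical helper: fail unless the directory holds the expected count.
check_wheel_count() {
  dir=$1
  expected=$2
  actual=$(count_wheels "$dir")
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual wheels in $dir"
  else
    echo "FAIL: expected $expected wheels in $dir, found $actual" >&2
    return 1
  fi
}
```

Inside the builder image this might be invoked as `check_wheel_count /wheels 8`, assuming the wheels sit directly in that directory.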

@sunway513 (Collaborator, Author) commented

Build Verification Results

| Test | Status | Notes |
|---|---|---|
| Dockerfile.wheels build | PASS | 8 wheels produced in ~10 min |
| Dockerfile.clean build | PASS | Zero compilation, ~3.7 min |
| Clean image sanity check | PASS | All imports OK, 8x MI300X detected, GPU matmul OK |
| Clean image E2E inference | BLOCKED | See below |
| Dockerfile full build (ENABLE_CK=1) | FAIL | Triton 3.5.x source compilation error (upstream issue, unrelated to this PR) |

E2E Inference Limitation

E2E inference with the clean image (ENABLE_CK=0) fails because AITER's attention path (fmha_v3_varlen_fwd) still attempts to JIT-compile CK modules even when ATOM_CK_FREE=1 and AITER_FORCE_TRITON_ATTN=1 are set. This is a known limitation — the CK-free attention fallback in AITER is being addressed by the AITER team.

Once that fix lands, the clean Docker image will support full E2E inference out of the box.

Add two new Dockerfiles for faster, reproducible deployments:

- Dockerfile.wheels: multi-stage builder that produces all dependency
  wheels (torch, triton, triton_kernels, flydsl, mori, amd_aiter).
  Downloads pre-built wheels where available (PyPI Triton 3.6, AMD
  nightly FlyDSL), builds MORI and AITER (ENABLE_CK=0) from source.
  Total build time ~10 min.

- Dockerfile.clean: wheel-only install with zero source compilation.
  Uses BuildKit bind-mount from Dockerfile.wheels output. Total
  install time ~4 min.
@sunway513 force-pushed the docker/clean-build branch from 0717764 to 5872f85 on March 9, 2026 02:59
@sunway513 marked this pull request as ready for review on March 9, 2026 14:19
Copilot AI review requested due to automatic review settings March 9, 2026 14:19
Copilot AI (Contributor) left a comment

Pull request overview

Adds alternative Docker build paths that prebuild/download dependency wheels once and then create a “clean” runtime image by installing only from wheels (no source compilation), significantly reducing Docker build times for ROCm environments.

Changes:

  • Add docker/Dockerfile.wheels to build/download wheels for major dependencies (PyTorch ROCm nightly, Triton, triton_kernels, FlyDSL, MORI, AITER).
  • Add docker/Dockerfile.clean to install exclusively from those prebuilt wheels using BuildKit build-context mounting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| docker/Dockerfile.wheels | Wheel builder image that downloads/builds dependency wheels into /wheels for reuse. |
| docker/Dockerfile.clean | Runtime image that installs all dependencies from provided wheels via BuildKit mount, then installs ATOM from the build context. |

Comment on lines +29 to +33
RUN apt-get update && apt-get install -y --no-install-recommends \
git python3-pip python3-dev \
ibverbs-utils libpci-dev locales \
openmpi-bin libopenmpi-dev libdw1 \
&& rm -rf /var/lib/apt/lists/*

Copilot AI Mar 9, 2026

apt-get install includes git, but this Dockerfile does not use git (and the header says “No git clones”). Removing git reduces image size and keeps the “minimal” runtime promise accurate (or update the comment if git is intentionally required at runtime).

Comment on lines +40 to +43
RUN --mount=type=bind,from=wheels,source=/,target=/mnt/wheels \
mkdir -p /tmp/wheels \
&& find /mnt/wheels -name '*.whl' -exec cp {} /tmp/wheels/ \; \
&& ls -lhS /tmp/wheels/*.whl \

Copilot AI Mar 9, 2026

The wheels mount copies from source=/ and then runs find /mnt/wheels -name '*.whl' .... When wheels is a docker-image build context, this will traverse the entire image filesystem and can accidentally pick up wheels from unexpected locations (pip cache, etc.) and slow the build. Consider limiting the mount/copy to the known wheels directory produced by Dockerfile.wheels (/wheels), or explicitly searching only that subdirectory when present.
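
One way to narrow the copy, sketched under assumptions: `collect_wheels` is a hypothetical helper name, and the wheel directory is taken to be the /wheels path named in the review summary. It searches only the directory it is given rather than the whole mounted filesystem, and fails loudly if that directory is missing.

```shell
# Hypothetical helper: copy wheels from one known directory only.
#   $1 = wheel source directory (for example /mnt/wheels/wheels when the
#        image root is mounted at /mnt/wheels; this path is an assumption)
#   $2 = destination directory
collect_wheels() {
  src=$1
  dest=$2
  if [ ! -d "$src" ]; then
    echo "wheel directory not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  # The search root is the wheel directory itself, so wheels elsewhere in
  # the image (pip caches and similar) are never visited.
  find "$src" -type f -name '*.whl' -exec cp {} "$dest"/ \;
}
```

In the RUN step quoted above, this would correspond to pointing `find` at the wheels subdirectory of the mount instead of the mount root.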

Comment on lines +102 to +109
RUN git clone --depth=1 --branch ${AITER_BRANCH} ${AITER_REPO} /build/aiter

RUN cd /build/aiter \
&& pip3 install --break-system-packages -r requirements.txt \
&& export ENABLE_CK=0 PREBUILD_TRITON=${PREBUILD_TRITON} \
PREBUILD_TRITON_ARCHS="gfx942,gfx950" \
MAX_JOBS=${MAX_JOBS} GPU_ARCHS=${GPU_ARCH_LIST} \
&& pip3 install --break-system-packages --no-build-isolation -e . \

Copilot AI Mar 9, 2026

AITER is cloned but submodules aren’t initialized/updated before building. The existing docker/Dockerfile does a git submodule ... update --init --recursive for AITER, which suggests the build may rely on submodules; without this, the wheel build can fail or produce an incomplete package. Please add the submodule init/update step (or use git clone --recurse-submodules) before installing/building.
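
A sketch of a guard for this, using hypothetical helper names. Whether the CK-free AITER build actually needs its submodules is not established in this thread. The helper only checks whether the checkout declares submodules, and runs the standard `git submodule update --init --recursive` when it does.

```shell
# Hypothetical helper: succeed if the checkout declares any submodules.
has_submodules() {
  repo=$1
  # .gitmodules exists and is non-empty when submodules are declared.
  [ -s "$repo/.gitmodules" ]
}

# Hypothetical helper: initialise submodules only when they are declared.
init_submodules_if_any() {
  repo=$1
  if has_submodules "$repo"; then
    git -C "$repo" submodule update --init --recursive
  else
    echo "no submodules declared in $repo"
  fi
}
```

With the clone location used in the snippet above, the call would be `init_submodules_if_any /build/aiter`, placed between the clone and the `pip3 install` step.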

Comment on lines +106 to +108
&& export ENABLE_CK=0 PREBUILD_TRITON=${PREBUILD_TRITON} \
PREBUILD_TRITON_ARCHS="gfx942,gfx950" \
MAX_JOBS=${MAX_JOBS} GPU_ARCHS=${GPU_ARCH_LIST} \

Copilot AI Mar 9, 2026

PREBUILD_TRITON_ARCHS is hardcoded to gfx942,gfx950 even though the target arch list is configurable via ARG GPU_ARCH. If someone builds with a different GPU_ARCH, AITER will still prebuild for 942/950, which can lead to incorrect or missing kernels. Consider deriving PREBUILD_TRITON_ARCHS from the same build arg (or introducing a separate build arg) to keep them consistent.
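
A minimal sketch of deriving one list from the other. It assumes the configured arch list is semicolon-separated (for example `gfx942;gfx950`); the thread does not show the actual format of GPU_ARCH or GPU_ARCH_LIST, and `archs_to_csv` is a hypothetical helper name.

```shell
# Hypothetical helper: turn a semicolon-separated arch list into the
# comma-separated form used by PREBUILD_TRITON_ARCHS.
archs_to_csv() {
  printf '%s' "$1" | tr ';' ','
}

# The default value here is an assumption that mirrors the hardcoded list.
GPU_ARCH_LIST="${GPU_ARCH_LIST:-gfx942;gfx950}"
PREBUILD_TRITON_ARCHS=$(archs_to_csv "$GPU_ARCH_LIST")
export PREBUILD_TRITON_ARCHS
echo "PREBUILD_TRITON_ARCHS=$PREBUILD_TRITON_ARCHS"
```

If the build arg already uses commas, the helper leaves the value unchanged, so both lists stay in sync either way.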

Comment on lines +1 to +3
# ATOM Docker — Multi-stage wheel builder
#
# Builds/downloads all wheels needed for Dockerfile.clean:

Copilot AI Mar 9, 2026

The header comment calls this a “Multi-stage wheel builder”, but the Dockerfile currently has a single stage (no FROM ... AS ... / multi-stage copy). Either adjust the wording or split into explicit stages if that was intended, to avoid confusing users.

@sunway513 (Collaborator, Author) commented

Related AITER PR: ROCm/aiter#2227 — Adds a Triton fallback for fused_rope_rms() in AITER. This is one of the pieces needed to unblock full E2E CK-free inference with the clean Docker image.
