docker: add clean build and wheel-based install Dockerfiles #278
Conversation
Build Verification Results
E2E Inference Limitation

E2E inference with the clean image (ENABLE_CK=0) currently fails in AITER's attention path. Once that fix lands, the clean Docker image will support full E2E inference out of the box.
Add two new Dockerfiles for faster, reproducible deployments:

- `Dockerfile.wheels`: multi-stage builder that produces all dependency wheels (torch, triton, triton_kernels, flydsl, mori, amd_aiter). Downloads pre-built wheels where available (PyPI Triton 3.6, AMD nightly FlyDSL); builds MORI and AITER (ENABLE_CK=0) from source. Total build time ~10 min.
- `Dockerfile.clean`: wheel-only install with zero source compilation. Uses a BuildKit bind-mount from the `Dockerfile.wheels` output. Total install time ~4 min.
Pull request overview
Adds alternative Docker build paths that prebuild/download dependency wheels once and then create a “clean” runtime image by installing only from wheels (no source compilation), significantly reducing Docker build times for ROCm environments.
Changes:
- Add `docker/Dockerfile.wheels` to build/download wheels for major dependencies (PyTorch ROCm nightly, Triton, triton_kernels, FlyDSL, MORI, AITER).
- Add `docker/Dockerfile.clean` to install exclusively from those prebuilt wheels using BuildKit build-context mounting.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docker/Dockerfile.wheels | Wheel builder image that downloads/builds dependency wheels into /wheels for reuse. |
| docker/Dockerfile.clean | Runtime image that installs all dependencies from provided wheels via BuildKit mount, then installs ATOM from the build context. |
```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    git python3-pip python3-dev \
    ibverbs-utils libpci-dev locales \
    openmpi-bin libopenmpi-dev libdw1 \
    && rm -rf /var/lib/apt/lists/*
```
apt-get install includes git, but this Dockerfile does not use git (and the header says “No git clones”). Removing git reduces image size and keeps the “minimal” runtime promise accurate (or update the comment if git is intentionally required at runtime).
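A minimal sketch of the suggested change, assuming `git` really is unused at runtime (the remaining package list is copied from the quoted hunk):

```dockerfile
# Runtime image: git dropped, only packages needed at runtime remain.
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3-pip python3-dev \
    ibverbs-utils libpci-dev locales \
    openmpi-bin libopenmpi-dev libdw1 \
    && rm -rf /var/lib/apt/lists/*
```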
```dockerfile
RUN --mount=type=bind,from=wheels,source=/,target=/mnt/wheels \
    mkdir -p /tmp/wheels \
    && find /mnt/wheels -name '*.whl' -exec cp {} /tmp/wheels/ \; \
    && ls -lhS /tmp/wheels/*.whl \
```
The wheels mount copies from source=/ and then runs find /mnt/wheels -name '*.whl' .... When wheels is a docker-image build context, this will traverse the entire image filesystem and can accidentally pick up wheels from unexpected locations (pip cache, etc.) and slow the build. Consider limiting the mount/copy to the known wheels directory produced by Dockerfile.wheels (/wheels), or explicitly searching only that subdirectory when present.
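One way to sketch the suggested fix, assuming the wheels image writes its output to `/wheels` as described in the file table above:

```dockerfile
# Mount only the /wheels directory produced by Dockerfile.wheels, so find
# never traverses the whole image filesystem (pip caches, site-packages, ...).
RUN --mount=type=bind,from=wheels,source=/wheels,target=/mnt/wheels \
    mkdir -p /tmp/wheels \
    && find /mnt/wheels -maxdepth 1 -name '*.whl' -exec cp {} /tmp/wheels/ \; \
    && ls -lhS /tmp/wheels/*.whl
```

Scoping the `source=` also makes the build fail fast if the wheels context doesn't contain the expected directory, instead of silently copying nothing.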
```dockerfile
RUN git clone --depth=1 --branch ${AITER_BRANCH} ${AITER_REPO} /build/aiter

RUN cd /build/aiter \
    && pip3 install --break-system-packages -r requirements.txt \
    && export ENABLE_CK=0 PREBUILD_TRITON=${PREBUILD_TRITON} \
       PREBUILD_TRITON_ARCHS="gfx942,gfx950" \
       MAX_JOBS=${MAX_JOBS} GPU_ARCHS=${GPU_ARCH_LIST} \
    && pip3 install --break-system-packages --no-build-isolation -e . \
```
AITER is cloned but submodules aren’t initialized/updated before building. The existing docker/Dockerfile does a git submodule ... update --init --recursive for AITER, which suggests the build may rely on submodules; without this, the wheel build can fail or produce an incomplete package. Please add the submodule init/update step (or use git clone --recurse-submodules) before installing/building.
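A sketch of the one-step variant using `--recurse-submodules` (shallow submodules keep the clone small):

```dockerfile
# Clone AITER together with its submodules in a single step, matching the
# submodule init/update done in the existing docker/Dockerfile.
RUN git clone --depth=1 --recurse-submodules --shallow-submodules \
    --branch ${AITER_BRANCH} ${AITER_REPO} /build/aiter
```

The alternative is a separate `git submodule update --init --recursive` after the existing clone; either form works, but a missing step here fails late (at build time inside pip) rather than at clone time.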
```dockerfile
    && export ENABLE_CK=0 PREBUILD_TRITON=${PREBUILD_TRITON} \
       PREBUILD_TRITON_ARCHS="gfx942,gfx950" \
       MAX_JOBS=${MAX_JOBS} GPU_ARCHS=${GPU_ARCH_LIST} \
```
PREBUILD_TRITON_ARCHS is hardcoded to gfx942,gfx950 even though the target arch list is configurable via ARG GPU_ARCH. If someone builds with a different GPU_ARCH, AITER will still prebuild for 942/950, which can lead to incorrect or missing kernels. Consider deriving PREBUILD_TRITON_ARCHS from the same build arg (or introducing a separate build arg) to keep them consistent.
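One possible sketch: default `PREBUILD_TRITON_ARCHS` to the configurable arch list while still allowing an override (the `GPU_ARCH_LIST` default shown here is taken from the hardcoded value and is otherwise an assumption):

```dockerfile
# Derive the Triton prebuild archs from the same build arg as GPU_ARCHS,
# so overriding GPU_ARCH_LIST keeps both in sync by default.
ARG GPU_ARCH_LIST="gfx942,gfx950"
ARG PREBUILD_TRITON_ARCHS=${GPU_ARCH_LIST}

RUN cd /build/aiter \
    && export ENABLE_CK=0 PREBUILD_TRITON=${PREBUILD_TRITON} \
       PREBUILD_TRITON_ARCHS="${PREBUILD_TRITON_ARCHS}" \
       MAX_JOBS=${MAX_JOBS} GPU_ARCHS=${GPU_ARCH_LIST} \
    && pip3 install --break-system-packages --no-build-isolation -e .
```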
```dockerfile
# ATOM Docker — Multi-stage wheel builder
#
# Builds/downloads all wheels needed for Dockerfile.clean:
```
The header comment calls this a “Multi-stage wheel builder”, but the Dockerfile currently has a single stage (no FROM ... AS ... / multi-stage copy). Either adjust the wording or split into explicit stages if that was intended, to avoid confusing users.
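If an explicit multi-stage layout was intended, a skeleton could look like the following (stage names and the base image are illustrative, not from this PR):

```dockerfile
# Stage 1: build/download all wheels (base image is an assumption).
FROM rocm/dev-ubuntu-22.04 AS builder
# ... build/download wheels into /wheels ...

# Stage 2: export a minimal image containing only the wheels.
FROM scratch AS wheels
COPY --from=builder /wheels /wheels
```

This keeps the final `atom:wheels` image tiny and makes `--build-context wheels=docker-image://atom:wheels` mount only wheel files.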
|
Related AITER PR: ROCm/aiter#2227 — adds a Triton fallback for AITER's attention path (the E2E inference limitation noted above).
Summary

- `Dockerfile.wheels`: multi-stage builder that produces all dependency wheels (torch, triton, triton_kernels, flydsl, mori, amd_aiter)
- `Dockerfile.clean`: wheel-only install with zero source compilation, using pre-built wheels from `Dockerfile.wheels`
- No changes to the existing `Dockerfile`

Build Time (MI300X node)

| Image | Time |
|---|---|
| `Dockerfile.wheels` | ~10 min |
| `Dockerfile.clean` | ~4 min |

Wheels Build Breakdown
Usage

Build wheels:

```shell
docker build -f docker/Dockerfile.wheels -t atom:wheels .
```

Install from wheels (zero compilation):

```shell
DOCKER_BUILDKIT=1 docker build \
  --build-context wheels=docker-image://atom:wheels \
  -f docker/Dockerfile.clean -t atom:clean .
```

Test plan

- Built `Dockerfile.wheels` — all 8 wheels produced
- Built `Dockerfile.clean` from wheels — all imports succeed (PyTorch, Triton, AITER, FlyDSL, MORI, ATOM)