feat: add initial ARM64 (aarch64) architecture support #1875
tomassrnka wants to merge 75 commits into main from
Replace hardcoded GOARCH=amd64 and --platform linux/amd64 with $(shell go env GOARCH) across all service Makefiles. This enables building on ARM64 (Apple Silicon) without manual overrides while preserving existing amd64 behavior on x86_64 hosts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Disable SMT (Simultaneous Multi-Threading) on ARM64 since it's not supported by ARM processors
- Use runtime.GOARCH for OCI image platform instead of hardcoded "amd64"
- Add ARM64 fallback in CPU detection (gopsutil doesn't populate Family/Model fields from /proc/cpuinfo on ARM64)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some busybox versions (e.g., busybox-static on ARM64 Ubuntu) lack chattr support. Make the calls non-fatal so provisioning succeeds regardless of busybox capabilities. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 589a0596cb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
730d2d7 to 06f483f
8e7806e to 20baf1c
    // On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
    // Provide fallback values so callers don't get an error.
    if (family == "" || model == "") && runtime.GOARCH == "arm64" {
        if family == "" {
            family = "arm64"
        }
        if model == "" {
            model = "0"
        }
    } else if family == "" || model == "" {
let's make it cleaner
    // On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
    // Provide fallback values so callers don't get an error.
    if runtime.GOARCH == "arm64" {
        if family == "" {
            family = "arm64"
        }
        if model == "" {
            model = "0"
        }
    }
    if family == "" || model == "" {
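For reference, the suggested shape can be sketched as a small standalone function. This is only an illustration: `cpuIdentity` is a hypothetical name, the error text is made up, and the architecture is passed in as a parameter (normally `runtime.GOARCH`) so the logic can be exercised on any host.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// cpuIdentity is a hypothetical helper, not the function from this PR.
// family and model are the raw strings read from the CPU info source.
func cpuIdentity(family, model, goarch string) (string, string, error) {
	// ARM64 fallback first: fill in placeholder values for empty fields.
	if goarch == "arm64" {
		if family == "" {
			family = "arm64"
		}
		if model == "" {
			model = "0"
		}
	}
	// The error check is now independent of the architecture branch.
	if family == "" || model == "" {
		return "", "", errors.New("unable to determine CPU family or model")
	}
	return family, model, nil
}

func main() {
	family, model, err := cpuIdentity("", "", runtime.GOARCH)
	fmt.Println(family, model, err)
}
```

Separating the two checks means an ARM64 host never reaches the error path, while other architectures keep the strict behavior.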
Standardize Firecracker binary paths to include architecture:
{version}/{arch}/firecracker (e.g. v1.12.1/arm64/firecracker)
This aligns with the fc-kernels convention of {version}/{arch}/ and
prepares for multi-arch production deployments.
Changes:
- config.go: FirecrackerPath includes runtime.GOARCH in path
- create-build: download FC from GCS bucket (not GitHub releases),
using {version}/{arch}/firecracker with legacy fallback on 404
- create-build: add errNotFound sentinel for reliable 404 detection,
handle url.JoinPath errors explicitly
- script_builder_test.go: update expected paths for arch directory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The embedded busybox binary is x86_64. On ARM64 hosts, add a Makefile target that extracts the correct busybox from the busybox-static apt package, ensuring the binary comes from a trusted source rather than copying an arbitrary system binary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a new PR workflow that:
- Cross-compiles all packages for arm64 on x86_64 runners (catches build issues without needing ARM64 hardware)
- Runs unit tests on ubuntu-24.04-arm runners for packages that don't require KVM (api, client-proxy, db, docker-reverse-proxy, shared)
Orchestrator and envd are excluded from ARM64 unit tests since they require KVM, hugepages, NBD, and other kernel features only available on self-hosted runners.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
04f0ec3 to 45e635f
Document the architecture naming convention (amd64/arm64 vs x86_64/aarch64), binary path layout, ARM64-specific behavior, and the fc-kernels naming discrepancy that needs follow-up alignment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
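The two naming schemes mentioned here are easy to mix up, so a translation helper may be useful. The sketch below is an assumption about how such a mapping could be written; `unameArch` is a hypothetical name and is not taken from the PR.

```go
package main

import "fmt"

// unameArch maps Go's architecture names (amd64, arm64) to the names that
// `uname -m` reports (x86_64, aarch64). Hypothetical helper for illustration.
func unameArch(goarch string) (string, error) {
	switch goarch {
	case "amd64":
		return "x86_64", nil
	case "arm64":
		return "aarch64", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", goarch)
	}
}

func main() {
	for _, arch := range []string{"amd64", "arm64"} {
		name, err := unameArch(arch)
		fmt.Println(arch, name, err)
	}
}
```

Returning an error for unknown values, rather than passing them through, makes a naming mismatch fail loudly at the point of translation.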
Add TARGET_ARCH environment variable that allows deploying for a different architecture than the build host (e.g. deploying x86_64 sandboxes from an ARM64 Mac). When unset, falls back to runtime.GOARCH.
- Add TargetArch() helper in shared/pkg/utils
- Use TargetArch() for binary paths, OCI platform, and downloads
- Add BUILD_ARCH variable to all Makefiles (reads TARGET_ARCH)
- Keep runtime.GOARCH for hardware-dependent code (SMT, CPU detection)
- Document TARGET_ARCH in orchestrator README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FirecrackerPath() now checks if the arch-prefixed path
({version}/{arch}/firecracker) exists on disk before using it.
If not found, falls back to the legacy flat path
({version}/firecracker) for backward compatibility with existing
production nodes that haven't migrated to the new layout.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same pattern as FirecrackerPath — try arch-prefixed path
({version}/{arch}/vmlinux.bin) first, fall back to legacy flat
path ({version}/vmlinux.bin) for existing production nodes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make fetch-busybox a prerequisite of build-local so the correct arch busybox binary is fetched automatically. On amd64 this is a no-op (prints "Using bundled amd64 busybox"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BUILD_ARCH variable (defaults to `go env GOARCH`) to all service Makefiles so devs can cross-build for remote clusters with a different architecture (e.g., BUILD_ARCH=amd64 make build-and-upload).
- Replace dpkg-based arch detection in fetch-busybox with `go env GOARCH` fallback for better cross-platform support (non-Debian hosts).
- Add comments explaining why chattr is non-fatal in provision.sh (ARM64 busybox-static packages may omit chattr).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bugbot run
One more check so this does not break current production: No new required env vars. No changed function signatures that affect callers (the DefaultPlatform var→func was updated at all call sites). No changes to Terraform, Nomad jobs, init scripts, or .env files.
Rebased on latest main and addressed all review feedback. Verified safe to merge: zero behavior change on existing amd64 production without any .env
What's in the PR:
ARM64 support:
Data races fixed (found by -race on ARM64 CI):
CI:
Still possible TODOs (separate PR):
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6474bd3e09
ℹ️ About Codex in GitHub
fetch-busybox was detecting arch independently via go env GOARCH, which would mismatch when cross-compiling with BUILD_ARCH (e.g., BUILD_ARCH=amd64 on an ARM64 host would embed an ARM64 busybox into an amd64 orchestrator binary). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add BUILD_PLATFORM variable (defaults to linux/$BUILD_ARCH) to all service Makefiles. This allows building for multiple architectures in a single Docker buildx invocation:
BUILD_PLATFORM=linux/amd64,linux/arm64 make build-and-upload
Go builds still use BUILD_ARCH for single-arch compilation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
- TargetArch() defaults to runtime.GOARCH instead of hardcoded "amd64", so ARM64 hosts auto-detect without needing TARGET_ARCH env var
- fetch-busybox tries multiple methods (existing binary check, host busybox copy, apt download) instead of requiring apt/dpkg-deb
- setup-arm64-runner.sh is now idempotent: uses > instead of >> for config files, guards fstab append with grep
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Avoids leaving /tmp/chattr_err behind after provisioning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- machineinfo: separate ARM64 fallback from error check (jakubno suggestion)
- fetch-busybox: verify host busybox is statically linked before copying
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The multi-arch approach (building e2bdev/base:latest for both amd64 and arm64) is preferred over a runtime env var override. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The non-fatal chattr handling is unnecessary — the fetch-busybox target ensures a proper busybox binary with chattr support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
The revive linter false-positives on recover() in a deferred helper. Adding a reason comment satisfies nolintlint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This lint issue exists on main — not related to ARM64 changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Without this, build-debug on ARM64 embeds the wrong-architecture (amd64) busybox binary, causing silent runtime failures in sandboxes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds ARM64/aarch64 architecture support to the E2B infrastructure, enabling builds and sandbox execution on Apple Silicon and other ARM64 hosts (via Lima VM + nested KVM).
Changes by commit:
- ... `GOARCH=amd64` and `--platform linux/amd64` with `$(shell go env GOARCH)` across all 4 service Makefiles
- ... `runtime.GOARCH` for OCI image platform, add ARM64 fallback for CPU detection (gopsutil doesn't populate Family/Model on ARM)
- ... `chattr` calls non-fatal (`|| true`) for busybox versions that lack it
- ... `arm64/` subdirectory first, falls back to generic), `E2B_BASE_IMAGE` env var for base image override
Related PRs:
Test plan
- `make fetch-busybox` on ARM64 host to swap busybox binary
- `create-build` on ARM64
- `uname -m` in sandbox returns `aarch64`
🤖 Generated with Claude Code
Note
Medium Risk
Touches orchestrator sandbox/runtime path resolution, Firecracker/kernel fetching, and OCI image platform selection, which can affect build/sandbox startup on both architectures. Changes are mostly additive with legacy fallbacks and added tests, but failures would impact infra execution paths.
Overview
Adds initial ARM64 support end-to-end by introducing ARM64 PR CI coverage and a runner bootstrap script, making service builds and Docker publishing architecture-aware, and teaching the orchestrator to resolve/pull arch-specific kernels, Firecracker binaries, and OCI images via a normalized TARGET_ARCH (with legacy fallback paths). It also hardens several concurrency- and environment-sensitive tests/paths (e.g., hugepage allocation skipping, Connect response reuse, NBD goroutine captures, cleaner statting) and disables SMT plus CPU family/model strictness on ARM64 to match platform constraints.
Written by Cursor Bugbot for commit c9e9314.