feat: add initial ARM64 (aarch64) architecture support#1875

Open
tomassrnka wants to merge 75 commits into main from arm64-support

@tomassrnka tomassrnka commented Feb 10, 2026

Summary

Adds ARM64/aarch64 architecture support to the E2B infrastructure, enabling builds and sandbox execution on Apple Silicon and other ARM64 hosts (via Lima VM + nested KVM).

Changes by commit:

  1. Makefiles — Replace hardcoded GOARCH=amd64 and --platform linux/amd64 with $(shell go env GOARCH) across all 4 service Makefiles
  2. Go runtime detection — Disable SMT on ARM64 (not supported), use runtime.GOARCH for OCI image platform, add ARM64 fallback for CPU detection (gopsutil doesn't populate Family/Model on ARM)
  3. Provision script — Make chattr calls non-fatal (|| true) for busybox versions that lack it
  4. create-build — Arch-aware Firecracker and kernel download URLs (tries arm64/ subdirectory first, falls back to generic), E2B_BASE_IMAGE env var for base image override
  5. fetch-busybox — Makefile target to swap the embedded x86 busybox binary with the system's ARM64 busybox-static before compilation

Related PRs:

Test plan

  • Build orchestrator, envd, API, and client-proxy natively on ARM64 Linux
  • Run make fetch-busybox on ARM64 host to swap busybox binary
  • Template build succeeds with create-build on ARM64
  • Sandbox create/exec/delete works on ARM64 (Lima VM + KVM)
  • Verify uname -m in sandbox returns aarch64
  • Confirm no regression on x86_64 builds (all changes are backwards compatible)

🤖 Generated with Claude Code


Note

Medium Risk
Touches orchestrator sandbox/runtime path resolution, Firecracker/kernel fetching, and OCI image platform selection, which can affect build/sandbox startup on both architectures. Changes are mostly additive with legacy fallbacks and added tests, but failures would impact infra execution paths.

Overview
Adds initial ARM64 support end-to-end by introducing ARM64 PR CI coverage and a runner bootstrap script, making service builds and Docker publishing architecture-aware, and teaching the orchestrator to resolve/pull arch-specific kernels, Firecracker binaries, and OCI images via a normalized TARGET_ARCH (with legacy fallback paths). It also hardens several concurrency- and environment-sensitive tests/paths (e.g., hugepage allocation skipping, Connect response reuse, NBD goroutine captures, cleaner statting) and disables SMT plus CPU family/model strictness on ARM64 to match platform constraints.

Written by Cursor Bugbot for commit c9e9314. This will update automatically on new commits.

tomassrnka and others added 3 commits February 9, 2026 16:51
Replace hardcoded GOARCH=amd64 and --platform linux/amd64 with
$(shell go env GOARCH) across all service Makefiles. This enables
building on ARM64 (Apple Silicon) without manual overrides while
preserving existing amd64 behavior on x86_64 hosts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Disable SMT (Simultaneous Multi-Threading) on ARM64, since Firecracker
  rejects it on that architecture
- Use runtime.GOARCH for OCI image platform instead of hardcoded "amd64"
- Add ARM64 fallback in CPU detection (gopsutil doesn't populate
  Family/Model fields from /proc/cpuinfo on ARM64)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
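
The runtime-detection commit above can be illustrated with a minimal Go sketch. It is not the PR's actual code: the function names `smtEnabled` and `cpuIdentity` are hypothetical, and the architecture is passed as a parameter (rather than read from `runtime.GOARCH` inside the function) purely so that both branches can be exercised on any host. The fallback values "arm64" and "0" are the ones quoted in the review thread.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// smtEnabled reports whether SMT should be requested for a guest.
// Hypothetical helper: SMT stays on everywhere except arm64.
func smtEnabled(goarch string) bool {
	return goarch != "arm64"
}

// cpuIdentity applies the ARM64 fallback for empty family/model values,
// then rejects anything that is still empty. Hypothetical helper.
func cpuIdentity(goarch, family, model string) (string, string, error) {
	if goarch == "arm64" {
		if family == "" {
			family = "arm64" // placeholder, the detected value was empty
		}
		if model == "" {
			model = "0" // placeholder, the detected value was empty
		}
	}
	if family == "" || model == "" {
		return "", "", errors.New("unable to determine CPU family or model")
	}
	return family, model, nil
}

func main() {
	// Simulate a host where CPU detection returned empty strings.
	family, model, err := cpuIdentity(runtime.GOARCH, "", "")
	fmt.Println(smtEnabled(runtime.GOARCH), family, model, err)
}
```

On an amd64 host the fallback branch is never taken, so empty values still produce the error, matching the "no behavior change on amd64" claim made later in this thread.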
Some busybox versions (e.g., busybox-static on ARM64 Ubuntu) lack
chattr support. Make the calls non-fatal so provisioning succeeds
regardless of busybox capabilities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 589a0596cb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 28 to 37
    // On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
    // Provide fallback values so callers don't get an error.
    if (family == "" || model == "") && runtime.GOARCH == "arm64" {
        if family == "" {
            family = "arm64"
        }
        if model == "" {
            model = "0"
        }
    } else if family == "" || model == "" {

let's make it cleaner

Suggested change
-   // On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
-   // Provide fallback values so callers don't get an error.
-   if (family == "" || model == "") && runtime.GOARCH == "arm64" {
-       if family == "" {
-           family = "arm64"
-       }
-       if model == "" {
-           model = "0"
-       }
-   } else if family == "" || model == "" {
+   // On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
+   // Provide fallback values so callers don't get an error.
+   if (runtime.GOARCH == "arm64") {
+       if family == "" {
+           family = "arm64"
+       }
+       if model == "" {
+           model = "0"
+       }
+   }
+   if family == "" || model == "" {

tomassrnka and others added 3 commits February 10, 2026 11:27
Standardize Firecracker binary paths to include architecture:
  {version}/{arch}/firecracker (e.g. v1.12.1/arm64/firecracker)

This aligns with the fc-kernels convention of {version}/{arch}/ and
prepares for multi-arch production deployments.

Changes:
- config.go: FirecrackerPath includes runtime.GOARCH in path
- create-build: download FC from GCS bucket (not GitHub releases),
  using {version}/{arch}/firecracker with legacy fallback on 404
- create-build: add errNotFound sentinel for reliable 404 detection,
  handle url.JoinPath errors explicitly
- script_builder_test.go: update expected paths for arch directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The embedded busybox binary is x86_64. On ARM64 hosts, add a Makefile
target that extracts the correct busybox from the busybox-static
apt package, ensuring the binary comes from a trusted source rather
than copying an arbitrary system binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a new PR workflow that:
- Cross-compiles all packages for arm64 on x86_64 runners (catches
  build issues without needing ARM64 hardware)
- Runs unit tests on ubuntu-24.04-arm runners for packages that
  don't require KVM (api, client-proxy, db, docker-reverse-proxy,
  shared)

Orchestrator and envd are excluded from ARM64 unit tests since they
require KVM, hugepages, NBD, and other kernel features only available
on self-hosted runners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the architecture naming convention (amd64/arm64 vs x86_64/aarch64),
binary path layout, ARM64-specific behavior, and the fc-kernels naming
discrepancy that needs follow-up alignment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tomassrnka tomassrnka marked this pull request as draft February 10, 2026 19:41
tomassrnka and others added 5 commits February 10, 2026 11:47
Add TARGET_ARCH environment variable that allows deploying for a
different architecture than the build host (e.g. deploying x86_64
sandboxes from an ARM64 Mac). When unset, falls back to runtime.GOARCH.

- Add TargetArch() helper in shared/pkg/utils
- Use TargetArch() for binary paths, OCI platform, and downloads
- Add BUILD_ARCH variable to all Makefiles (reads TARGET_ARCH)
- Keep runtime.GOARCH for hardware-dependent code (SMT, CPU detection)
- Document TARGET_ARCH in orchestrator README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FirecrackerPath() now checks if the arch-prefixed path
({version}/{arch}/firecracker) exists on disk before using it.
If not found, falls back to the legacy flat path
({version}/firecracker) for backward compatibility with existing
production nodes that haven't migrated to the new layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same pattern as FirecrackerPath — try arch-prefixed path
({version}/{arch}/vmlinux.bin) first, fall back to legacy flat
path ({version}/vmlinux.bin) for existing production nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make fetch-busybox a prerequisite of build-local so the correct
arch busybox binary is fetched automatically. On amd64 this is a
no-op (prints "Using bundled amd64 busybox").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BUILD_ARCH variable (defaults to `go env GOARCH`) to all service
  Makefiles so devs can cross-build for remote clusters with a different
  architecture (e.g., BUILD_ARCH=amd64 make build-and-upload).
- Replace dpkg-based arch detection in fetch-busybox with `go env GOARCH`
  fallback for better cross-platform support (non-Debian hosts).
- Add comments explaining why chattr is non-fatal in provision.sh
  (ARM64 busybox-static packages may omit chattr).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tomassrnka tomassrnka changed the title from "feat: add ARM64 (aarch64) architecture support" to "feat: add initial ARM64 (aarch64) architecture support" Feb 24, 2026
@tomassrnka
Member Author

bugbot run

@tomassrnka
Member Author

One more check so this does not break current production:

  File: utils/env.go
  What changed: New TargetArch() function
  amd64 prod behavior: Returns "amd64" (default). No existing code calls it on main; only the new code below uses it.
  ────────────────────────────────────────
  File: fc/config.go
  What changed: HostKernelPath/FirecrackerPath try {ver}/amd64/binary first
  amd64 prod behavior: os.Stat fails (dir doesn't exist on prod nodes), falls through to identical legacy path {ver}/binary. One extra syscall, no behavior change.
  ────────────────────────────────────────
  File: fc/client.go
  What changed: smt := runtime.GOARCH != "arm64"
  amd64 prod behavior: "amd64" != "arm64" → true. Was hardcoded true before. Identical.
  ────────────────────────────────────────
  File: machineinfo/main.go
  What changed: ARM64 fallback for empty Family/Model
  amd64 prod behavior: On amd64, gopsutil populates both fields. The && runtime.GOARCH == "arm64" guard means the fallback is dead code on amd64. The else if error path is identical to the old code.
  ────────────────────────────────────────
  File: oci/oci.go
  What changed: DefaultPlatform var → DefaultPlatform() func
  amd64 prod behavior: Returns {linux, amd64} via TargetArch(). Same value, all callers updated.
  ────────────────────────────────────────
  File: nbd/path_direct.go
  What changed: Data race fix: capture loop vars before goroutine
  amd64 prod behavior: Bug fix only. Captures deviceIndex and i into devIdx/sockIdx before goroutine. Same values, eliminates race.
  ────────────────────────────────────────
  File: clean-nfs-cache/
  What changed: Data race fix: pass dirPath string instead of *os.File
  amd64 prod behavior: Bug fix only. Avoids race between df.Close() and df.Fd(). Uses filepath.Join(dirPath, filename) with AT_FDCWD instead. Functionally identical.
  ────────────────────────────────────────
  File: provision.sh
  What changed: chattr wrapped in if/else
  amd64 prod behavior: On amd64 where chattr works: enters the if branch, prints "ok". Same effect, just with logging.
  ────────────────────────────────────────
  File: uffd/testutils/page_mmap.go
  What changed: Graceful hugepage skip in tests
  amd64 prod behavior: Test-only. Not production.

No new required env vars. No changed function signatures that affect callers (the DefaultPlatform var→func was updated at all call sites). No changes to Terraform, Nomad jobs, init scripts, or .env files.

@tomassrnka
Member Author

Rebased on latest main and addressed all review feedback. Verified safe to merge: zero behavior change on existing amd64 production without any .env or config modifications. All ARM64 code paths are gated behind the TARGET_ARCH env var or runtime.GOARCH == "arm64" checks that default to amd64.

What's in the PR:

ARM64 support:

  • TARGET_ARCH env var for cross-architecture deployment (defaults to amd64)
  • BUILD_ARCH variable in all 4 service Makefiles for cross-building (BUILD_ARCH=amd64 make build-and-upload from ARM64 host)
  • Arch-aware kernel/Firecracker downloads from GCS with legacy flat-path fallback for existing amd64 nodes
  • SMT disabled on ARM64 (Firecracker rejects it), machineinfo fallback for gopsutil missing Family/Model on ARM
  • OCI platform respects TARGET_ARCH
  • fetch-busybox target for ARM64 busybox-static swap
  • provision.sh chattr made non-fatal (ARM64 busybox may lack it)
  • protoc-gen-go-grpc upgraded to v1.6.1 for ARM support

Data races fixed (found by -race on ARM64 CI):

  • NBD path_direct.go — loop variable capture before goroutine
  • clean-nfs-cache — *os.File fd race between Scanner and Statter goroutines
  • ErrorCollector test — race on ctx variable
  • envd legacy conversion test
  • Upgraded go-nfs fork with additional race fixes

CI:

  • pr-tests-arm64.yml — cross-compilation of all packages + unit test matrix on native ARM64 runners
  • Hugepage tests skip gracefully when insufficient pages available

Still possible TODOs (separate PR):

  • Merge pr-tests-arm64.yml into pr-tests.yml as a single workflow with arch matrix
  • ARM64 integration tests

@tomassrnka tomassrnka marked this pull request as ready for review February 24, 2026 13:34
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6474bd3e09

tomassrnka and others added 3 commits February 24, 2026 15:17
fetch-busybox was detecting arch independently via go env GOARCH,
which would mismatch when cross-compiling with BUILD_ARCH (e.g.,
BUILD_ARCH=amd64 on an ARM64 host would embed an ARM64 busybox
into an amd64 orchestrator binary).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add BUILD_PLATFORM variable (defaults to linux/$BUILD_ARCH) to all
service Makefiles. This allows building for multiple architectures
in a single Docker buildx invocation:

  BUILD_PLATFORM=linux/amd64,linux/arm64 make build-and-upload

Go builds still use BUILD_ARCH for single-arch compilation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

tomassrnka and others added 3 commits February 25, 2026 11:04
- TargetArch() defaults to runtime.GOARCH instead of hardcoded "amd64",
  so ARM64 hosts auto-detect without needing TARGET_ARCH env var
- fetch-busybox tries multiple methods (existing binary check, host
  busybox copy, apt download) instead of requiring apt/dpkg-deb
- setup-arm64-runner.sh is now idempotent — uses > instead of >> for
  config files, guards fstab append with grep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

tomassrnka and others added 6 commits February 25, 2026 11:29
Avoids leaving /tmp/chattr_err behind after provisioning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- machineinfo: separate ARM64 fallback from error check (jakubno suggestion)
- fetch-busybox: verify host busybox is statically linked before copying

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The multi-arch approach (building e2bdev/base:latest for both amd64
and arm64) is preferred over a runtime env var override.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The non-fatal chattr handling is unnecessary — the fetch-busybox
target ensures a proper busybox binary with chattr support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

tomassrnka and others added 2 commits February 25, 2026 14:21
The revive linter false-positives on recover() in a deferred helper.
Adding a reason comment satisfies nolintlint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This lint issue exists on main — not related to ARM64 changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

tomassrnka and others added 2 commits February 26, 2026 12:46
Without this, build-debug on ARM64 embeds the wrong-architecture
(amd64) busybox binary, causing silent runtime failures in sandboxes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

5 participants