Add binfmt_misc configuration for the build sandbox #15539

Open
dramforever wants to merge 3 commits into NixOS:master from dramforever:xp-binfmt-misc
@dramforever
Contributor

@dramforever dramforever commented Mar 22, 2026

Motivation

(First, to clarify some terminology: cross compilation/build means Autotools' --build != --host, or another build system's equivalent. Using QEMU's user emulation is not cross; it is simply a native build with an emulator, and I will call it "emu-native". One argument for this view is that, as far as Nix is concerned, the hash is the same as for a native build.)

The existing boot.binfmt.emulatedSystems option in NixOS introduces an unfortunate impurity into cross builds: All Nix builds gain the ability to run programs for these extra platforms. This is not good for cross compilation since it allows misconfigured builds that run programs for the host platform to work.

Let's say someone regularly works on an aarch64-linux machine and also builds things for a riscv64-linux machine. They may have set up boot.binfmt.emulatedSystems = [ "riscv64-linux" ]; for testing software and for running emu-native builds. However, on this machine all the actual cross builds to riscv64-linux are now possibly "contaminated": misconfigured cross builds that run riscv64-linux programs silently succeed, and then fail to reproduce on systems with no such emulator configured.

NixOS/nixpkgs#354533 does not help, since that only matters for derivations where system is, for our example, riscv64-linux, which isn't the case for cross.

An example where this impurity has tricked someone into thinking their cross compilation setup works is: NixOS/nixpkgs#447041

Therefore, add a new binfmt-misc setting, gated behind the experimental feature with the same name. Where the derivation platform matches one of the keys of binfmt-misc, the build sandbox is run with its own binfmt_misc instance, isolated from the outside. This allows both native builds to run without binfmt_misc interpreters as an impurity, and also allows emu-native builds to use emulators without having to set them up globally. Even an unprivileged Nix can use binfmt-misc for emu-native builds.

Context

This feature also relates to #1916.

This is only possible since Linux 6.7 (more precisely, torvalds/linux@21ca59b), which made it possible for each user namespace to have its own binfmt_misc "instance". You can think of this as allowing containers to configure binfmt_misc for themselves, without affecting the "host".
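As a rough illustration of the Linux 6.7 requirement, here is a minimal sketch of a version gate on a kernel release string (the kind of string `uname(2)` reports in `utsname.release`). This is not the check implemented in this PR; a version comparison is only a heuristic, since it says nothing about backports or about whether binfmt_misc is actually mountable, and probing the mount directly would be more robust. The names `parseKernelRelease` and `mayHaveSandboxedBinfmtMisc` are hypothetical.

```cpp
#include <cstdio>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: extract the leading "major.minor" from a kernel
// release string such as "6.7.0" or "6.12.3-generic".
// Returns nullopt if the string does not start with two dotted integers.
std::optional<std::pair<int, int>> parseKernelRelease(const std::string & release)
{
    int major = 0, minor = 0;
    if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
        return std::nullopt;
    return std::make_pair(major, minor);
}

// Hypothetical helper: true if the release is at least 6.7, the first
// release with per-user-namespace binfmt_misc. This is a heuristic only.
bool mayHaveSandboxedBinfmtMisc(const std::string & release)
{
    auto version = parseKernelRelease(release);
    return version && *version >= std::make_pair(6, 7);
}
```

With this sketch, a release string of `6.6.30` would be rejected and `6.12.3-generic` accepted; an unparsable string is treated as unsupported.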

I will admit here that I have used some hideous user namespace trickery to handle the permissions of the /proc and binfmt_misc filesystems involved, but I have this working for:

  • Both with uid-range and without
  • Unprivileged Nix (possibly with chroot store)

And AFAICT, even setting aside Linux's "don't break userspace" policy, this is working on Linux 6.7+ as intended. There are a lot of comments in the code, especially in src/libstore/unix/build/linux-derivation-builder.cc, that will hopefully explain what was done in much more detail.

Ideally, this shouldn't change any functionality if the setting binfmt-misc is not configured.

Known TODOs and bikeshedding points

  • If this idea is approved, an experimental feature milestone should be added and filled in.
    • Xp::BinfmtMisc removed
  • The configuration format is really bad. What should it be?
    • I made it just point to a file in the same format as systemd-binfmt's binfmt.d(5). Should be less hideous now.
  • In what cases should the double userns trick be used?
    • Currently: Only when necessary
    • This interacts with uid-range in case a build wants to do its own binfmt_misc. It's unclear whether any option will be noticeably different from the others.
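For readers unfamiliar with the binfmt.d(5) format mentioned in the list above: each rule is one line of the form `:name:type:offset:magic:mask:interpreter:flags`, where the first character of the line is the field delimiter, and empty lines or lines starting with `#` or `;` are ignored. The following is a minimal sketch of splitting such a line into its fields, not the parser used in this PR. `BinfmtRule` and `parseBinfmtRule` are hypothetical names, and the sketch deliberately does not decode `\x..` escapes in the magic and mask fields.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical struct holding the seven fields of one rule, kept as raw text.
struct BinfmtRule {
    std::string name, type, offset, magic, mask, interpreter, flags;
};

// Hypothetical helper: split one binfmt.d-style line into its fields.
// Returns nullopt for empty lines, comment lines, and lines that do not
// have exactly seven delimiter-separated fields.
std::optional<BinfmtRule> parseBinfmtRule(const std::string & line)
{
    if (line.empty() || line[0] == '#' || line[0] == ';')
        return std::nullopt;

    // The first character chooses the delimiter (conventionally ':').
    const char delim = line[0];

    std::vector<std::string> fields;
    std::string current;
    for (std::size_t i = 1; i < line.size(); ++i) {
        if (line[i] == delim) {
            fields.push_back(current);
            current.clear();
        } else {
            current += line[i];
        }
    }
    fields.push_back(current); // trailing field (flags, possibly empty)

    if (fields.size() != 7)
        return std::nullopt;

    return BinfmtRule{fields[0], fields[1], fields[2], fields[3],
                      fields[4], fields[5], fields[6]};
}
```

For a made-up extension rule such as `:demo:E::xyz::/bin/true:`, this yields name `demo`, type `E`, magic `xyz`, interpreter `/bin/true`, and empty offset, mask, and flags.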

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.


@dramforever dramforever changed the title from "Xp binfmt misc" to "Add binfmt_misc configuration for the build sandbox" Mar 22, 2026
@dramforever dramforever reopened this Mar 22, 2026
@dramforever dramforever marked this pull request as draft March 22, 2026 14:42
@dramforever dramforever marked this pull request as ready for review March 23, 2026 12:21
R"(
*Linux only*

A list of items, each in the form `platform=file` or `platform=`,
Member

It's better to make structured values like this a JSON array (see ExternalBuilders for an example). E.g.

binfmt-misc = [ {"platform": "foo", "file": "/bla" } ]

This avoids needing yet another ad hoc parser.

Contributor Author

@dramforever dramforever Mar 24, 2026

Given that I have earlier made this a file path to just reuse the systemd-binfmt format, I'm not sure that JSON would be better than Setting<StringMap> anymore.

However I will keep that as an option in mind, in case it is decided that a more complex option format should be used (e.g. map of vector of string).
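To make the string-map option concrete, here is a minimal sketch of turning a list of `platform=file` / `platform=` items into a map from platform to file path. It does not reproduce Nix's own `Setting<StringMap>` parsing; `parsePlatformItems` is a hypothetical name, the example path in the note below is made up, and the meaning of an empty value (`platform=`) is left to the setting's documentation.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: split each "platform=file" item at the first '='.
// An item with an empty platform, or with no '=' at all, is rejected.
// An empty file part ("platform=") is kept as an empty string.
std::optional<std::map<std::string, std::string>>
parsePlatformItems(const std::vector<std::string> & items)
{
    std::map<std::string, std::string> result;
    for (const auto & item : items) {
        const auto eq = item.find('=');
        if (eq == std::string::npos || eq == 0)
            return std::nullopt;
        result[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return result;
}
```

Given the items `riscv64-linux=/etc/binfmt.d/riscv64.conf` and `aarch64-linux=`, the sketch maps `riscv64-linux` to the path and `aarch64-linux` to an empty string.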

Contributor

@xokdvium xokdvium Mar 25, 2026

Reusing a format from systemd seems much better to me. Since there's already an established precedent from systemd, we have one less thing to bikeshed.

EDIT: wording, didn't fully wake up

Linux supports per-user-namespace binfmt_misc since Linux 6.7 (more
precisely, Linux kernel commit 21ca59b365c0 ("binfmt_misc: enable
sandboxed mounts") [1]), which was only about two years ago. Add a
check for this feature so that kernels not supporting it can be
detected.

[1]: https://git.kernel.org/torvalds/c/21ca59b365c091d583f36ac753eaa8baf947be6f
@dramforever dramforever force-pushed the xp-binfmt-misc branch 3 times, most recently from e91c523 to deec4b2 Compare March 27, 2026 12:32
@dramforever
Contributor Author

Sorry for the rebasing mishaps. I have fixed some markdown underscores binfmt_misc -> binfmt\_misc, and brought back the usage of the systemd-binfmt format (which was accidentally lost in #15539 (comment) 372cabd..f699e11)

Add the binfmt-misc setting, which configures binfmt_misc interpreters
for the build sandbox. This allows, for example, running emulated builds
for foreign platforms without configuring binfmt_misc globally, and
isolating cross compilation builds from system-wide binfmt_misc
interpreters.

One notable thing is that due to the way permissions work in the
binfmt_misc filesystem (and /proc in general), we have to use a heinous
double-user-namespace trick to get this working in the most general
case.