Add binfmt_misc configuration for the build sandbox #15539
dramforever wants to merge 3 commits into NixOS:master
R"(
  *Linux only*

  A list of items, each in the form `platform=file` or `platform=`,
It's better to make structured values like this a JSON array (see ExternalBuilders for an example). E.g.
binfmt-misc = [ {"platform": "foo", "file": "/bla" } ]
This avoids needing yet another ad hoc parser.
Given that I have earlier made this a file path to just reuse the systemd-binfmt format, I'm not sure that JSON would be better than Setting<StringMap> anymore.
However I will keep that as an option in mind, in case it is decided that a more complex option format should be used (e.g. map of vector of string).
Reusing a format from systemd seems much better to me. Since there's the already established precedent from systemd, we have one less thing to bikeshed.
EDIT: wording, didn't fully wake up
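For reference, the `platform=file` / `platform=` item format discussed in this thread could be parsed roughly as follows. This is only a minimal sketch: `parseBinfmtItems` is a hypothetical helper, not code from the PR, and treating an empty right-hand side as "no file" is an assumption about what `platform=` means.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): parse items of the form
// `platform=file` or `platform=` into a map from platform name to an
// optional file path. An empty right-hand side is stored as std::nullopt
// (assumed meaning: no interpreter file for that platform).
std::map<std::string, std::optional<std::string>>
parseBinfmtItems(const std::vector<std::string> & items)
{
    std::map<std::string, std::optional<std::string>> result;
    for (const auto & item : items) {
        auto eq = item.find('=');
        // Reject items without '=' or with an empty platform name.
        if (eq == std::string::npos || eq == 0)
            throw std::invalid_argument(
                "expected 'platform=file' or 'platform=', got '" + item + "'");
        std::string platform = item.substr(0, eq);
        std::string file = item.substr(eq + 1);
        if (file.empty())
            result[platform] = std::nullopt;
        else
            result[platform] = file;
    }
    return result;
}
```

Even a parser this small has to decide how to handle malformed items, which is the kind of ad hoc detail that a JSON value or a reused systemd format would avoid.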
Linux supports per-user-namespace binfmt_misc since Linux 6.7 (more
precisely, Linux kernel commit 21ca59b365c0 ("binfmt_misc: enable
sandboxed mounts") [1]), which was only about two years ago. Add a
check for this feature so that kernels that do not support it can be
detected.
[1]: https://git.kernel.org/torvalds/c/21ca59b365c091d583f36ac753eaa8baf947be6f
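The commit message does not say how the check is performed. One crude approximation, shown here purely as an illustration, is to compare the running kernel's release string against 6.7. The names `releaseAtLeast` and `runningKernelAtLeast` are hypothetical, and a version comparison is only a heuristic (kernels can carry backports), so the PR's actual check may well probe the feature directly instead.

```cpp
#include <cstdio>
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: returns true if a kernel release string such as
// "6.7.0-generic" denotes a version of at least major.minor.
// Strings that do not start with "<number>.<number>" yield false.
bool releaseAtLeast(const std::string & release, int major, int minor)
{
    int maj = 0, min = 0;
    if (std::sscanf(release.c_str(), "%d.%d", &maj, &min) != 2)
        return false;
    return maj > major || (maj == major && min >= minor);
}

// Hypothetical helper: apply the same comparison to the running kernel,
// using the release field reported by uname(2).
bool runningKernelAtLeast(int major, int minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return releaseAtLeast(u.release, major, minor);
}
```

With these helpers, `runningKernelAtLeast(6, 7)` would be the version-based stand-in for "this kernel supports per-user-namespace binfmt_misc".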
Sorry for the rebasing mishaps. I have fixed some markdown underscores
Add the `binfmt-misc` setting, which configures binfmt_misc interpreters for the build sandbox. This allows, for example, running emulated builds for foreign platforms without configuring binfmt_misc globally, and isolating cross-compilation builds from system-wide binfmt_misc interpreters. One notable thing is that, due to the way permissions work in the binfmt_misc filesystem (and /proc in general), we have to use a heinous double-user-namespace trick to get this working in the most general case.
Motivation
(First, to clarify some terminology: cross compilation/build means Autotools' `--build != --host`, or another build system's equivalent. Using QEMU's user emulation is not cross; it is simply a native build with an emulator, and I will call it "emu-native". One argument is that, in Nix's view, the hash is the same as for a native build.)

The existing `boot.binfmt.emulatedSystems` option in NixOS introduces an unfortunate impurity into cross builds: all Nix builds gain the ability to run programs for these extra platforms. This is not good for cross compilation, since it allows misconfigured builds that run programs for the host platform to work.

Let's say someone regularly works on an aarch64-linux machine and also builds things for a riscv64-linux machine. They may have set up `boot.binfmt.emulatedSystems = [ "riscv64-linux" ];` for testing software and running emu-native builds. However, on this machine all the actual cross builds to riscv64-linux are now possibly "contaminated": misconfigured cross builds that run riscv64-linux programs silently succeed, and then fail to reproduce on systems with no such emulator configured.

NixOS/nixpkgs#354533 does not help, since that only matters for derivations where `system` is, for our example, `riscv64-linux`, which isn't the case for cross.

An example where this impurity has tricked someone into thinking their cross compilation setup works is NixOS/nixpkgs#447041.

Therefore, add a new `binfmt-misc` setting, gated behind the experimental feature with the same name. Where the derivation platform matches one of the keys of `binfmt-misc`, the build sandbox is run with its own binfmt_misc instance, isolated from the outside. This allows native builds to run without binfmt_misc interpreters as an impurity, and also allows emu-native builds to use emulators without having to set them up globally. Even an unprivileged Nix can use `binfmt-misc` for emu-native builds.

Context
This feature also relates to #1916.
This is only possible since Linux 6.7 (more precisely, torvalds/linux@21ca59b), which made it possible for each user namespace to have its own binfmt_misc "instance". You can think of this as allowing containers to configure binfmt_misc for themselves without affecting the "host".
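As a rough illustration of what "configuring binfmt_misc" involves: interpreters are registered by writing a line of the form `:name:type:offset:magic:mask:interpreter:flags` to the `register` file of a mounted binfmt_misc filesystem, and systemd's `binfmt.d` files use the same line format. The sketch below only assembles such a line. The `BinfmtRule` struct and `formatRegisterLine` function are hypothetical names for illustration and are not taken from the PR.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical struct describing one binfmt_misc rule, with one member
// per field of the kernel's register line.
struct BinfmtRule
{
    std::string name;        // identifier of the rule
    char type;               // 'M' (match by magic bytes) or 'E' (match by extension)
    std::string offset;      // byte offset of the magic, may be empty
    std::string magic;       // magic bytes, or the extension for type 'E'
    std::string mask;        // bitmask applied to the magic, may be empty
    std::string interpreter; // absolute path of the interpreter to invoke
    std::string flags;       // optional flags, may be empty
};

// Assemble ":name:type:offset:magic:mask:interpreter:flags".
// ':' is used as the field delimiter here, so it must not occur in the name.
std::string formatRegisterLine(const BinfmtRule & r)
{
    if (r.type != 'M' && r.type != 'E')
        throw std::invalid_argument("type must be 'M' or 'E'");
    if (r.name.empty() || r.name.find(':') != std::string::npos)
        throw std::invalid_argument("name must be non-empty and must not contain ':'");
    return ":" + r.name + ":" + std::string(1, r.type) + ":" + r.offset + ":"
        + r.magic + ":" + r.mask + ":" + r.interpreter + ":" + r.flags;
}
```

With a per-user-namespace instance, a line like this would be written to the `register` file inside the sandbox's own binfmt_misc mount rather than the host's.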
I will admit here that I have used some hideous user namespace trickery to handle the permissions of the `/proc` and `binfmt_misc` filesystems involved, but I have this working for:

- `uid-range` and without

And AFAICT, even besides Linux's "don't break userspace" policy, this is working on Linux 6.7+ as intended. There are a lot of comments in the code, especially in `src/libstore/unix/build/linux-derivation-builder.cc`, that will hopefully explain what was done in much more detail.

Ideally, this shouldn't change any functionality if the setting `binfmt-misc` is not configured.

Known TODOs and bikeshedding points
- If this idea is approved, an experimental feature milestone should be added and filled in.
- `Xp::BinfmtMisc` removed