libct: prepareCgroupFD: fall back to container init cgroup by kolyshkin · Pull Request #5101 · opencontainers/runc

kolyshkin · 2026-02-06T01:53:58Z

Previously, when prepareCgroupFD would not open container's cgroup
(as configured in config.json and saved to state.json), it returned
a fatal error, as we presumed a container can't exist without its own
cgroup.

Apparently, it can. In a case when container is configured without
cgroupns (i.e. it uses hosts cgroups), and /sys/fs/cgroup is mounted
read-write, a rootful container's init can move itself to an entirely
different cgroup (even a new one that it just created), and then the
original container cgroup is removed by the kernel (or systemd?) as
it has no processes left. By the way, from the systemd point of view
the container is gone. And yet it is still there, and users want
runc exec to work!

And it worked, thanks to the "let's try container init's cgroup"
fallback as added by commit c91fe9a ("cgroup2: exec: join the
cgroup of the init process on EBUSY"). The fallback was added for
the entirely different reason, but it happened to work in this very
case, too.

This behavior was broken with the introduction of CLONE_INTO_CGROUP
support.

While it is debatable whether this is a valid scenario when a container
moves itself into a different cgroup, this very setup is used by e.g.
buildkitd running in a privileged kubernetes container (see issue #5089).

To restore the way things are expected to work, add the same "try
container init's cgroup" fallback into prepareCgroupFD.

A test case (reproducing issue in #5089) is added. It fails before
the fix (see #5102) and succeeds here.

Fixes: #5089.

kolyshkin · 2026-02-06T01:54:42Z

TODO: add an integration test case.

kolyshkin · 2026-02-07T00:04:36Z

Added a test case. Testing it fails (w/o a fix) in #5102

rata

Thanks for tackling this! I'm fine with the general approach, just some nits about the code itself.

libcontainer/process_linux.go

rata

This mostly LGTM, thanks for so many iterations! :)

libcontainer/process_linux.go

lifubang · 2026-02-10T10:24:42Z

libcontainer/process_linux.go

+//
+//  2. A container init process with no cgroupns and /sys/fs/cgroup rw access
+//     may move itself to any other cgroup, and the original cgroup will disappear.
+func (p *setnsProcess) initProcessCgroupPath() string {


Hi, there’s a minor performance point — feel free to ignore if it’s not worth the change. The fallback path currently triggers two calls to initProcessCgroupPath. Would it make sense to add a small cache to avoid the duplicate initialization?
For example:

diff --git a/libcontainer/process_linux.go b/libcontainer/process_linux.go index 7926578d..38a14604 100644 --- a/libcontainer/process_linux.go +++ b/libcontainer/process_linux.go @@ -174,6 +174,7 @@ type setnsProcess struct { rootlessCgroups bool intelRdtPath string initProcessPid int + initCgroupPath string // cached container init's cgroup path } // tryResetCPUAffinity tries to reset the CPU affinity of the process @@ -300,6 +301,11 @@ func (p *setnsProcess) addIntoCgroupV1() error { // 2. A container init process with no cgroupns and /sys/fs/cgroup rw access // may move itself to any other cgroup, and the original cgroup will disappear. func (p *setnsProcess) initProcessCgroupPath() string { + // return cached value if available + if p.initCgroupPath != "" { + return p.initCgroupPath + } + if p.initProcessPid == 0 || !cgroups.IsCgroup2UnifiedMode() { return "" } @@ -313,7 +319,9 @@ func (p *setnsProcess) initProcessCgroupPath() string { return "" } - return fs2.UnifiedMountpoint + cgroup + // cache the result + p.initCgroupPath = fs2.UnifiedMountpoint + cgroup + return p.initCgroupPath } func (p *setnsProcess) addIntoCgroupV2() error {

I think the possibility of using initCgroupPath for the second time is low, as it needs a cgroup v2 host running kernel older than v5.7 (or clone3 being disabled via e.g. seccomp), which should be very rare.

rata

LGTM, thanks a lot for fixing this!

Left two very minor cosmetic things. I don't think another review is needed even if you change them :)

rata · 2026-02-11T09:45:13Z

libcontainer/process_linux.go


+	// Failed to join the configured cgroup. Fall back to container init's cgroup
+	// unless sub-cgroup is explicitly requested.
+	var path string


Super nit and feel free to ignore. But I don't really like the C89 style of declaring variables at the beginning (and since C99 already is not needed, almost 30 years ago :D). I think doing this before the if is more clear:

path := p.initProcessCgroupPath() if path == "" { ....

And I also think it is clearer to not assign inside the if, when the variable is not scoped to the if. IMHO the code is more readable like that.

But again, this is a complete nit and personal preference, feel free to ignore :)

@rata as to the first point (of not using var path string), alas this won't work because there's a goto above. Here's an error which I'm avoding:

if sub != "" { goto fail // ERROR: goto fail jumps over variable declaration at line 331 } path := p.initProcessCgroupPath() if path == "" { goto fail } ... fail: ...

Second point is valid but subjective. I see if a = func(); a != nil not as a way to scope a, but as a way to write more compact code. You see it as a way to scope a (while in fact it's not as we're not using := here).

I can see a value though; changed to your liking :)

rata · 2026-02-11T09:47:08Z

libcontainer/process_linux.go

+	if sub != "" {
+		goto fail
+	}
+	if cgroup = p.initProcessCgroupPath(); cgroup == "" {


And here too, if you like this pattern more (spaces probably broken, sorry :/):

Suggested change

if cgroup = p.initProcessCgroupPath(); cgroup == "" {

cgroup = p.initProcessCgroupPath();

if cgroup == "" {

Or just ignore too :)

Separate initProcessCgroupPath code out of addIntoCgroupV2. To be used by the next patch. While at it, describe the new scenario in which the container's configured cgroup might not be available. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

1. Refactor addIntoCgroupV2 in an attempt to simplify it. 2. Fix the bug of not trying the init cgroup fallback if rootlessCgroup is set. This is a bug because rootlessCgroup tells to ignore cgroup join errors, not to never try the fallback. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Previously, when prepareCgroupFD would not open container's cgroup (as configured in config.json and saved to state.json), it returned a fatal error, as we presumed a container can't exist without its own cgroup. Apparently, it can. In a case when container is configured without cgroupns (i.e. it uses hosts cgroups), and /sys/fs/cgroup is mounted read-write, a rootful container's init can move itself to an entirely different cgroup (even a new one that it just created), and then the original container cgroup is removed by the kernel (or systemd?) as it has no processes left. By the way, from the systemd point of view the container is gone. And yet it is still there, and users want runc exec to work! And it worked, thanks to the "let's try container init's cgroup" fallback as added by commit c91fe9a ("cgroup2: exec: join the cgroup of the init process on EBUSY"). The fallback was added for the entirely different reason, but it happened to work in this very case, too. This behavior was broken with the introduction of CLONE_INTO_CGROUP support. While it is debatable whether this is a valid scenario when a container moves itself into a different cgroup, this very setup is used by e.g. buildkitd running in a privileged kubernetes container (see issue 5089). To restore the way things are expected to work, add the same "try container init's cgroup" fallback into prepareCgroupFD. While at it, simplify the code flow. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Add a test case to reproduce runc issue 5089. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2026-02-11T20:08:03Z

1.4 backport: #5117

kolyshkin added the backport/1.4-todo A PR in main branch which needs to backported to release-1.4 label Feb 6, 2026

kolyshkin mentioned this pull request Feb 6, 2026

Probe with exec fails because no cgroup directory is found (can't open cgroup: openat2 /sys/fs/cgroup/...: no such file or directory) #5089

Closed

kolyshkin marked this pull request as draft February 6, 2026 01:54

kolyshkin force-pushed the fix-exec branch 2 times, most recently from 1b7acc8 to 6397833 Compare February 7, 2026 00:03

kolyshkin force-pushed the fix-exec branch 2 times, most recently from c68dda7 to 1081573 Compare February 7, 2026 00:51

kolyshkin marked this pull request as ready for review February 7, 2026 01:09

kolyshkin mentioned this pull request Feb 7, 2026

[DNM] tests/int: add "runc exec [init changes cgroup]" #5102

Closed

kolyshkin requested review from AkihiroSuda and cyphar February 7, 2026 20:20

kolyshkin added this to the 1.4.1 milestone Feb 7, 2026

AkihiroSuda approved these changes Feb 9, 2026

View reviewed changes

rata reviewed Feb 9, 2026

View reviewed changes

libcontainer/process_linux.go Outdated Show resolved Hide resolved

kolyshkin force-pushed the fix-exec branch from 1081573 to d4aae0b Compare February 10, 2026 01:22

vvanouytsel mentioned this pull request Feb 10, 2026

Buildkit fails on AKSUbuntu-2204gen2containerd-202601.13.0 node pool version Azure/AKS#5583

Closed

rata reviewed Feb 10, 2026

View reviewed changes

libcontainer/process_linux.go Outdated Show resolved Hide resolved

libcontainer/process_linux.go Outdated Show resolved Hide resolved

libcontainer/process_linux.go Outdated Show resolved Hide resolved

lifubang reviewed Feb 10, 2026

View reviewed changes

kolyshkin force-pushed the fix-exec branch from d4aae0b to fb96d8a Compare February 10, 2026 21:01

rata approved these changes Feb 11, 2026

View reviewed changes

kolyshkin added 4 commits February 11, 2026 11:52

libct: factor out initProcessCgroupPath

94133fa

Separate initProcessCgroupPath code out of addIntoCgroupV2. To be used by the next patch. While at it, describe the new scenario in which the container's configured cgroup might not be available. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

tests/int: add "runc exec [init changes cgroup]"

1fdbab8

Add a test case to reproduce runc issue 5089. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin force-pushed the fix-exec branch from fb96d8a to 1fdbab8 Compare February 11, 2026 19:57

kolyshkin enabled auto-merge February 11, 2026 19:57

kolyshkin mentioned this pull request Feb 11, 2026

[1.4] libct: prepareCgroupFD: fall back to container init cgroup #5117

Merged

kolyshkin removed the backport/1.4-todo A PR in main branch which needs to backported to release-1.4 label Feb 11, 2026

kolyshkin added the backport/1.4-done A PR in main branch which has been backported to release-1.4 label Feb 11, 2026

kolyshkin merged commit f047c6b into opencontainers:main Feb 11, 2026
42 checks passed

	if cgroup = p.initProcessCgroupPath(); cgroup == "" {
	cgroup = p.initProcessCgroupPath();
	if cgroup == "" {

Conversation

kolyshkin commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kolyshkin commented Feb 6, 2026

Uh oh!

kolyshkin commented Feb 7, 2026

Uh oh!

rata left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rata left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lifubang Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

kolyshkin Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

rata left a comment

Choose a reason for hiding this comment

Uh oh!

rata Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

kolyshkin Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

rata Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

kolyshkin commented Feb 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kolyshkin commented Feb 6, 2026 •

edited

Loading