[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns#5242

Closed
lentil1016 wants to merge 1 commit into opencontainers:main from lentil1016:fix/userns-skip-EPERM

lentil1016

In a user namespace, mounts inherited from a more privileged mount
namespace are locked by the kernel. Attempting to change their
propagation to MS_PRIVATE returns EPERM. This is safe to ignore
because prepareRoot() has already set MS_SLAVE recursively, which
is sufficient for pivot_root() and prevents mount leaks.

This aims to fix #5241.

kolyshkin (Contributor) left a comment

Can we start by writing a test case for it (probably best in the form of a bats test; see tests/integration)?

lentil1016 force-pushed the fix/userns-skip-EPERM branch from aa50b0a to 647ab84 on April 13, 2026 06:33
lentil1016 changed the title from "libcontainer: skip EPERM from rootfsParentMountPrivate in userns" to "[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns" on Apr 13, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch 5 times, most recently from 55c4cdf to df11f34 on April 13, 2026 08:16
lentil1016 changed the title from "[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns" to "libcontainer: skip EPERM from rootfsParentMountPrivate in userns" on Apr 13, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch from df11f34 to f049b69 on April 13, 2026 08:42
lentil1016 (Author)

@kolyshkin Hi, I've added a bats integration test for this fix. Is there anything else I should change?

rata (Member) left a comment

Thanks for the PR

Comment thread tests/integration/userns.bats Outdated
Comment on lines +268 to +270
# In a user namespace, mounts inherited from a more privileged mount namespace
# are locked and cannot have their propagation changed to MS_PRIVATE (EPERM).
# rootfsParentMountPrivate should skip EPERM in this case.
Member

This is only on old kernels, right? Can you clarify that? Do you know when this was changed in Linux upstream?

Can you clarify this in the commit message too?

lentil1016 (Author)

I've updated the commit message to clarify this.

Comment thread libcontainer/rootfs_linux.go Outdated
Comment on lines +1065 to +1066
// sufficient for pivot_root() to succeed and prevents mount events
// from leaking to the parent namespace.
Member

Would it be simple to verify this in the test?
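One possible way to verify the MS_SLAVE claim is to inspect propagation flags in /proc/<pid>/mountinfo from inside the container. Per proc(5), the optional fields before the "-" separator carry "shared:N" for peer-group members and "master:N" for slave mounts; a mount with neither is private. The Go helper below is a hypothetical sketch of that parsing, not part of runc or its test suite.

```go
package main

import (
	"fmt"
	"strings"
)

// propagation returns the mount point and propagation type of a single
// mountinfo line. Per proc(5), optional fields sit between field 6 (mount
// options) and the "-" separator: "shared:N" marks a peer-group member and
// "master:N" marks a slave. "unbindable" is ignored, and octal escapes such
// as \040 in the mount point are not decoded.
func propagation(line string) (mountPoint, kind string, err error) {
	f := strings.Fields(line)
	if len(f) < 7 {
		return "", "", fmt.Errorf("malformed mountinfo line: %q", line)
	}
	sep := -1
	for i := 6; i < len(f); i++ {
		if f[i] == "-" {
			sep = i
			break
		}
	}
	if sep < 0 {
		return "", "", fmt.Errorf("no separator in mountinfo line: %q", line)
	}
	shared, slave := false, false
	for _, opt := range f[6:sep] {
		if strings.HasPrefix(opt, "shared:") {
			shared = true
		} else if strings.HasPrefix(opt, "master:") {
			slave = true
		}
	}
	switch {
	case shared && slave:
		kind = "shared+slave"
	case shared:
		kind = "shared"
	case slave:
		kind = "slave"
	default:
		kind = "private"
	}
	return f[4], kind, nil
}

func main() {
	// Example line taken from the proc(5) man page.
	line := "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
	mp, kind, err := propagation(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(mp, kind) // /mnt2 slave
}
```

Note that MS_SLAVE applied to a mount that was already private leaves it private, so a check along these lines would need to account for the host's initial propagation.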

lentil1016 changed the title from "libcontainer: skip EPERM from rootfsParentMountPrivate in userns" to "[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns" on Apr 13, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch from f049b69 to f51549d on April 13, 2026 10:37
Comment thread tests/integration/userns.bats Outdated
update_config '.process.args = ["true"]'

# Record host mounts under the shared dir before container start.
host_mounts_before=$(cat /proc/self/mountinfo | grep "$shared_dir" | wc -l)
Contributor

nit:

Suggested change
host_mounts_before=$(cat /proc/self/mountinfo | grep "$shared_dir" | wc -l)
host_mounts_before=$(grep -c "$shared_dir" /proc/self/mountinfo)

Same below.

lentil1016 (Author) commented Apr 13, 2026

grep -c returns exit code 1 when there are zero matches, which could fail the test under bats' set -e.

-  host_mounts_before=$(grep -c "$shared_dir" /proc/self/mountinfo)
+  host_mounts_before=$(grep -c "$shared_dir" /proc/self/mountinfo || true)

Fixed with || true on both lines.

Contributor

BTW, I tried the test case (without the fix) in #5244 and was unable to reproduce the issue.

lentil1016 force-pushed the fix/userns-skip-EPERM branch 2 times, most recently from 181fc27 to 8eb171a on April 13, 2026 21:27
lentil1016 changed the title from "[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns" to "libcontainer: skip EPERM from rootfsParentMountPrivate in userns" on Apr 13, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch from 8eb171a to e7f5d72 on April 14, 2026 02:24
lentil1016 changed the title from "libcontainer: skip EPERM from rootfsParentMountPrivate in userns" to "[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns" on Apr 14, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch from e7f5d72 to 33fa3d9 on April 14, 2026 02:44
lentil1016 (Author) commented Apr 14, 2026

After re-evaluating the issue in my local environment, I think this is more likely a false positive. Sorry for the noise. Issue closed.

lentil1016 closed this on Apr 14, 2026
lentil1016 reopened this on Apr 14, 2026
lentil1016 force-pushed the fix/userns-skip-EPERM branch 3 times, most recently from 8fa66ea to b298661 on April 14, 2026 05:49
In a user namespace, mounts inherited from a more privileged mount
namespace are locked by the kernel. Attempting to change their
propagation to MS_PRIVATE returns EPERM. This is safe to ignore
because prepareRoot() has already set MS_SLAVE recursively, which
is sufficient for pivot_root() and prevents mount leaks.

This affects kernels before Linux 6.17, where commit cffd0441872e
("use uniform permission checks for all mount propagation changes",
CVE-2025-38498) reworked do_change_type() to use ns_capable()
instead of the stricter check that returned EPERM in user
namespaces. That kernel change has also been backported to some
enterprise kernels (e.g. RHEL 9 kernel 5.14.0-570.46.1).

An integration test is added for the cross-userns exec scenario:
when two containers have separate user namespaces but share an IPC
namespace (the Kubernetes sandbox/workload pattern), runc exec
must handle the setns ordering correctly.

Fixes: opencontainers#5241
Signed-off-by: yksun <yksun@alauda.io>
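The commit message ties the behaviour to kernels before 6.17 but also notes backports, so a kernel-version check can only be a coarse hint; probing the actual behaviour would likely be more reliable. If a test still wanted such a hint, a hypothetical Go helper comparing the major.minor of a release string (as printed by uname -r) might look like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelOlderThan reports whether the major.minor of a kernel release string
// (as printed by `uname -r`) is below wantMajor.wantMinor. Distribution
// backports mean this is only a hint: an "old" version may already carry
// the newer behaviour.
func kernelOlderThan(release string, wantMajor, wantMinor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major in %q: %w", release, err)
	}
	// The minor part may carry a suffix, as in "6.17-rc1".
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, fmt.Errorf("bad minor in %q: %w", release, err)
	}
	return major < wantMajor || (major == wantMajor && minor < wantMinor), nil
}

func main() {
	for _, rel := range []string{"5.14.0-570.46.1.el9_6.x86_64", "6.16.3", "6.17-rc1"} {
		old, err := kernelOlderThan(rel, 6, 17)
		fmt.Println(rel, old, err)
	}
}
```

For the RHEL 9 release string above, the helper reports "older than 6.17" even though, per the commit message, that build already carries the kernel change, which illustrates why such a gate should not be the only signal.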
lentil1016 force-pushed the fix/userns-skip-EPERM branch from b298661 to 8fc864a on April 14, 2026 05:57
lentil1016 closed this on Apr 14, 2026

Development

Successfully merging this pull request may close these issues.

rootfsParentMountPrivate fails with EPERM in user namespace

3 participants