[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns#5242
[WIP] libcontainer: skip EPERM from rootfsParentMountPrivate in userns#5242lentil1016 wants to merge 1 commit intoopencontainers:mainfrom
Conversation
kolyshkin
left a comment
There was a problem hiding this comment.
Can we start by writing a test case for it (probably best in the form of bats test, see tests/integration)?
aa50b0a to
647ab84
Compare
55c4cdf to
df11f34
Compare
df11f34 to
f049b69
Compare
|
@kolyshkin Hi, I've added a bats integration test for this fix. Any other change I can perform? |
| # In a user namespace, mounts inherited from a more privileged mount namespace | ||
| # are locked and cannot have their propagation changed to MS_PRIVATE (EPERM). | ||
| # rootfsParentMountPrivate should skip EPERM in this case. |
There was a problem hiding this comment.
This is only on old kernels, right? Can you clarify that? Do you know when this was changed in Linux upstream?
Can you clarify this in the commit msg too?
There was a problem hiding this comment.
I've updated the commit message to clarify this.
| // sufficient for pivot_root() to succeed and prevents mount events | ||
| // from leaking to the parent namespace. |
There was a problem hiding this comment.
Would it be simple to verify this in the test?
f049b69 to
f51549d
Compare
| update_config '.process.args = ["true"]' | ||
|
|
||
| # Record host mounts under the shared dir before container start. | ||
| host_mounts_before=$(cat /proc/self/mountinfo | grep "$shared_dir" | wc -l) |
There was a problem hiding this comment.
nit:
| host_mounts_before=$(cat /proc/self/mountinfo | grep "$shared_dir" | wc -l) | |
| host_mounts_before=$(grep -c "shared_dir" /proc/self/mountinfo) |
same below.
There was a problem hiding this comment.
grep -c returns exit code 1 when there are zero matches, which could fail the test under bats' set -e.
- host_mounts_before=$(grep -c "$shared_dir" /proc/self/mountinfo)
+ host_mounts_before=$(grep -c "$shared_dir" /proc/self/mountinfo || true)fixed with || true on both lines.
There was a problem hiding this comment.
BTW I tried to use the test case (without the fix) in #5244 and was unable to reproduce the issue.
181fc27 to
8eb171a
Compare
8eb171a to
e7f5d72
Compare
e7f5d72 to
33fa3d9
Compare
|
After re-evaluating the issue in my local environment, this looks more like a false positive. Sorry for the noise. Issue closed. |
8fa66ea to
b298661
Compare
In a user namespace, mounts inherited from a more privileged mount
namespace are locked by the kernel. Attempting to change their
propagation to MS_PRIVATE returns EPERM. This is safe to ignore
because prepareRoot() has already set MS_SLAVE recursively, which
is sufficient for pivot_root() and prevents mount leaks.
This affects kernels before Linux 6.17, where commit cffd0441872e
("use uniform permission checks for all mount propagation changes",
CVE-2025-38498) reworked do_change_type() to use ns_capable()
instead of the stricter check that returned EPERM in user
namespaces. The fix is also backported to some enterprise kernels
(e.g. RHEL 9 5.14.0-570.46.1).
An integration test is added for the cross-userns exec scenario:
when two containers have separate user namespaces but share an IPC
namespace (the Kubernetes sandbox/workload pattern), runc exec
must handle the setns ordering correctly.
Fixes: opencontainers#5241
Signed-off-by: yksun <yksun@alauda.io>
b298661 to
8fc864a
Compare
In a user namespace, mounts inherited from a more privileged mount
namespace are locked by the kernel. Attempting to change their
propagation to MS_PRIVATE returns EPERM. This is safe to ignore
because prepareRoot() has already set MS_SLAVE recursively, which
is sufficient for pivot_root() and prevents mount leaks.
In order to fix #5241