Upgrade podman to fix critical container issues #1126
Merged
mattiaswal merged 16 commits into main on Sep 2, 2025
From the documentation:

> 'podman image prune' removes all dangling images from local storage.
> With the all option, all unused images are deleted (i.e., images not
> in use by any container).
>
> The image prune command does not prune cache images that only use
> layers that are necessary for other images.

So, when the container script is called in the cleanup phase of the lifetime of a container, we can use the '--all' option to ensure we also remove this container's loaded image. In the case this happens before a reboot of the system, there will be no old version of the image loaded to /var/lib/containers after boot.

Issue #1098

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
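As a rough illustration of the difference between the two prune modes, the sketch below builds the podman command line for each case. The real change lives in the container script; the function name `prune_command` and its `cleanup` parameter are hypothetical and exist only for this example. The `--all` and `--force` options are standard `podman image prune` options.

```python
def prune_command(cleanup=False):
    """Build the argv for pruning images (hypothetical helper).

    Without --all only dangling images are removed. With --all every
    image not in use by a container is removed, which in the cleanup
    phase includes the image of the container being removed.
    """
    # --force skips the interactive confirmation prompt.
    cmd = ["podman", "image", "prune", "--force"]
    if cleanup:
        cmd.append("--all")
    return cmd
```

Passing `cleanup=True` only in the cleanup phase keeps images of stopped but still configured containers untouched during normal operation.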
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This was
linked to
issues
Sep 1, 2025
jovatn
approved these changes
Sep 1, 2025
Contributor
jovatn
left a comment
Only checked README updates, and they look great!
Found a minor typo, that's all.
As Infix matures as an operating system it is quickly becoming more and more useful also for end-device use-cases. The README should reflect this change in focus.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Highlights:

- fixes to systemd and s6 type services
- bare-bones libsystemd replacement with #include <systemd/sd-daemon.h>
- new reload:script mimicking systemd ExecReload, and
- new stop:script mimicking systemd ExecStop
- exit status/signal info when a process dies
- service kill:SEC now supports up to 300 sec.
- the /tmp/norespawn trick now also covers service_retry()
- the sysv 'stop' command process environment is now the same as 'start'
- State machine ordering issue: enter new config generation after services disabled in previous generation have been stopped

Full changelog at:

- <https://github.com/troglobit/finit/releases/tag/4.13>
- <https://github.com/troglobit/finit/releases/tag/4.14>

Fixes #1123

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This major upgrade, along with the upgrade to Finit v4.14, is what is needed to fix #1123, which was caused by some odd futex locking bug in Podman that left lingering issues in /var/lib/containers state files. The root cause was fixed already in v4.7.x, but since CNI is supported up to and including 4.9.5, going with a later release seemed prudent.

Full changelogs at:

- <https://github.com/containers/podman/releases/tag/v4.5.1>
- <https://github.com/containers/podman/releases/tag/v4.6.0>
- <https://github.com/containers/podman/releases/tag/v4.6.1>
- <https://github.com/containers/podman/releases/tag/v4.6.2>
- <https://github.com/containers/podman/releases/tag/v4.7.0>
- <https://github.com/containers/podman/releases/tag/v4.7.1>
- <https://github.com/containers/podman/releases/tag/v4.7.2>
- <https://github.com/containers/podman/releases/tag/v4.8.0>
- <https://github.com/containers/podman/releases/tag/v4.8.1>
- <https://github.com/containers/podman/releases/tag/v4.8.2>
- <https://github.com/containers/podman/releases/tag/v4.8.3>
- <https://github.com/containers/podman/releases/tag/v4.9.0>
- <https://github.com/containers/podman/releases/tag/v4.9.1>
- <https://github.com/containers/podman/releases/tag/v4.9.2>
- <https://github.com/containers/podman/releases/tag/v4.9.3>
- <https://github.com/containers/podman/releases/tag/v4.9.4>
- <https://github.com/containers/podman/releases/tag/v4.9.5>

Fixes #1123

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
mattiaswal
requested changes
Sep 1, 2025
Contributor
Great work overall, you are the 🥇
mattiaswal
approved these changes
Sep 1, 2025
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
The extended kill delay (10 sec) is sometimes not enough for complex system containers. Also, podman sometimes takes the opportunity to do housekeeping tasks when stopping a container. So, allow for up to a 30 sec. grace period before we send SIGKILL.

With the latest image prune extension, set a 60 sec. timeout for the cleanup task, in case podman gets stuck. This is to prevent any future mishaps.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
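The two timeouts described above follow a common pattern: ask politely, wait for a grace period, then force the issue. The sketch below shows that pattern in Python under stated assumptions. It is not how Infix implements it (the commit configures the service's kill delay and the cleanup task's timeout), and the helper names `stop_with_grace` and `run_cleanup` are hypothetical.

```python
import subprocess


def stop_with_grace(proc, grace=30.0):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL."""
    proc.terminate()
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM or is still busy: force it down.
        proc.kill()
        proc.wait()
        return "killed"


def run_cleanup(cmd, timeout=60.0):
    """Run a cleanup command, giving up if it hangs longer than `timeout`."""
    try:
        # subprocess.run() kills the child itself when the timeout expires.
        return subprocess.run(cmd, timeout=timeout, check=False).returncode == 0
    except subprocess.TimeoutExpired:
        return False
```

The defaults mirror the values from the commit message: a 30 second grace period for stopping and a 60 second ceiling for cleanup.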
When a container's image is on an inaccessible remote server, the container wrapper script waits in the background for any network changes to retry download of the image.

This change avoids the dangerous previous construct, and is also easier to read: time out after 60 seconds unless ip monitor reads at least one event before that.

Fixes #1124

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
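The "wait for one event or time out" logic can be sketched as follows. This is an illustration in Python, not the wrapper script's actual construct. The helper name `wait_for_event` is hypothetical, and the sketch assumes a POSIX system where `select` works on pipes.

```python
import os
import select
import subprocess


def wait_for_event(cmd, timeout=60.0):
    """Return True if `cmd` produces any output within `timeout` seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    try:
        # Block until the pipe becomes readable or the timeout expires.
        ready, _, _ = select.select([proc.stdout], [], [], timeout)
        # A pipe is also "readable" at EOF, so require actual data.
        return bool(ready) and os.read(proc.stdout.fileno(), 4096) != b""
    finally:
        # Always reap the monitor so no background process is left behind.
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

A call such as `wait_for_event(["ip", "monitor"], timeout=60)` would then return as soon as the first network event arrives, or after 60 seconds at the latest, and in both cases the monitor process is cleaned up.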
- the port-mapping plugin supports iptables or nftables
- the firewall plugin supports only iptables or firewalld

Enforce use of the iptables wrapper for nftables, for now, in both plugins. This all needs to be refactored to run podman with "unmanaged" networks in the future.

Related to issue #1125

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
- Drop redundant comments
- Drop redundant imports
- PEP-8 fixes

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Contributor
Author
Another minor change was added late to this PR, issue #1127, discussed with and approved by @mattiaswal
d68230f to
46cd249
Compare
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Usually the CNI bridge plugin "takes care" of enabling IPv4 forwarding on all interfaces, see issue #1125, but when the container tests are run in a different order from the infix_containers.yaml, Infix may reset the IPv4 forwarding on this critical interface.

This change is both future-proof and also ensures the test works as it was intended even if tests are run out-of-order.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Regression test for issue #1123

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
For a heavily loaded system, 10 seconds/retries is not enough time to expect containers to have started up, particularly after the recent changes to prune images before and after a container is started.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
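A polling helper of the kind this commit tunes might look like the sketch below. It is a generic illustration, not the test framework's own API. The name `retry_until` and its parameters are hypothetical, and the commit message does not state the new retry count, so none is assumed here beyond the original 10.

```python
import time


def retry_until(check, attempts=10, interval=1.0):
    """Call `check` up to `attempts` times, sleeping `interval` seconds between tries.

    Returns True as soon as `check()` is truthy, False if all attempts fail.
    """
    for attempt in range(attempts):
        if check():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

With one-second intervals, the attempt count is effectively the startup deadline in seconds, which is why a loaded system needs a larger value.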
Fixes #1127

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Description
This PR addresses a set of container issues discovered while troubleshooting #1105. It turns out that disabling or removing a container may, under certain circumstances, cause podman to deadlock and leave persistent file locks in /var/lib/containers:

- ip monitor processes #1124

The fix is to upgrade podman to v4.9.5 (the last release before CNI support was removed) and to upgrade Finit to v4.14, which allows all services to complete properly before the next "configuration generation" starts. See the commit messages for more information.
A regression test, container_enabled, has been added to ensure this particular issue never creeps back in. For improved test coverage, another test for verifying environment variables, container_environment, was also added.

Note: Also included in this PR is an updated logo and slightly refreshed README that's worth checking out 😃
Checklist
Tick relevant boxes, this PR is-a or has-a: