Stop sshd before reboot (prevent reconnect/reboot race) #585
Merged
matusmarhefka merged 1 commit into main on Apr 9, 2026
The ATEX disconnect method reliably waits for the test to be at a deterministic point, but that doesn't stop another race from happening *after* issuing 'reboot'. For example:

1) systemd kills all user sessions first (ahead of system daemons), also killing off the python-based test running over ssh
2) ATEX sees ssh disconnect, but that was expected since the control channel was already disconnected safely, so it "waits for reboot" by repeatedly attempting a reconnect
3) the reconnect succeeds, because sshd still wasn't shut down, despite user sessions being killed - maybe the OS is blocked for 2-3 minutes on a less important daemon shutting down before sshd
4) ATEX restarts the test, assuming the OS has rebooted

This can be easily prevented by shutting off sshd and thus preventing new connections while keeping existing sessions alive. That ensures ATEX can never reconnect until something starts sshd again, which should happen only after the reboot.

This race was reliably reproducible on ppc64le, perhaps due to some daemons shutting down very slowly.

Signed-off-by: Jiri Jaburek <comps@nomail.dom>
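The ordering described above can be sketched roughly as follows. This is an illustration only, not the actual diff of this PR: the function name `stop_sshd_then_reboot`, the injectable `run` parameter, and the unit name `sshd.service` are all assumptions made for the example (some distributions name the unit `ssh.service`).

```python
import subprocess
from typing import Callable, Optional, Sequence

# Assumption: the ssh daemon's unit is "sshd.service"; adjust for the target OS.
SSHD_UNIT = "sshd.service"


def stop_sshd_then_reboot(
    run: Optional[Callable[[Sequence[str]], None]] = None,
) -> None:
    """Stop sshd first, then reboot (hypothetical helper, run as root).

    Once sshd is stopped, no new connection can be accepted, so a
    reconnect attempt cannot succeed before the machine has gone down.
    """
    if run is None:
        # Default runner: execute the command and raise on a non-zero exit.
        def run(cmd: Sequence[str]) -> None:
            subprocess.run(list(cmd), check=True)

    # Step 1: refuse new ssh connections.
    run(["systemctl", "stop", SSHD_UNIT])
    # Step 2: only now start the reboot.
    run(["reboot"])
```

The `run` parameter exists only so the command order can be checked without actually rebooting anything; a caller would normally leave it at its default.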