Fix issue 15806 ("aspire stop" is not cleaning up application containers)#16006
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16006Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16006" |
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #15806 by changing aspire stop to request a graceful shutdown (instead of killing an entire process tree) so that DCP can perform proper cleanup, and by tightening process identification using start times to reduce PID reuse risks.
Changes:
- Introduces a shared
ProcessSignalerutility to request graceful shutdown (SIGTERM / console ctrl event) and perform scoped force-kill fallback. - Extends backchannel AppHost info with CLI start time and uses it from
aspire stopfor safer signaling/termination decisions. - Adjusts Windows detached process launch to create a new process group (to support console ctrl-event targeting) and unifies orphan detection process checks via the new helper.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Shared/ProcessSignaler.cs | New shared helper for graceful shutdown + force-kill with start-time verification. |
| src/Aspire.Hosting/Cli/CliOrphanDetector.cs | Switches orphan detection process checks to ProcessSignaler. |
| src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs | Adds CliStartedAt to AppHostInformation. |
| src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs | Populates CliStartedAt from env/config and returns it via backchannel. |
| src/Aspire.Hosting/Aspire.Hosting.csproj | Links the shared ProcessSignaler into Aspire.Hosting. |
| src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs | Adds CREATE_NEW_PROCESS_GROUP for detached process creation on Windows. |
| src/Aspire.Cli/Processes/DetachedProcessLauncher.cs | Minor control-flow formatting change. |
| src/Aspire.Cli/Commands/StopCommand.cs | Uses graceful stop signaling and force-kill fallback limited to AppHost + detached CLI. |
| src/Aspire.Cli/Aspire.Cli.csproj | Links the shared ProcessSignaler into Aspire.Cli. |
4b1f3dd to
331ec00
Compare
968f0b2 to
8ba95fd
Compare
mitchdenny
left a comment
There was a problem hiding this comment.
Review of process signaling and stop command changes.
| var si = new STARTUPINFOEX(); | ||
| si.cb = Marshal.SizeOf<STARTUPINFOEX>(); | ||
| si.dwFlags = StartfUseStdHandles; | ||
| si.dwFlags = StartfUseStdHandles | StartfUseShowWindow; |
There was a problem hiding this comment.
What does this do? Allow GUI apps to show up?
There was a problem hiding this comment.
Original code was setting CreateNoWindow flag; this was one of the changes that I made to make it possible to do a graceful shutdown by sending Ctrl-Break to the process console, but to be honest, I am not sure if it is strictly necessary vs keeping the old flag. We use this approach with window hiding in DCP to good effect, so there's that.
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
|
8ba95fd to
4525794
Compare
|
@karolz-ms can you add an end to end to test for this scenario. They all call stop right now but I guess we don't really assert anything after that. |
(to match the start time granularity used by the backchannel)
493583f to
77a07ea
Compare
|
@davidfowl I changed existing test to ensure containers are being stopped by |
|
The current check might be too simplistic. I need to figure out something better i.e. ensure that the specific test container is cleaned up |
|
🎬 CLI E2E Test Recordings — 58 recordings uploaded (commit View recordings
📹 Recordings uploaded automatically from CI run #24272608692 |
davidfowl
left a comment
There was a problem hiding this comment.
Looks good! One minor nit:
ProcessSignaler.ForceKill has a small TOCTOU window — the process can exit between TryGetRunningProcess (which checks HasExited) and the subsequent process.Kill() call. If that happens, Kill() throws InvalidOperationException, which bubbles up through the foreach loop in StopAppHostAsync and gets caught by the outer catch (Exception) — resulting in a "failed to stop" error message even though the process did stop. The process always ends up dead either way, so this is cosmetic, but wrapping the Kill() in a try-catch would clean it up.
|
/backport to release/13.2 |
|
Started backporting to |
|
@karolz-ms backporting to git am output$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix issue 15806
Using index info to reconstruct a base tree...
M src/Aspire.Cli/Aspire.Cli.csproj
M src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs
M src/Aspire.Cli/Program.cs
M src/Aspire.Hosting/Aspire.Hosting.csproj
M src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs
M src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Aspire.Cli/Aspire.Cli.csproj
Auto-merging src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs
Auto-merging src/Aspire.Cli/Program.cs
CONFLICT (content): Merge conflict in src/Aspire.Cli/Program.cs
Auto-merging src/Aspire.Hosting/Aspire.Hosting.csproj
Auto-merging src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs
Auto-merging src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix issue 15806
Error: The process '/usr/bin/git' failed with exit code 128 |
…ers) (#16006) * Fix issue 15806 * Change default process start time check tolerance to 1 second (to match the start time granularity used by the backchannel) * Address code review feedback from Mitch * Add ExecuteCommandUntilOutputAsync test helper * Ensure containers are cleaned up by "aspire stop" command * Improved check for container cleanup # Conflicts: # src/Aspire.Cli/Program.cs
…ication containers on Windows) (#16123) * Fix issue 15806 ("aspire stop" is not cleaning up application containers) (#16006) * Fix issue 15806 * Change default process start time check tolerance to 1 second (to match the start time granularity used by the backchannel) * Address code review feedback from Mitch * Add ExecuteCommandUntilOutputAsync test helper * Ensure containers are cleaned up by "aspire stop" command * Improved check for container cleanup # Conflicts: # src/Aspire.Cli/Program.cs * Less frantic polling for results during CLI tests Should fix #16076 * Fix formatting
Fixes #15806
The main issue was we were that the
stopcommand implementation was killing entire detached CLI process tree, including all DCP processes, which prevented DCP from doing proper cleanup. On Unix-like systems we are able to avoid that by detaching ourselves from the application host process tree, but on Windows manipulating the process tree like that is not possible. Fixed by implementing graceful shutdown request (SIGTERM on Unix-like systems, Ctrl+Break console event on Windows) and limiting the process killing to application host and detached CLI only.While doing this work I also unified some parts of process-related logic between the CLI and "CLI orphan detector", and enhanced the use of process start time to make sure we are not killing processes that happen to reuse CLI process ID.