Skip to content

Fix issue 15806 ("aspire stop" is not cleaning up application containers)#16006

Merged
davidfowl merged 6 commits into
mainfrom
dev/karolz/issue-15806
Apr 11, 2026
Merged

Fix issue 15806 ("aspire stop" is not cleaning up application containers)#16006
davidfowl merged 6 commits into
mainfrom
dev/karolz/issue-15806

Conversation

@karolz-ms
Copy link
Copy Markdown
Contributor

@karolz-ms karolz-ms commented Apr 9, 2026

Fixes #15806

The main issue was we were that the stop command implementation was killing entire detached CLI process tree, including all DCP processes, which prevented DCP from doing proper cleanup. On Unix-like systems we are able to avoid that by detaching ourselves from the application host process tree, but on Windows manipulating the process tree like that is not possible. Fixed by implementing graceful shutdown request (SIGTERM on Unix-like systems, Ctrl+Break console event on Windows) and limiting the process killing to application host and detached CLI only.

While doing this work I also unified some parts of process-related logic between the CLI and "CLI orphan detector", and enhanced the use of process start time to make sure we are not killing processes that happen to reuse CLI process ID.

Copilot AI review requested due to automatic review settings April 9, 2026 17:42
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16006

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16006"

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #15806 by changing aspire stop to request a graceful shutdown (instead of killing an entire process tree) so that DCP can perform proper cleanup, and by tightening process identification using start times to reduce PID reuse risks.

Changes:

  • Introduces a shared ProcessSignaler utility to request graceful shutdown (SIGTERM / console ctrl event) and perform scoped force-kill fallback.
  • Extends backchannel AppHost info with CLI start time and uses it from aspire stop for safer signaling/termination decisions.
  • Adjusts Windows detached process launch to create a new process group (to support console ctrl-event targeting) and unifies orphan detection process checks via the new helper.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Shared/ProcessSignaler.cs New shared helper for graceful shutdown + force-kill with start-time verification.
src/Aspire.Hosting/Cli/CliOrphanDetector.cs Switches orphan detection process checks to ProcessSignaler.
src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs Adds CliStartedAt to AppHostInformation.
src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs Populates CliStartedAt from env/config and returns it via backchannel.
src/Aspire.Hosting/Aspire.Hosting.csproj Links the shared ProcessSignaler into Aspire.Hosting.
src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs Adds CREATE_NEW_PROCESS_GROUP for detached process creation on Windows.
src/Aspire.Cli/Processes/DetachedProcessLauncher.cs Minor control-flow formatting change.
src/Aspire.Cli/Commands/StopCommand.cs Uses graceful stop signaling and force-kill fallback limited to AppHost + detached CLI.
src/Aspire.Cli/Aspire.Cli.csproj Links the shared ProcessSignaler into Aspire.Cli.

Comment thread src/Shared/ProcessSignaler.cs
Comment thread src/Shared/ProcessSignaler.cs
Comment thread src/Shared/ProcessSignaler.cs Outdated
Comment thread src/Shared/ProcessSignaler.cs Outdated
@karolz-ms karolz-ms force-pushed the dev/karolz/issue-15806 branch from 4b1f3dd to 331ec00 Compare April 9, 2026 17:50
@karolz-ms karolz-ms force-pushed the dev/karolz/issue-15806 branch from 968f0b2 to 8ba95fd Compare April 10, 2026 00:49
Copy link
Copy Markdown
Member

@mitchdenny mitchdenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of process signaling and stop command changes.

Comment thread src/Shared/ProcessSignaler.cs Outdated
Comment thread src/Aspire.Cli/Commands/StopCommand.cs Outdated
Comment thread src/Aspire.Cli/Commands/StopCommand.cs Outdated
var si = new STARTUPINFOEX();
si.cb = Marshal.SizeOf<STARTUPINFOEX>();
si.dwFlags = StartfUseStdHandles;
si.dwFlags = StartfUseStdHandles | StartfUseShowWindow;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Allow GUI apps to show up?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original code was setting CreateNoWindow flag; this was one of the changes that I made to make it possible to do a graceful shutdown by sending Ctrl-Break to the process console, but to be honest, I am not sure if it is strictly necessary vs keeping the old flag. We use this approach with window hiding in DCP to good effect, so there's that.

@github-actions
Copy link
Copy Markdown
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@karolz-ms karolz-ms force-pushed the dev/karolz/issue-15806 branch from 8ba95fd to 4525794 Compare April 10, 2026 17:03
@davidfowl
Copy link
Copy Markdown
Contributor

@karolz-ms can you add an end to end to test for this scenario. They all call stop right now but I guess we don't really assert anything after that.

@karolz-ms karolz-ms force-pushed the dev/karolz/issue-15806 branch from 493583f to 77a07ea Compare April 10, 2026 23:59
@karolz-ms
Copy link
Copy Markdown
Contributor Author

karolz-ms commented Apr 11, 2026

@davidfowl I changed existing test to ensure containers are being stopped by aspire stop command. With this change the same test would have failed without the fix in this PR. Let me know if you want me to do more than this.

@karolz-ms
Copy link
Copy Markdown
Contributor Author

The current check might be too simplistic. I need to figure out something better i.e. ensure that the specific test container is cleaned up

@github-actions
Copy link
Copy Markdown
Contributor

🎬 CLI E2E Test Recordings — 58 recordings uploaded (commit 898d03d)

View recordings
Test Recording
AddPackageInteractiveWhileAppHostRunningDetached ▶️ View Recording
AddPackageWhileAppHostRunningDetached ▶️ View Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_DefaultSelection_InstallsSkillOnly ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
AllPublishMethodsBuildDockerImages ▶️ View Recording
AspireAddPackageVersionToDirectoryPackagesProps ▶️ View Recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
Banner_NotDisplayedWithNoLogoFlag ▶️ View Recording
CertificatesClean_RemovesCertificates ▶️ View Recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate ▶️ View Recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate ▶️ View Recording
ConfigSetGet_CreatesNestedJsonFormat ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunAspireStarterProjectWithBundle ▶️ View Recording
CreateAndRunEmptyAppHostProject ▶️ View Recording
CreateAndRunJavaEmptyAppHostProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateAndRunTypeScriptEmptyAppHostProject ▶️ View Recording
CreateAndRunTypeScriptStarterProject ▶️ View Recording
CreateJavaAppHostWithViteApp ▶️ View Recording
CreateStartAndStopAspireProject ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DashboardRunWithOtelTracesReturnsNoTraces ▶️ View Recording
DescribeCommandResolvesReplicaNames ▶️ View Recording
DescribeCommandShowsRunningResources ▶️ View Recording
DetachFormatJsonProducesValidJson ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
GlobalMigration_HandlesCommentsAndTrailingCommas ▶️ View Recording
GlobalMigration_HandlesMalformedLegacyJson ▶️ View Recording
GlobalMigration_PreservesAllValueTypes ▶️ View Recording
GlobalMigration_SkipsWhenNewConfigExists ▶️ View Recording
GlobalSettings_MigratedFromLegacyFormat ▶️ View Recording
InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot ▶️ View Recording
InvalidAppHostPathWithComments_IsHealedOnRun ▶️ View Recording
LegacySettingsMigration_AdjustsRelativeAppHostPath ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
PsFormatJsonOutputsOnlyJsonToStdout ▶️ View Recording
PublishWithDockerComposeServiceCallbackSucceeds ▶️ View Recording
RestoreGeneratesSdkFiles ▶️ View Recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes ▶️ View Recording
RunFromParentDirectory_UsesExistingConfigNearAppHost ▶️ View Recording
SecretCrudOnDotNetAppHost ▶️ View Recording
SecretCrudOnTypeScriptAppHost ▶️ View Recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels ▶️ View Recording
StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets ▶️ View Recording
StopAllAppHostsFromAppHostDirectory ▶️ View Recording
StopAllAppHostsFromUnrelatedDirectory ▶️ View Recording
StopNonInteractiveMultipleAppHostsShowsError ▶️ View Recording
StopNonInteractiveSingleAppHost ▶️ View Recording
StopWithNoRunningAppHostExitsSuccessfully ▶️ View Recording
UnAwaitedChainsCompileWithAutoResolvePromises ▶️ View Recording

📹 Recordings uploaded automatically from CI run #24272608692

Copy link
Copy Markdown
Contributor

@davidfowl davidfowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! One minor nit:

ProcessSignaler.ForceKill has a small TOCTOU window — the process can exit between TryGetRunningProcess (which checks HasExited) and the subsequent process.Kill() call. If that happens, Kill() throws InvalidOperationException, which bubbles up through the foreach loop in StopAppHostAsync and gets caught by the outer catch (Exception) — resulting in a "failed to stop" error message even though the process did stop. The process always ends up dead either way, so this is cosmetic, but wrapping the Kill() in a try-catch would clean it up.

@davidfowl davidfowl merged commit 9f9ccbb into main Apr 11, 2026
269 checks passed
@karolz-ms
Copy link
Copy Markdown
Contributor Author

/backport to release/13.2

@github-actions
Copy link
Copy Markdown
Contributor

Started backporting to release/13.2 (link to workflow run)

@aspire-repo-bot
Copy link
Copy Markdown
Contributor

@karolz-ms backporting to release/13.2 failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix issue 15806
Using index info to reconstruct a base tree...
M	src/Aspire.Cli/Aspire.Cli.csproj
M	src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs
M	src/Aspire.Cli/Program.cs
M	src/Aspire.Hosting/Aspire.Hosting.csproj
M	src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs
M	src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Aspire.Cli/Aspire.Cli.csproj
Auto-merging src/Aspire.Cli/Processes/DetachedProcessLauncher.Windows.cs
Auto-merging src/Aspire.Cli/Program.cs
CONFLICT (content): Merge conflict in src/Aspire.Cli/Program.cs
Auto-merging src/Aspire.Hosting/Aspire.Hosting.csproj
Auto-merging src/Aspire.Hosting/Backchannel/AuxiliaryBackchannelRpcTarget.cs
Auto-merging src/Aspire.Hosting/Backchannel/BackchannelDataTypes.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix issue 15806
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

karolz-ms added a commit that referenced this pull request Apr 13, 2026
…ers) (#16006)

* Fix issue 15806

* Change default process start time check tolerance to 1 second

(to match the start time granularity used by the backchannel)

* Address code review feedback from Mitch

* Add ExecuteCommandUntilOutputAsync test helper

* Ensure containers are cleaned up by "aspire stop" command

* Improved check for container cleanup
# Conflicts:
#	src/Aspire.Cli/Program.cs
@joperezr joperezr added this to the 13.3 milestone Apr 14, 2026
joperezr pushed a commit that referenced this pull request Apr 14, 2026
…ication containers on Windows) (#16123)

* Fix issue 15806 ("aspire stop" is not cleaning up application containers) (#16006)

* Fix issue 15806

* Change default process start time check tolerance to 1 second

(to match the start time granularity used by the backchannel)

* Address code review feedback from Mitch

* Add ExecuteCommandUntilOutputAsync test helper

* Ensure containers are cleaned up by "aspire stop" command

* Improved check for container cleanup
# Conflicts:
#	src/Aspire.Cli/Program.cs

* Less frantic polling for results during CLI tests

Should fix #16076

* Fix formatting
@karolz-ms karolz-ms deleted the dev/karolz/issue-15806 branch April 21, 2026 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"aspire stop" is not gracefully shutting down the docker containers it starts.

5 participants