Description
Screen.Recording.2026-03-05.at.1.11.36.AM.mov
Investigation details
Root cause
An orphaned AppHost process from a previous session was still running and holding onto the DCP proxy ports (7390/5557) and the dashboard (port 17092). When a new AppHost was started via `aspire start`, the new DCP could not bind to those proxy ports since the old DCP still owned them.
This caused the dashboard and CLI to disagree on state because:
- Dashboard: connected to the old AppHost's DCP instance (from 12:24AM), which showed the apiservice as `Finished` with `exitCode: 0` and a `start` command available.
- CLI (`aspire describe`): connected to the new AppHost (started at 1:07AM), which showed the apiservice as `Running / Unhealthy`.
Why Unhealthy?
The health check (`apiservice_https_/health_200_check`) was configured to hit `https://localhost:7390/health`, but port 7390 was owned by the old DCP proxy, which was no longer forwarding traffic. The health check timed out with a `TaskCanceledException`.
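The `TaskCanceledException` is the standard .NET symptom of an `HttpClient` timeout. The same failure mode can be sketched outside Aspire with any listener that accepts connections but never answers (a free port stands in for 7390; the timeout value and request are illustrative, not the real health-check configuration):

```python
import socket
import threading

# A listener that accepts connections but never answers: the behavior of a
# stale proxy port whose DCP is no longer forwarding traffic.
stale = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
stale.bind(("127.0.0.1", 0))   # any free port stands in for 7390
stale.listen(1)
port = stale.getsockname()[1]
threading.Thread(target=lambda: stale.accept(), daemon=True).start()

def health_check(port: int, timeout: float = 0.5) -> str:
    """Probe /health and classify the result, mirroring a timeout-based check."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as c:
            c.settimeout(timeout)
            c.sendall(b"GET /health HTTP/1.1\r\nHost: localhost\r\n\r\n")
            return "Healthy" if c.recv(1024) else "Unhealthy"
    except OSError:                # includes socket.timeout
        return "Unhealthy"         # the .NET analogue: TaskCanceledException

print(health_check(port))  # Unhealthy: connect succeeds, but no bytes ever arrive
```

Note that the TCP connect itself succeeds, so a connectivity check would pass; only the request-level timeout exposes the dead proxy.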
How it happened
- An AppHost was started initially (PID 27434, 12:24AM). It owned the DCP, the dashboard (port 17092), and the proxy ports (7390/5557).
- `aspire stop` was called, followed by `aspire start` multiple times during development.
- The old AppHost process (PID 27434) and its DCP child processes (PIDs 27515, 27477, 27527) were not fully terminated by `aspire stop`. They continued running and holding the proxy ports.
- The new AppHost started successfully, but its apiservice was assigned different internal ports (e.g., 53557/53558) while the proxy ports (7390/5557) remained bound to the old DCP.
- `aspire ps` only showed the new AppHost; the old one was invisible to the CLI but still alive.
Evidence
- `ps aux` showed two sets of DCP processes: old (12:24AM) and new (1:07AM)
- The old DCP run-controllers (PID 27515) and apiserver (PID 27477) were still running
- Port 7390 returned HTTP 404 (old code without the new endpoints) or hung entirely
- The direct internal port (53557) served requests correctly with the new code
- After manually terminating the orphaned processes and doing a clean restart, everything worked — dashboard and CLI agreed, proxy ports worked, health checks passed.
Why the old AppHost was invisible to aspire ps
AuxiliaryBackchannelMonitor discovers running AppHosts by scanning `aux*.sock.*` files in `~/.aspire/cli/backchannels/`. At the time of the issue, the directory contained:

```
auxi.sock.897ab4262e3c5cd8.4058              ← only auxi socket, PID 4058 (new AppHost)
cli.sock.08656a70215440ecbee54720aa637697    ← old format, from 00:50
cli.sock.2ae0eede5c2a465792345d11168effe9    ← old format, from 01:15
... (7 more cli.sock files)
```

There was no `auxi.sock` file for the old AppHost (PID 27434, from 12:24AM), so `aspire ps` couldn't see it.
At the very start of the session, aspire ps returned [] — the old AppHost was already invisible before any agent interaction.
Process tree analysis
The process chain is:
CLI → AppHost (CliOrphanDetector watches CLI PID) → DCP (--monitor watches AppHost PID)
If the AppHost dies, DCP should die. If DCP dies, ports should be released. So if two sets of DCP processes were running, two AppHost processes were likely alive.
Possible theory: race condition in stop-then-start
The session log shows many aspire stop && aspire start cycles. One possibility:
- `aspire stop` sends a stop signal to the AppHost via the backchannel
- `aspire stop` returns "AppHost stopped successfully"
- `aspire start` launches a new AppHost immediately
- But the old AppHost/DCP process tree hasn't fully exited yet; DCP may still be shutting down and holding ports
- The new DCP starts, can't bind the proxy ports (still held by old DCP mid-shutdown), and gets assigned different internal ports
- The old DCP eventually dies, but now the proxy port mapping is broken
This could be a race condition between stop completing and the process tree fully releasing resources. `aspire stop` may report success when the stop signal is acknowledged, but before the actual process teardown (AppHost → DCP → port release) finishes.
However, there could be other explanations — this needs further investigation to confirm.
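If the theory holds, the fix direction would be for stop to wait on actual process exit rather than on acknowledgement. A toy reproduction of the race with plain sockets (the 0.5s teardown delay is an assumption; this is not Aspire's real shutdown path):

```python
import socket
import threading
import time

def old_dcp(sock: socket.socket, teardown_delay: float) -> None:
    """A DCP that has acknowledged 'stop' but releases its port only later."""
    time.sleep(teardown_delay)
    sock.close()

def start_new_dcp(port: int) -> bool:
    """The new DCP trying to bind the same proxy port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False                 # port still held by the old DCP
    finally:
        s.close()

# Old DCP owns a proxy port (a free port stands in for 7390).
old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
old.bind(("127.0.0.1", 0))
old.listen(1)
port = old.getsockname()[1]

# 'stop' is acknowledged immediately; real teardown takes another 0.5s.
teardown = threading.Thread(target=old_dcp, args=(old, 0.5))
teardown.start()

racy = start_new_dcp(port)   # restart immediately: bind fails, port still held
teardown.join()              # wait for the process tree to actually exit
clean = start_new_dcp(port)  # now the bind succeeds

print(racy, clean)  # False True
```

The `teardown.join()` line is the missing step in the theorized sequence: waiting on real process exit rather than on the stop acknowledgement eliminates the window.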