estuary-cdk: enforce deadline for graceful shutdown #3921

Merged
Alex-Bair merged 1 commit into main from bair/estuary-cdk-graceful-shutdown-deadline
Feb 24, 2026

@Alex-Bair

Description:

When the stopping.event is set for any reason (e.g., a stream encountered an exception, or 24 hours have passed since the capture restarted), other streams can block the graceful shutdown indefinitely. Captures have stalled in these situations and required manual intervention to get unstuck.

This commit adds enforce_shutdown, which triggers after stopping.event is set. It waits 30 minutes for the graceful shutdown to complete. If those 30 minutes elapse and the connector hasn't exited, the TaskGroup's running tasks are signaled to cancel. If the connector is still running 5 minutes after the cancellation attempt, the CDK forces an exit with os._exit(1) as a last resort.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a development stack. Confirmed that:

  • After 30 minutes of waiting for a graceful shutdown, capture tasks are cancelled.
  • 5 minutes after cancelling capture tasks, the connector is forcibly exited.

Comment on lines +200 to +216
try:
    async with asyncio.TaskGroup() as tg:
        # Start enforce_shutdown after tg is available
        enforce_shutdown_task = asyncio.create_task(enforce_shutdown(tg))

        task = Task(
            log.getChild("capture"),
            ConnectorStatus(log, stopping),
            "capture",
            self.output,
            stopping,
            tg,
        )
        log.event.status("Capture started")
        await capture(task)
except* TerminateTaskGroup:
    pass  # Expected when enforce_shutdown terminates the task group

@Alex-Bair Alex-Bair Feb 21, 2026

Note: The standard library doesn't natively support terminating a task group, so I raise a TerminateTaskGroup exception inside the group to make it terminate itself. This use of a TerminateTaskGroup exception was copied from the Python docs.

@Alex-Bair Alex-Bair marked this pull request as ready for review February 21, 2026 00:18
@Alex-Bair Alex-Bair requested a review from a team February 21, 2026 00:19

@nicolaslazo nicolaslazo left a comment

I don't recall seeing any connectors that try to handle cancellation signals, but I still think the os._exit(1) call is a reasonable fallback measure. Looks good, thanks Alex.

@Alex-Bair Alex-Bair merged commit 12a7bca into main Feb 24, 2026
113 of 125 checks passed
@Alex-Bair Alex-Bair deleted the bair/estuary-cdk-graceful-shutdown-deadline branch February 24, 2026 13:34