
worker: Fix orphaned containers on node shutdown#3779

Merged
victorges merged 7 commits into master from vg/fix/node-shutdown
Oct 15, 2025

@victorges victorges commented Oct 14, 2025

What does this pull request do? Explain your changes. (required)
This fixes the shutdown process so that it actually waits for the StartLivepeer function to exit,
and raises the timeout for that wait to up to 10s. We frequently leave orphaned
containers behind because 2s is not enough for the ai-worker logic to both stop and remove them.

This ensures the dockerRemoveContainer functions have enough time to run before
the process exits.
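
For illustration only, the wait-with-timeout pattern described above could look roughly like the sketch below. It is not the code from this PR: `runNode` is a hypothetical stand-in for StartLivepeer, and `waitForExit` and `shutdownTimeout` are names invented for the example. The cleanup callback only simulates stopping and removing containers.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// shutdownTimeout mirrors the "up to 10s" budget mentioned in the description.
const shutdownTimeout = 10 * time.Second

// runNode is a hypothetical stand-in for StartLivepeer: it blocks until ctx is
// cancelled, runs its cleanup, and only then returns.
func runNode(ctx context.Context, cleanup func()) {
	<-ctx.Done()
	cleanup()
}

// waitForExit reports whether done was closed before timeout elapsed.
func waitForExit(done <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})

	go func() {
		// Closing done signals that the node logic has fully returned.
		defer close(done)
		runNode(ctx, func() {
			// Placeholder for stopping and removing containers.
			time.Sleep(50 * time.Millisecond)
			fmt.Println("containers removed")
		})
	}()

	// In a real binary this would be triggered by SIGINT or SIGTERM.
	cancel()

	if waitForExit(done, shutdownTimeout) {
		fmt.Println("clean shutdown")
	} else {
		fmt.Println("shutdown timed out; containers may be orphaned")
	}
}
```

The point of the sketch is that the process blocks on the `done` channel rather than exiting right after cancelling the context, so the cleanup has the whole timeout budget to finish.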

This also removes RestartPolicy: always from the warm containers. That policy can cause orphaned
containers after an OS reboot, since Docker automatically restarts such containers on reboot.
We already have watchdog logic that checks container health and restarts unhealthy containers,
so the restart policy is currently redundant. All containers now simply use on-failure with up to 3 restarts.
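
As a rough sketch of the policy change (not the actual diff), the before and after could be modelled as below. `restartPolicy` is a local stand-in that mirrors the `Name` and `MaximumRetryCount` fields of Docker's restart policy, so the example runs without the Docker SDK. `legacyPolicy` and `currentPolicy` are hypothetical names, and the assumption that non-warm containers already used on-failure with 3 retries is inferred from the description.

```go
package main

import "fmt"

// restartPolicy is a local stand-in mirroring the Name and MaximumRetryCount
// fields of Docker's container restart policy. It is not the SDK type.
type restartPolicy struct {
	Name              string
	MaximumRetryCount int
}

// legacyPolicy models the assumed old behaviour: warm containers used
// "always", which also brings them back after a reboot.
func legacyPolicy(keepWarm bool) restartPolicy {
	if keepWarm {
		return restartPolicy{Name: "always"}
	}
	return restartPolicy{Name: "on-failure", MaximumRetryCount: 3}
}

// currentPolicy models the simplified behaviour: every container gets
// "on-failure" with at most 3 retries, and the watchdog handles the rest.
func currentPolicy() restartPolicy {
	return restartPolicy{Name: "on-failure", MaximumRetryCount: 3}
}

func main() {
	fmt.Printf("before (warm): %+v\n", legacyPolicy(true))
	fmt.Printf("after  (all):  %+v\n", currentPolicy())
}
```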

Specific updates (required)

  • Actually wait for node logic to exit before quitting
  • Increase shutdown timeout
  • Fix restart policy of containers

How did you test each of these updates (required)
Kill the O and verify that no containers are left behind

Does this pull request close any open issues?
Fixes #3776

Checklist:

@github-actions github-actions bot added the go Pull requests that update Go code label Oct 14, 2025
This will prevent them from being restarted on system boot
@victorges victorges changed the title livepeer: Fix node shutdown logic livepeer: Fix orphaned containers on node shutdown Oct 14, 2025
@victorges victorges changed the title livepeer: Fix orphaned containers on node shutdown worker: Fix orphaned containers on node shutdown Oct 14, 2025
They should only be restarted on failure as well, otherwise
our watchdog will take over.
codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.69676%. Comparing base (1bc682a) to head (100a641).
⚠️ Report is 1 commits behind head on master.

Files with missing lines     Patch %     Lines
cmd/livepeer/livepeer.go     0.00000%    10 Missing ⚠️
Additional details and impacted files

@@                 Coverage Diff                 @@
##              master       #3779         +/-   ##
===================================================
+ Coverage   31.68993%   31.69676%   +0.00683%     
===================================================
  Files            158         158                 
  Lines          47564       47579         +15     
===================================================
+ Hits           15073       15081          +8     
- Misses         31603       31611          +8     
+ Partials         888         887          -1     
Files with missing lines     Coverage Δ
ai/worker/docker.go          69.24399% <100.00000%> (+0.09321%) ⬆️
cmd/livepeer/livepeer.go     0.00000% <0.00000%> (ø)

... and 1 file with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@victorges victorges force-pushed the vg/fix/node-shutdown branch from a282889 to 100a641 Compare October 14, 2025 18:38
@victorges victorges merged commit 37bb55c into master Oct 15, 2025
10 of 13 checks passed
@victorges victorges deleted the vg/fix/node-shutdown branch October 15, 2025 12:15
Development

Successfully merging this pull request may close these issues.

Add automatic cleanup of warm AI containers on process shutdown

2 participants