Care about children of supervised processes #33

Open
tecki wants to merge 6 commits into bruceg:master from tecki:orphanage

tecki commented May 8, 2017

Daemons that fork their own children have always been a big problem for supervision: when the daemon dies, its orphaned children are out of sight of the supervisor. This was especially a problem for classical Unix daemons that fork immediately. That is what fghack is for, but as the name suggests, it is only a hack, and it does not address daemons that quite legitimately spawn children.

This used to be a limitation of the operating system itself: Unix systems offered no way to prevent it, and POSIX is apparently silent on the problem.

Recently, both Linux and FreeBSD (and DragonFly BSD) have introduced a solution to this problem: subprocess reapers. If a process dies, its children are adopted by the nearest ancestor that has declared itself a subprocess reaper, instead of by init. supervise is the ideal candidate to be such a subprocess reaper.
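To make the mechanism concrete, here is a minimal sketch of how a process can ask to become a subprocess reaper. It is not the code from this pull request: `become_subreaper` is a hypothetical helper name, and the BSD branch simply follows the procctl(2) manual page. On Linux the call is `prctl(PR_SET_CHILD_SUBREAPER, ...)`, on FreeBSD and DragonFly it is `procctl(..., PROC_REAP_ACQUIRE, ...)`.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/procctl.h>
#endif

/*
 * Hypothetical helper (not taken from the pull request).
 * Asks the kernel to reparent orphaned descendants to the calling process.
 * Returns 0 on success, -1 with errno set on failure or on platforms
 * that have no subprocess reaper facility.
 */
int become_subreaper(void)
{
#if defined(__linux__) && defined(PR_SET_CHILD_SUBREAPER)
    /* Linux: a non-zero second argument turns the attribute on. */
    return prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL);
#elif defined(PROC_REAP_ACQUIRE)
    /* FreeBSD / DragonFly: acquire reaper status for this process. */
    return procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL);
#else
    /* Unsupported platform: behave as if the feature did not exist. */
    errno = ENOSYS;
    return -1;
#endif
}
```

A supervisor would presumably call this once at startup and, if it fails, carry on without orphan tracking.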

This pull request adds a new flag file, orphanage, to service directories. If it is present and the supervised daemon dies, supervise enters an orphanage state: it waits until all of the daemon's children have died as well (I am not aware of a simple way to kill all orphans). Only once that is done is the service restarted (if so demanded).
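The waiting step can be sketched as a loop over `waitpid(-1, ...)` that ends when the kernel reports `ECHILD`, meaning no children (own or adopted) are left. This is only an illustration under simplifying assumptions: `reap_all_children` is a hypothetical name, and the loop blocks, whereas supervise itself is event-driven and would more likely poll with `WNOHANG` from its SIGCHLD handling.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not taken from the pull request).
 * Blocks until the calling process has no children left, reaping each
 * one as it exits. With subprocess reaper status, adopted orphans count
 * as children too. Returns the number of children reaped, or -1 on an
 * unexpected waitpid() error.
 */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);

        if (pid > 0) {
            reaped++;              /* one more child (or orphan) is gone */
            continue;
        }
        if (errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (errno == ECHILD)
            return reaped;         /* nothing left to wait for */
        return -1;                 /* unexpected error */
    }
}
```

Only after this returns would the service be eligible for a restart.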

This works well in combination with supervise creating a new process group: that way it is possible to send a signal to all children to terminate them.

I consider the new subprocess reaper functionality in Linux and FreeBSD such a huge step forward in the field of process supervision that it should be added to daemontools-encore, even if it cannot be supported on all platforms. On unsupported platforms, everything stays as-is, as the orphanage option is just an add-on.

tecki added 6 commits May 8, 2017 10:19
when the service directory contains a file "orphanage", we adopt all
children of the supervised process when it dies, and wait until they
all have died as well.
every hour the status was written again, resetting the down timer.