Skip to content

Add shutdown reason to exit code propagation#7680

Closed
Arkatufus wants to merge 7 commits intoakkadotnet:devfrom
Arkatufus:#7517-Add-shutdown-exit-code
Closed

Add shutdown reason to exit code propagation#7680
Arkatufus wants to merge 7 commits intoakkadotnet:devfrom
Arkatufus:#7517-Add-shutdown-exit-code

Conversation

@Arkatufus
Copy link
Contributor

@Arkatufus Arkatufus commented Jun 6, 2025

Fixes #7517

Changes

  • Add exit code output to ActorSystem WhenTerminated property and .Terminate() method task.
  • If the CoordinatedShutdown stack was run, its reason will be propagated to the WhenTerminated task.
  • If "exit-clr" is set, the exit code will be passed to the Environment.Exit() call.
  • Exit codes conforms to the Linux exit code convention

See: https://manpages.ubuntu.com/manpages/lunar/man3/sysexits.h.3head.html
See: https://manpages.ubuntu.com/manpages/noble/man3/EXIT_SUCCESS.3const.html

Possible Exit Codes

Exit code Description
0 ActorSystem shutdown cleanly due to .Terminate(), .Abort(), or normal cluster events
1 Unknown shutdown cause
3 [CLUSTER] ActorSystem terminated because it is downed
4 [CLUSTER] ActorSystem terminated because it failed to join the cluster

Copy link
Contributor Author

@Arkatufus Arkatufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review

if (exitClr && !coord._runningClrHook)
{
Environment.Exit(0);
Environment.Exit(coord.ShutdownReason?.ExitCode ?? 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We propagate the exit code to the process exit code, if it is requested

else
{
return TaskEx.Completed;
Environment.Exit(Get(system).ShutdownReason?.ExitCode ?? 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We propagate the exit code to the process exit code, if it is requested

Comment on lines +668 to +673
_atomicRef = new AtomicReference<Task?>(new Task(() =>
{
// CoordinatedShutdown must be fetched in a lazy way because it might not be initialized yet.
var exitCode = CoordinatedShutdown.Get(system).ShutdownReason?.ExitCode ?? FinalExitCode;
_terminationPromise.TrySetResult(exitCode);
}));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TerminationCallbacks now uses a TaskCompletionSource for the final task. This is cleaner than relying on task continuation chain result.

};
}

public abstract int ExitCode { get; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Reason implementation will need to declare their exit codes.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback

Comment on lines +166 to +175
internal enum CommonExitCodes
{
Ok = 0,
UnknownReason = 1,
// Exit code 2 is reserved for Linux Bash for "Incorrect Usage"
ClusterDowned = 3,
ClusterJoinFailed = 4,
// Exit codes 64-78 is reserved by Linux sysexits.h
// Exit codes 126 and above is reserved by Linux shell
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated error code list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me

Comment on lines +189 to +194
ReasonMap = new Dictionary<int, string>
{
[(int)CommonExitCodes.Ok] = "ActorSystem shutdown successfully",
[(int)CommonExitCodes.UnknownReason] = "Unknown shutdown reason",
[(int)CommonExitCodes.ClusterDowned] = "Cluster node downed",
[(int)CommonExitCodes.ClusterJoinFailed] = "Cluster join unsuccessful",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated human readable code mapping

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but why are the multi-node tests exiting in under 8 minutes? Did we introduce an error here that causes those not to run reliably? Or could that have happened as a result of earlier binary updates we made?

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be redesigned in a fresh PR - concerns got mixed up here, namely trying to jam ActorSystem.Terminate() into the exit code infrastructure, and even after trying to clean that up the concerns are still really messy here.

IMHO - we should:

  1. Just add an ExitCode property to the Reason type
  2. Set the application's ExitCode to that value in the place where we do it currently.
  3. Have this value default to 0 when unset, preserving current behavior.
  4. Not touch the ActorSystem class at all.
  5. I wouldn't even worry about writing tests for this - not really necessary or practical IMHO.

public abstract Akka.Actor.IScheduler Scheduler { get; }
public abstract Akka.Serialization.Serialization Serialization { get; }
public abstract Akka.Actor.Settings Settings { get; }
public abstract Akka.Actor.CoordinatedShutdown.Reason ShutdownReason { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we need this property here any longer, no?

public async Task ActorSystem_shutdown_reason_must_be_Ok()
{
await Sys.Terminate();
Sys.ShutdownReason.ExitCode.Should().Be(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we want to keep the ActorSystem decoupled from this - totally a CoordinatedShutdown concern.


}

public static void RegisterReason(int code, string reason)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary - passing the reason into the CoordinatedShutdown.Run method should be sufficient.

@Arkatufus Arkatufus deleted the #7517-Add-shutdown-exit-code branch June 20, 2025 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CoordinatedShutdown: return non-zero (error) exit code when ActorSystem is removed due to DOWNing

2 participants