Add shutdown reason to exit code propagation#7680
Add shutdown reason to exit code propagation#7680Arkatufus wants to merge 7 commits intoakkadotnet:devfrom
Conversation
| if (exitClr && !coord._runningClrHook) | ||
| { | ||
| Environment.Exit(0); | ||
| Environment.Exit(coord.ShutdownReason?.ExitCode ?? 0); |
There was a problem hiding this comment.
We propagate the exit code to the process exit code, if it is requested
| else | ||
| { | ||
| return TaskEx.Completed; | ||
| Environment.Exit(Get(system).ShutdownReason?.ExitCode ?? 0); |
There was a problem hiding this comment.
We propagate the exit code to the process exit code, if it is requested
| _atomicRef = new AtomicReference<Task?>(new Task(() => | ||
| { | ||
| // CoordinatedShutdown must be fetched in a lazy way because it might not be initialized yet. | ||
| var exitCode = CoordinatedShutdown.Get(system).ShutdownReason?.ExitCode ?? FinalExitCode; | ||
| _terminationPromise.TrySetResult(exitCode); | ||
| })); |
There was a problem hiding this comment.
TerminationCallbacks now uses a TaskCompletionSource for the final task. This is cleaner than relying on task continuation chain result.
| }; | ||
| } | ||
|
|
||
| public abstract int ExitCode { get; } |
There was a problem hiding this comment.
Each Reason implementation will need to declare their exit codes.
| internal enum CommonExitCodes | ||
| { | ||
| Ok = 0, | ||
| UnknownReason = 1, | ||
| // Exit code 2 is reserved for Linux Bash for "Incorrect Usage" | ||
| ClusterDowned = 3, | ||
| ClusterJoinFailed = 4, | ||
| // Exit codes 64-78 is reserved by Linux sysexits.h | ||
| // Exit codes 126 and above is reserved by Linux shell | ||
| } |
There was a problem hiding this comment.
Updated error code list
| ReasonMap = new Dictionary<int, string> | ||
| { | ||
| [(int)CommonExitCodes.Ok] = "ActorSystem shutdown successfully", | ||
| [(int)CommonExitCodes.UnknownReason] = "Unknown shutdown reason", | ||
| [(int)CommonExitCodes.ClusterDowned] = "Cluster node downed", | ||
| [(int)CommonExitCodes.ClusterJoinFailed] = "Cluster join unsuccessful", |
There was a problem hiding this comment.
Updated human readable code mapping
Aaronontheweb
left a comment
There was a problem hiding this comment.
LGTM, but why are the multi-node tests exiting in under 8 minutes? Did we introduce an error here that causes those not to run reliably? Or could that have happened as a result of earlier binary updates we made?
Aaronontheweb
left a comment
There was a problem hiding this comment.
I think this needs to be redesigned in a fresh PR - concerns got mixed up here, namely trying to jam ActorSystem.Terminate() into the exit code infrastructure, and even after trying to clean that up the concerns are still really messy here.
IMHO - we should:
- Just add an
ExitCodeproperty to theReasontype - Set the application's ExitCode to that value in the place where we do it currently.
- Have this value default to
0when unset, preserving current behavior. - Not touch the
ActorSystemclass at all. - I wouldn't even worry about writing tests for this - not really necessary or practical IMHO.
| public abstract Akka.Actor.IScheduler Scheduler { get; } | ||
| public abstract Akka.Serialization.Serialization Serialization { get; } | ||
| public abstract Akka.Actor.Settings Settings { get; } | ||
| public abstract Akka.Actor.CoordinatedShutdown.Reason ShutdownReason { get; } |
There was a problem hiding this comment.
I don't believe we need this property here any longer, no?
| public async Task ActorSystem_shutdown_reason_must_be_Ok() | ||
| { | ||
| await Sys.Terminate(); | ||
| Sys.ShutdownReason.ExitCode.Should().Be(0); |
There was a problem hiding this comment.
Yeah I think we want to keep the ActorSystem decoupled from this - totally a CoordinatedShutdown concern.
|
|
||
| } | ||
|
|
||
| public static void RegisterReason(int code, string reason) |
There was a problem hiding this comment.
This should not be necessary - passing the reason into the CoordinatedShutdown.Run method should be sufficient.
Fixes #7517
Changes
WhenTerminatedproperty and.Terminate()method task.CoordinatedShutdownstack was run, its reason will be propagated to theWhenTerminatedtask.Environment.Exit()call.See: https://manpages.ubuntu.com/manpages/lunar/man3/sysexits.h.3head.html
See: https://manpages.ubuntu.com/manpages/noble/man3/EXIT_SUCCESS.3const.html
Possible Exit Codes
ActorSystemshutdown cleanly due to.Terminate(),.Abort(), or normal cluster eventsActorSystemterminated because it is downedActorSystemterminated because it failed to join the cluster