Stop failing remembered entity if supervisor strategy failed#7720
Stop failing remembered entity if supervisor strategy failed#7720Aaronontheweb merged 6 commits intoakkadotnet:devfrom
Conversation
|
|
||
| namespace Akka.Cluster.Sharding; | ||
|
|
||
| public class ShardSupervisionStrategy: OneForOneStrategy |
There was a problem hiding this comment.
This is the custom supervisor strategy, only for the Shard actor
| if(restart) | ||
| context.Self.Tell(new ExcessiveSupervisorRestartPassivation(child, WithinTimeRangeMilliseconds, MaxNumberOfRetries, cause)); |
There was a problem hiding this comment.
The ProcessFailure code is pretty much the same as the OneToOneStrategy with these additional lines, we send the Shard actor a warning message that this failing child is due for termination because it is thrashing.
There was a problem hiding this comment.
This is not quite right - we also need to handle scenarios where the SupervisorStrategy decided to issue a Stop directive. Reason being: if the actor failed in such a way that it has to be stopped, it's by definition an irrecoverable exception. Continuing to remember the entity after we get this type of signal back is net-destructive.
There was a problem hiding this comment.
Also: we need to make sure this only applies to entity actors, not to any other children of the Shard, such as the RE infrastructure itself. You can determine this by checking the actor paths.
There was a problem hiding this comment.
Also: we need to make sure this only applies to entity actors, not to any other children of the Shard, such as the RE infrastructure itself. You can determine this by checking the actor paths.
nevermind, this gets handled inside the Shard message handlers by the looks of things
|
|
||
| internal sealed record ExcessiveSupervisorRestartPassivation(IActorRef Child, int TimeWindowInMilliseconds, int MaxRestartCount, Exception LastCause) : IShardRegionCommand; |
There was a problem hiding this comment.
New internal local only message from supervisor strategy to Shard actor
|
A note here, while this design works in a very quiet system, it might still somehow fail on a very busy system. This scheme would not be as responsive as what the unit test shows if the |
Aaronontheweb
left a comment
There was a problem hiding this comment.
Needs some changes in the way the supervision strategy is implemented
| if(restart) | ||
| context.Self.Tell(new ExcessiveSupervisorRestartPassivation(child, WithinTimeRangeMilliseconds, MaxNumberOfRetries, cause)); |
There was a problem hiding this comment.
This is not quite right - we also need to handle scenarios where the SupervisorStrategy decided to issue a Stop directive. Reason being: if the actor failed in such a way that it has to be stopped, it's by definition an irrecoverable exception. Continuing to remember the entity after we get this type of signal back is net-destructive.
|
OK, all fixed |
| public object StopMessage { get; } | ||
| } | ||
|
|
||
| internal sealed record SupervisorStopDirectivePassivation(IActorRef Child, string Reason, Exception LastCause) : IShardRegionCommand; |
| { | ||
| if (restart && stats.RequestRestartPermission(MaxNumberOfRetries, WithinTimeRangeMilliseconds)) | ||
| RestartChild(child, cause, suspendFirst: false); | ||
| else |
…ufus/akka.net into akkadotnet#7629-fix-dying-RE-actor
Fixes #7629
Changes
SupervisorStrategysettings toShardSupervisionStrategy(accessible only via C# fluent API)Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):
remember-entitiesand actors who can't start up correctly #7629