Fix: unable to send termination message to remote monitor due to incorrect creation time check#248
Fix: unable to send termination message to remote monitor due to incorrect creation time check#248qjpcpu wants to merge 2 commits intoergo-services:masterfrom
Conversation
d30fc87 to
cdccc45
Compare
cdccc45 to
215c27e
Compare
|
Thanks for the report. Creation field in PID,Ref, etc reflects the node creation. It is used to track the incarnation, so your change breaks this feature. May you please double check this issue using the latest commit in v320 branch? Target manager has been rewritten significantly, so the issue has likely already been solved. |
This bug still exists in version v320. |
|
@halturin I've submitted the new code with proper node validation.can you help review it. |
|
I've just dug into the logic and can confirm now that SendTeminatePID[Alias] should not have those checks, so I removed them in both methods. I also fixed the tests for links. changes are in the v320. |
@halturin could you please merge those commits into master or v310 ? my project depends on it. Thanks |
|
thanks for the great catch. its a critical issue. merged into the master this fix. anyway, the next release is scheduled for the next week (feb 4). |
|
mentioned you in the changelog https://github.com/ergo-services/ergo/tree/v320?tab=readme-ov-file#v320-2026-02-04-tag-version-v1999320 if you dont mind |
|
well, it's my pleasure |
Summary
This PR fixes a bug where gen.MessageDownPID and gen.MessageDownAlias were not being received by monitoring processes on remote nodes. The issue caused the monitoring process to hang or fail to react when a watched remote process terminated.
Root Cause Analysis
In net/proto/connection.go, the methods SendTerminatePID and SendTerminateAlias contained an incorrect validation logic:
// Old code in SendTerminatePID
if target.Creation != c.peer_creation {
return gen.ErrProcessIncarnation
}
• target.Creation: Represents the creation timestamp of the terminated process (or alias). Since the process resides on the local node (the sender of this message), this value is derived from the local node's creation time.
• c.peer_creation: Represents the creation timestamp of the remote node (the receiver/peer).
The code incorrectly enforced that the local process's creation time must match the remote peer's creation time. In a real-world distributed environment, two different nodes will almost never have the exact same creation timestamp. Consequently, this check fails,
gen.ErrProcessIncarnation is returned, and the termination signal is silently dropped before it can be sent over the network.
The Fix
The incorrect if check has been removed from both SendTerminatePID and SendTerminateAlias.
• These methods are designed to notify the peer that a resource on the sender's side has terminated.
• There is no logical requirement for the local resource's creation time to match the remote peer's start time.
This change ensures that protoMessageTerminatePID and protoMessageTerminateAlias are correctly transmitted, allowing the remote node to handle the message and dispatch gen.MessageDownPID / gen.MessageDownAlias to the monitoring actors.
Affected Methods
• net/proto/connection.go: SendTerminatePID
• net/proto/connection.go: SendTerminateAlias