Skip to content

Fix: unable to send termination message to remote monitor due to incorrect creation time check#248

Closed
qjpcpu wants to merge 2 commits intoergo-services:masterfrom
qjpcpu:fix/MessageDownPID
Closed

Fix: unable to send termination message to remote monitor due to incorrect creation time check#248
qjpcpu wants to merge 2 commits intoergo-services:masterfrom
qjpcpu:fix/MessageDownPID

Conversation

@qjpcpu
Copy link

@qjpcpu qjpcpu commented Jan 29, 2026

  • Summary
    This PR fixes a bug where gen.MessageDownPID and gen.MessageDownAlias were not being received by monitoring processes on remote nodes. The issue caused the monitoring process to hang or fail to react when a watched remote process terminated.

  • Root Cause Analysis
    In net/proto/connection.go, the methods SendTerminatePID and SendTerminateAlias contained an incorrect validation logic:

    // Old code in SendTerminatePID
    if target.Creation != c.peer_creation {
    return gen.ErrProcessIncarnation
    }

    • target.Creation: Represents the creation timestamp of the terminated process (or alias). Since the process resides on the local node (the sender of this message), this value is derived from the local node's creation time.
    • c.peer_creation: Represents the creation timestamp of the remote node (the receiver/peer).

    The code incorrectly enforced that the local process's creation time must match the remote peer's creation time. In a real-world distributed environment, two different nodes will almost never have the exact same creation timestamp. Consequently, this check fails,
    gen.ErrProcessIncarnation is returned, and the termination signal is silently dropped before it can be sent over the network.

  • The Fix
    The incorrect if check has been removed from both SendTerminatePID and SendTerminateAlias.

• These methods are designed to notify the peer that a resource on the sender's side has terminated.
• There is no logical requirement for the local resource's creation time to match the remote peer's start time.

This change ensures that protoMessageTerminatePID and protoMessageTerminateAlias are correctly transmitted, allowing the remote node to handle the message and dispatch gen.MessageDownPID / gen.MessageDownAlias to the monitoring actors.

Affected Methods

• net/proto/connection.go: SendTerminatePID
• net/proto/connection.go: SendTerminateAlias

@CLAassistant
Copy link

CLAassistant commented Jan 29, 2026

CLA assistant check
All committers have signed the CLA.

@qjpcpu qjpcpu force-pushed the fix/MessageDownPID branch from d30fc87 to cdccc45 Compare January 29, 2026 02:19
@qjpcpu qjpcpu force-pushed the fix/MessageDownPID branch from cdccc45 to 215c27e Compare January 29, 2026 02:29
@halturin
Copy link
Collaborator

Thanks for the report.

Creation field in PID,Ref, etc reflects the node creation. It is used to track the incarnation, so your change breaks this feature.

May you please double check this issue using the latest commit in v320 branch? Target manager has been rewritten significantly, so the issue has likely already been solved.

@halturin
Copy link
Collaborator

@qjpcpu
Copy link
Author

qjpcpu commented Jan 29, 2026

Thanks for the report.

Creation field in PID,Ref, etc reflects the node creation. It is used to track the incarnation, so your change breaks this feature.

May you please double check this issue using the latest commit in v320 branch? Target manager has been rewritten significantly, so the issue has likely already been solved.

This bug still exists in version v320.

@qjpcpu
Copy link
Author

qjpcpu commented Jan 29, 2026

@halturin I've submitted the new code with proper node validation.can you help review it.

@halturin
Copy link
Collaborator

I've just dug into the logic and can confirm now that SendTeminatePID[Alias] should not have those checks, so I removed them in both methods. I also fixed the tests for links. changes are in the v320.

@qjpcpu
Copy link
Author

qjpcpu commented Jan 30, 2026

I've just dug into the logic and can confirm now that SendTeminatePID[Alias] should not have those checks, so I removed them in both methods. I also fixed the tests for links. changes are in the v320.

@halturin could you please merge those commits into master or v310 ? my project depends on it. Thanks

@halturin
Copy link
Collaborator

thanks for the great catch. its a critical issue. merged into the master this fix. anyway, the next release is scheduled for the next week (feb 4).

@halturin
Copy link
Collaborator

@qjpcpu
Copy link
Author

qjpcpu commented Jan 30, 2026

well, it's my pleasure

@qjpcpu qjpcpu closed this Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants