[Core] fix TaskLifecycleEvent.node_id populated with emitting node instead of just executor node#61478
Conversation
…stead of just executor node Signed-off-by: sampan <sampan@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request fixes a bug where TaskLifecycleEvent.node_id was incorrectly populated with the emitting node's ID instead of the executor node's ID. This addresses a security-relevant bug by preventing the unintentional leakage of the submitter's node ID, which could otherwise be used to map internal cluster topology. The fix involves removing a line that unconditionally sets the node_id. The accompanying test changes are thorough and validate the fix for different task statuses, ensuring node_id is correctly set to the executor's node when appropriate. I have one suggestion to improve the structure of the new tests for better clarity and maintainability.
| std::move(state_transition); | ||
| } | ||
|
|
||
| lifecycle_event_data.set_node_id(node_id_.Binary()); |
There was a problem hiding this comment.
The TaskLifecycleEvent.node_id field was being populated with the node ID of the worker emitting the event (node_id_) by default. In Ray, TaskLifecycleEvent is intended to track the lifecycle of a task, and its node_id field should represent the node where the task is executing (the executor). Populating it with the submitter's node ID by default is incorrect and leads to the leakage of the submitter's node ID to any consumer of these events (e.g., via the Export API or the event aggregator). This information could be used by an attacker to map the internal cluster topology or target specific nodes. The pull request correctly addresses this issue by removing the default assignment of the submitter's node ID to the lifecycle_event_data.node_id field.
Signed-off-by: sampan <sampan@anyscale.com>
Description
pr incorrectly set TaskLifecycleEvent.node_id to submitter node_id in all cases except when task_status=SUBMITTED_TO_WORKER (when task_status = SUBMITTED_TO_WORKER, the node_id gets overwritten to executor node_id). in this pr the bug is fixed and a test is added to verify the expected behaviour.
Related issues
fixes #61474