Hi, thanks for sharing the code.
I noticed that you apply self.mlp to pairs of node representations to obtain edge scores, and these scores are then used to select the edges of the causal subgraph.
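For reference, this is how I read the edge model. It is a minimal numpy sketch under my own assumptions (concatenating [Z_i, Z_j], a single hidden ReLU layer, and keeping a fixed ratio of the top-scoring edges). edge_scores_mlp and select_causal_edges are hypothetical names, not functions from your repository:

```python
import numpy as np

def edge_scores_mlp(Z, edge_index, W1, b1, w2, b2):
    """Hypothetical stand-in for self.mlp: score edge (i, j) from [Z_i, Z_j]."""
    src, dst = edge_index
    pair = np.concatenate([Z[src], Z[dst]], axis=1)  # (E, 2d) pair representations
    hidden = np.maximum(pair @ W1 + b1, 0.0)         # (E, h) hidden ReLU layer
    return hidden @ w2 + b2                          # (E,) one scalar score per edge

def select_causal_edges(scores, ratio=0.5):
    """Assumed selection rule: keep the top `ratio` fraction of edges by score."""
    k = max(1, int(round(ratio * scores.shape[0])))
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
N, d, h = 5, 4, 8
Z = rng.normal(size=(N, d))                       # node representations
edge_index = np.array([[0, 1, 2, 3, 4, 0],
                       [1, 2, 3, 4, 0, 2]])       # 6 directed edges
W1, b1 = rng.normal(size=(2 * d, h)), np.zeros(h)
w2, b2 = rng.normal(size=h), 0.0

scores = edge_scores_mlp(Z, edge_index, W1, b1, w2, b2)
kept = select_causal_edges(scores, ratio=0.5)  # indices of the 3 highest-scoring edges
```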

However, two points confuse me. (1) You mention that M_ij is computed as sigmoid(Z_i^T Z_j) rather than by a parametric network that produces the mask matrix, which does not seem to match the use of self.mlp. (2) As far as I can tell, the parameters of this edge model (self.mlp) do not receive any gradient during training. In other words, its parameters stay fixed. Could you explain in a bit more detail how this edge model works, so that I can understand it better?
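To make both points concrete, here is a toy check. It is a sketch under my own assumptions, not your actual code: inner_product_mask is the parameter-free sigmoid(Z_i^T Z_j) mask, a linear edge model theta stands in for self.mlp, and central finite differences stand in for autograd. If the kept edges enter the loss only through their indices, the edge model's gradient is zero. If the kept messages are also scaled by sigmoid(score), it is not:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_product_mask(Z):
    """Parameter-free mask M_ij = sigmoid(Z_i^T Z_j)."""
    return sigmoid(Z @ Z.T)

def readout(theta, X, msgs, target, k, soft):
    """Toy loss after keeping the top-k edges ranked by a hypothetical linear edge model.

    X: per-edge input features (E, d); msgs: per-edge messages (E, m).
    soft=True additionally scales the kept messages by sigmoid(score).
    """
    scores = X @ theta
    kept = np.sort(np.argsort(-scores)[:k])  # hard, index-based selection
    weights = sigmoid(scores[kept]) if soft else np.ones(k)
    pooled = (weights[:, None] * msgs[kept]).sum(axis=0)
    return float(((pooled - target) ** 2).sum())

def numerical_grad(f, theta, eps=1e-6):
    """Central finite differences, used here instead of autograd."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
E, d, m, k = 8, 3, 4, 4
X, msgs = rng.normal(size=(E, d)), rng.normal(size=(E, m))
target, theta = rng.normal(size=m), rng.normal(size=d)

grad_hard = numerical_grad(lambda t: readout(t, X, msgs, target, k, soft=False), theta)
grad_soft = numerical_grad(lambda t: readout(t, X, msgs, target, k, soft=True), theta)
M = inner_product_mask(rng.normal(size=(5, d)))  # (5, 5) mask, no trainable weights
```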