Conversation
asteroid/masknn/attention.py
Outdated
    self.mha = MultiheadAttention(embed_dim, n_heads, dropout=dropout)
    self.dropout = nn.Dropout(dropout)
    self.pos_wise_ff = nn.Sequential(
Why did you decide to remove it?
Because I got confused: it is not in DPTNet, sorry. I confused you as well.
Also, we could implement the whole thing with only two permutations; the current version is not efficient.
SungFeng-Huang left a comment
Seems to be correct right now.
    out = self.mha(tomha, tomha, tomha)[0]
    out = self.pos_wise_ff(out.permute(1, 0, 2)).transpose(1, -1)
    x = self.dropout(out) + x
    x = self.dropout(out.permute(1, 2, 0)) + x
I'm curious whether the dropouts at lines 61, 65, and 66 sharing the same self.dropout would cause any problem?
It's a wrapper around a function and has no state, so it doesn't cause any problem. I asked myself the same question in the past 😉
    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
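A minimal sketch (not the asteroid code, sizes are made up) of why sharing one nn.Dropout module across several call sites is safe: the module stores only hyperparameters (p, inplace) and delegates to F.dropout, so each call samples a fresh, independent mask.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)            # one module, reused below
a = torch.randn(4, 8)
b = torch.randn(4, 8)

# Two call sites sharing the same module: each forward call
# independently samples a new mask via F.dropout; no state is carried over.
out1 = drop(a)
out2 = drop(b)

drop.eval()                          # in eval mode dropout is the identity
same = torch.equal(drop(a), a)       # True: no mask is applied
```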
Thank you very much, guys. I thought that maybe one could use only two permute operations, but our normalizations actually expect (batch, channel, seq), so the current version is fine. Merging.
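A minimal sketch of the shape bookkeeping discussed above, with hypothetical sizes: nn.MultiheadAttention (without batch_first) expects (seq, batch, embed), while the normalizations expect (batch, channel, seq), so the tensor is permuted into MHA layout and permuted back for the residual connection.

```python
import torch
from torch import nn

# hypothetical sizes for illustration
batch, chan, seq, n_heads = 2, 16, 10, 4

x = torch.randn(batch, chan, seq)        # (batch, channel, seq), as the norms expect
mha = nn.MultiheadAttention(chan, n_heads)

tomha = x.permute(2, 0, 1)               # -> (seq, batch, embed) for MultiheadAttention
out = mha(tomha, tomha, tomha)[0]        # attention output, still (seq, batch, embed)
x = out.permute(1, 2, 0) + x             # -> (batch, channel, seq) for the residual
```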
see #380