hopefully fixes DPTNet this time #383

Merged
popcornell merged 3 commits into master from add_poswise_ff_dptnet on Dec 8, 2020

Conversation

@popcornell (Collaborator)

see #380


    self.mha = MultiheadAttention(embed_dim, n_heads, dropout=dropout)
    self.dropout = nn.Dropout(dropout)
    self.pos_wise_ff = nn.Sequential(
Collaborator

Why did you decide to remove it?

Collaborator Author

Because I got confused: it is not in DPTNet, sorry, I confused you as well.
Also, we could implement the whole thing with only two permutations; the current version is not efficient.
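For context, the kind of layer being discussed can be sketched roughly as below. This is an illustrative toy only, not the actual asteroid implementation: the names, dimensions, and the plain Linear/ReLU feed-forward are assumptions (the DPTNet paper actually places an RNN inside its feed-forward, and asteroid uses its own normalizations).

```python
import torch
from torch import nn

# Minimal illustrative sketch only -- NOT the asteroid/DPTNet code.
class TinyTransformerLayer(nn.Module):
    def __init__(self, embed_dim=16, n_heads=4, ff_dim=32, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim, n_heads, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        # Position-wise feed-forward: applied independently at each position.
        self.pos_wise_ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (seq, batch, embed), the default nn.MultiheadAttention layout.
        out = self.mha(x, x, x)[0]
        x = self.norm1(self.dropout(out) + x)
        x = self.norm2(self.dropout(self.pos_wise_ff(x)) + x)
        return x


layer = TinyTransformerLayer()
y = layer(torch.randn(10, 2, 16))
assert y.shape == (10, 2, 16)
```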

@SungFeng-Huang left a comment

Seems to be correct right now.

    out = self.mha(tomha, tomha, tomha)[0]
    out = self.pos_wise_ff(out.permute(1, 0, 2)).transpose(1, -1)
    x = self.dropout(out) + x
    x = self.dropout(out.permute(1, 2, 0)) + x


I'm curious whether the dropouts at lines 61, 65, and 66 sharing the same self.dropout would cause any problem?

Collaborator

It's a wrapper around a function and has no state, so it doesn't cause any problem. I asked myself the same question in the past 😉

    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
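Since nn.Dropout, per the excerpt above, holds only its probability and flags and draws a fresh mask on every call, reusing one instance at several call sites behaves like using separate instances. A quick sanity check on a toy tensor (the tensor and probability here are made up for illustration):

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
# Each call draws an independent random mask; nothing carries over
# from one call to the next, so sharing the module is harmless.
a = drop(x)
b = drop(x)
assert not torch.equal(a, b)  # different masks, with overwhelming probability

drop.eval()
# In eval mode dropout is the identity, no matter how often the same
# instance was reused during training.
assert torch.equal(drop(x), x)
```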

@popcornell (Collaborator, Author)

Thank you very much, guys. I thought that maybe one could use only two permute operations, but our normalizations actually expect (batch, channel, seq), so the current version is fine. Merging.
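The layout issue can be illustrated with a stand-in channel-wise norm (nn.GroupNorm here, chosen purely for illustration, not asteroid's actual normalization): attention produces (seq, batch, embed), while a norm defined over (batch, channel, seq) needs the channel axis second, so the round trip costs extra permutes.

```python
import torch
from torch import nn

seq, batch, chan = 10, 2, 16
# Stand-in for any norm that expects (batch, channel, seq) input.
gn = nn.GroupNorm(1, chan)

att_out = torch.randn(seq, batch, chan)   # nn.MultiheadAttention layout
x = att_out.permute(1, 2, 0)              # -> (batch, chan, seq) for the norm
y = gn(x)
back = y.permute(2, 0, 1)                 # -> (seq, batch, chan) again
assert back.shape == (seq, batch, chan)
```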

@popcornell popcornell merged commit d5916a8 into master Dec 8, 2020
@popcornell popcornell deleted the add_poswise_ff_dptnet branch December 8, 2020 20:20