(To be honest, I'm not used to deep-learning coding (PyTorch, Hugging Face, etc.), so this might be a silly question; keep in mind I'm a beginner.)
The original paper says that the context encoder and the candidate encoder are trained separately, i.e. as two distinct transformers.

However, I found that in your code both encoders are calls to the same module, self.bert():
https://github.com/chijames/Poly-Encoder/blob/master/encoder.py#L20-L27
Is this intentional? If the module is shared, I doubt the two encoders can end up with different weights after training.
FYI: the official implementation of the BLINK paper (https://arxiv.org/pdf/1911.03814.pdf) instantiates two separate encoder modules: https://github.com/facebookresearch/BLINK/blob/master/blink/biencoder/biencoder.py#L37-L48
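To illustrate what I mean, here is a minimal PyTorch sketch (the class names are hypothetical, and nn.Linear stands in for a full BERT encoder): with a single shared module the two "encoders" are necessarily tied, while two separate copies can diverge during training.

```python
import copy
import torch
import torch.nn as nn

class SharedBiEncoder(nn.Module):
    """Both inputs pass through the same module, so the weights are tied."""
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(8, 8)  # stand-in for a BERT encoder

    def forward(self, context, candidate):
        return self.bert(context), self.bert(candidate)

class SeparateBiEncoder(nn.Module):
    """Two independent copies, as in the BLINK implementation."""
    def __init__(self):
        super().__init__()
        self.context_bert = nn.Linear(8, 8)
        # same initialization, but independent parameters
        self.cand_bert = copy.deepcopy(self.context_bert)

    def forward(self, context, candidate):
        return self.context_bert(context), self.cand_bert(candidate)

# After one gradient step whose loss only touches the candidate branch,
# the separate encoders diverge; a shared module never could.
model = SeparateBiEncoder()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
_, cand_out = model(torch.randn(2, 8), torch.randn(2, 8))
cand_out.sum().backward()
opt.step()
diverged = not torch.allclose(model.context_bert.weight, model.cand_bert.weight)
print(diverged)  # True: the two encoders now have different weights
```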