Hi experts, nice idea using a cross-modal VAE.
But the theory behind the evidence lower bound is formulated for a single latent variable z, not for two separate latent variables.
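For reference, the bound I have in mind is the standard single-latent ELBO, written here in the usual VAE notation (the symbols q_phi, p_theta and p(z) are my assumption, not taken from this repo):

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

Here there is only one z, with one approximate posterior q_phi(z | x) and one prior p(z), which is why I am unsure how it carries over to two separate latent variables.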
Do you have any thoughts on this point, or a reference paper?
BTW, CADA-VAE only exchanges sigma 1 and sigma 2.