Neural Discrete Representation Learning

https://arxiv.org/abs/1711.00937

# Abstract
- the paper proposes a model (**VQ-VAE**) that learns "discrete representations"
- differs from VAEs
  - the encoder network outputs discrete codes rather than continuous ones
  - the prior is learnt rather than static
  - vector quantization circumvents ***posterior collapse***
    - the latents are ignored by a powerful decoder (typically observed in other VAEs)

# 1. Introduction
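- useful generic representations learnt in an unsupervised fashion are still lacking
- the goal is a model that conserves the important features of the data in its latent space while optimising for maximum likelihood
- the paper concentrates on learning discrete representations
- discrete representations suit many modalities, e.g. images can often be described concisely by language
- the paper combines the VAE framework with discrete latent representations through a new parameterization of the posterior distribution of latents given an observation; the model relies on **vector quantization**
- **posterior collapse** means the latents are ignored by the decoder
- the learnt representations model important features that can span many dimensions in data space (e.g. an object spans many pixels in an image)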

Model features
- simple and unsupervised
- uses discrete latents, does not suffer from **posterior collapse**, and has no variance issues
- performs as well as its continuous counterparts in log-likelihood
- samples are coherent and high quality on a wide variety of applications when paired with a powerful prior

# 2. Related work
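- the paper presents a new way of training VAEs \[[23](https://arxiv.org/abs/1312.6114), [32](https://arxiv.org/abs/1401.4082)\] with discrete latent variables; many alternatives for training discrete VAEs already exist
- [Concrete distribution](https://arxiv.org/abs/1611.00712) and [Gumbel-softmax](https://arxiv.org/abs/1611.01144) are continuous relaxations with a temperature that is annealed during training: the gradients have low variance but are biased at the beginning of training, and have high variance but are unbiased towards the end
- [Scalar quantization](https://arxiv.org/abs/1703.00395) is used to compress activations for lossy image compression before arithmetic encoding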
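
To make the temperature behaviour of the Gumbel-softmax relaxation more concrete, here is a minimal pure-Python sketch. It is not part of VQ-VAE itself, and the function name `gumbel_softmax_sample` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng=random):
    """Draw one relaxed one-hot sample from unnormalised log-probabilities.

    Hypothetical helper for illustration; `rng` only needs a .random() method.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # numerically stable softmax over the perturbed, temperature-scaled logits
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [v / total for v in exps]
```

A high temperature gives a smooth vector, while a temperature close to zero gives a sample that is nearly one-hot, which is the discrete limit the relaxation anneals towards.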

# 3. VQ-VAE

![image](https://user-images.githubusercontent.com/2807595/45672531-637b9500-bb63-11e8-9070-dc6d0489f718.png)

VAE components
1. An encoder network parameterises a posterior distribution ***q(z|x)*** of discrete latent random variables ***z*** given the input data ***x***; a prior ***p(z)*** and a decoder ***p(x|z)*** complete the model
2. Posteriors and priors in VAEs are typically assumed to be normally distributed with diagonal covariance, which allows the Gaussian re-parameterization trick to be used \[[32](https://arxiv.org/abs/1401.4082), [23](https://arxiv.org/abs/1312.6114)\]; extensions include
   - autoregressive prior and posterior models \[[14](https://arxiv.org/abs/1310.8499)\]
   - normalizing flows \[[10](https://arxiv.org/abs/1605.08803)\]
   - inverse autoregressive posteriors \[[22](https://arxiv.org/abs/1606.04934)\]
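
For reference, the Gaussian re-parameterization trick used by continuous VAEs can be sketched as follows. This is a minimal illustration, assuming the encoder outputs a mean and a log-variance per latent dimension; the name `reparameterize` and its arguments are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension.

    Hypothetical helper; `rng` only needs a .gauss(mean, std) method.
    """
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # the noise carries all the randomness
        z.append(m + sigma * eps)    # deterministic in mu and sigma
    return z
```

Because the randomness sits in `eps` only, gradients can flow through `mu` and `sigma`. This trick does not carry over directly to discrete latents, which is why VQ-VAE takes a different route.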

# 3.1. Discrete Latent variables
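- ***K*** is the size of the discrete latent space (the number of embedding vectors in the codebook)
- ***D*** is the dimensionality of each latent embedding vector ***e_i***
- the encoder output ***z_e(x)*** is mapped to its nearest embedding ***e_k*** by a nearest-neighbour look-up in the codebook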
![image](https://user-images.githubusercontent.com/2807595/45672441-1d263600-bb63-11e8-93ec-1764263f7943.png)
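
The nearest-neighbour look-up into the codebook of ***K*** embeddings of dimension ***D*** can be sketched in a few lines. This is an illustrative pure-Python version, not the authors' implementation; `quantize` and its argument names are hypothetical.

```python
def quantize(z_e, codebook):
    """Return (k, e_k): index and vector of the codebook entry nearest to z_e.

    z_e      -- encoder output, a list of D floats
    codebook -- K embedding vectors, each a list of D floats
    """
    def sq_dist(a, b):
        # squared Euclidean distance between two D-dimensional vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # the discrete latent is the index of the closest embedding
    k = min(range(len(codebook)), key=lambda i: sq_dist(z_e, codebook[i]))
    return k, codebook[k]
```

For example, with the codebook `[[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]` (K = 3, D = 2), `quantize([0.9, 1.2], codebook)` returns `(1, [1.0, 1.0])`. The index `k` is the discrete code, and `e_k` is what the decoder receives.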

# 3.2. Learning
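- Loss has three terms
  - reconstruction loss, which trains the encoder and the decoder
  - codebook (VQ) loss, which moves the embeddings towards the encoder outputs; a stop-gradient keeps this term from updating the encoder
  - commitment loss, weighted by a hyperparameter, which keeps the encoder outputs close to the chosen embedding; a stop-gradient keeps this term from updating the embeddings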
![image](https://user-images.githubusercontent.com/2807595/45672493-48a92080-bb63-11e8-9579-382250c3bc0d.png)
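
The loss above combines a reconstruction term, a codebook term computed with a stop-gradient, and a commitment term. The sketch below only computes the forward values for one example, under two assumptions that are mine: mean squared error stands in for the reconstruction log-likelihood, and `beta=0.25` is just an illustrative default. The function name `vq_vae_loss_terms` is hypothetical.

```python
def vq_vae_loss_terms(x, x_recon, z_e, e, beta=0.25):
    """Return (total, recon, codebook, commit) for a single example.

    x, x_recon -- input and its reconstruction (lists of floats)
    z_e, e     -- encoder output and its selected embedding (lists of D floats)
    """
    # reconstruction: MSE used here as an assumed stand-in for -log p(x | z_q(x))
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)

    # squared L2 distance between encoder output and chosen embedding
    dist = sum((a - b) ** 2 for a, b in zip(z_e, e))

    # ||sg[z_e] - e||^2: in an autodiff framework only e receives gradients
    codebook = dist
    # beta * ||z_e - sg[e]||^2: only z_e (the encoder) receives gradients
    commit = beta * dist

    return recon + codebook + commit, recon, codebook, commit
```

The stop-gradient is the identity in the forward pass, so the codebook and commitment terms share the same distance and differ only in which parameters they update during backpropagation. In a framework with automatic differentiation that is where an operator such as `tf.stop_gradient` would be applied.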

# 4. Experiments
![image](https://user-images.githubusercontent.com/2807595/45672542-6a0a0c80-bb63-11e8-9733-ed61875ca334.png)

# 5. Conclusion
- VQ-VAEs are capable of modelling very long-term dependencies through their compressed discrete latent space
- VQ-VAEs capture important features of the data in a completely unsupervised manner

# My Comments
- the "discrete" in VQ-VAE comes from the embeddings (***e_i***)
- training may be hard because the hyperparameters in the loss function have to be tuned
- the stop-gradient operator can be implemented with [tf.stop_gradient](https://www.tensorflow.org/api_docs/python/tf/stop_gradient)
