
Non-Autoregressive Neural Machine Translation #6


https://arxiv.org/abs/1711.02281

Abstract

Features

  • Non-autoregressive: output tokens have no dependency on one another
  • Outputs are produced in parallel

How

  • Knowledge distillation
  • Input token fertilities
  • Policy Gradient

1. Introduction

The model builds on CNN and self-attention (Transformer) architectures, which already avoid sequential computation during training; this paper removes the autoregressive dependency from decoding as well.

2. Background

2.1. Autoregressive Neural Machine Translation

  • Training can be parallelized with causal attention masking (Transformer) or causal convolutions (CNN); the Transformer's masked self-attention performs better

2.2. Non-Autoregressive decoding

Problems with beam search:

  • suffers from diminishing returns as beam size grows
  • limits the parallelism of the search

The output length T is treated as a random variable: the model first predicts a length, then emits all T tokens in parallel.
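
This gives the naive non-autoregressive factorization; my reconstruction of the paper's formula, with a length model p_L and source x_{1:m}:

```latex
% Naive non-autoregressive factorization: predict the length T, then all
% T output tokens conditionally independently given the source x_{1:m}.
p_{\mathcal{NA}}(Y \mid X; \theta)
  = p_L(T \mid x_{1:m}; \theta)
    \prod_{t=1}^{T} p(y_t \mid x_{1:m}; \theta)
```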

2.3. The multimodality problem

The multimodality problem: one source sentence admits many valid translations, so the target distribution is highly multimodal, and conditionally independent output tokens cannot commit to a single mode. E.g., "Thank you." can become "Danke schön." or "Vielen Dank.", and independent per-token choices can mix modes into outputs like "Danke Dank."
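
A toy script (mine, not the paper's code) illustrating mode mixing with the paper's "Danke schön." / "Vielen Dank." example:

```python
import random

# Two equally valid German translations of "Thank you.", i.e. two modes
# of the target distribution.
modes = [["Danke", "schön"], ["Vielen", "Dank"]]

# Sampling each output position independently from its marginal can mix
# the modes: half the time this prints an inconsistent output such as
# "Danke Dank" or "Vielen schön".
random.seed(0)
sample = [random.choice([mode[t] for mode in modes]) for t in range(2)]
print(" ".join(sample))
```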

3. The non-autoregressive transformer

(figure: the NAT architecture, with encoder, fertility predictor, and parallel decoder)

3.3. Modeling fertility to tackle the multimodality problem

Fertilities are supervised with alignments from IBM Model 2 (via an external aligner).

Definition of fertilities and their benefits:

  • Definition: the number of times each input word is copied into the decoder input
  • Provide a natural factorization that dramatically reduces the space of modes
  • Make the decoder's job easier, since each position knows roughly which source word to translate (see the sketch below)
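
A minimal sketch (function name and example tokens are mine, not the paper's) of how fertilities turn the source into the decoder input:

```python
# Each source token is copied as many times as its fertility; the copy
# counts also fix the output length T.
def decoder_input_from_fertilities(src_tokens, fertilities):
    out = []
    for tok, f in zip(src_tokens, fertilities):
        out.extend([tok] * f)  # fertility 0 drops the token entirely
    return out

# e.g. ["we", "totally", "accept", "it"] with fertilities [1, 2, 1, 0]
# -> ["we", "totally", "totally", "accept"], so T = 4
print(decoder_input_from_fertilities(["we", "totally", "accept", "it"],
                                     [1, 2, 1, 0]))
```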

3.4. Translation predictor and the decoding process

  • Argmax decoding: take the highest-probability fertility at each position
  • Average decoding: use the expected fertility at each position (rounded)
  • Noisy parallel decoding (NPD): sample several fertility sequences, decode them in parallel, and rescore with the autoregressive teacher (sketched below)
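
A hedged sketch of NPD; `sample_fertilities`, `nat_decode`, and `teacher_score` are hypothetical stand-ins for the fertility predictor, the NAT decoder, and the autoregressive teacher:

```python
def noisy_parallel_decode(src, sample_fertilities, nat_decode, teacher_score,
                          num_samples=8):
    candidates = []
    for _ in range(num_samples):        # each iteration is independent,
        fert = sample_fertilities(src)  # so all samples can run in parallel
        candidates.append(nat_decode(src, fert))
    # a single pass of the teacher scores all candidates; keep the best one
    return max(candidates, key=lambda hyp: teacher_score(src, hyp))
```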

4. Training

I didn't like this section

(equation screenshot: the fertility-marginalized likelihood)
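
My reconstruction of the likelihood the screenshot likely showed: it marginalizes over fertility sequences f, where x^f denotes the source copied according to its fertilities:

```latex
% Fertility-marginalized likelihood; \mathcal{F} is the set of fertility
% sequences whose counts sum to the target length T.
p_{\mathcal{NA}}(Y \mid X; \theta)
  = \sum_{f \in \mathcal{F}}
      p_F(f \mid x_{1:m}; \theta)
      \prod_{t=1}^{T} p\!\left(y_t \mid x^{f}; \theta\right),
\qquad
\mathcal{F} = \Big\{ f : \sum_{i=1}^{m} f_i = T \Big\}
```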

4.2. Fine-Tuning

The fine-tuning loss combines a word-level KL-divergence distillation term against the teacher, an RL (policy-gradient) term for the non-differentiable fertility sampling, and a deterministic term trained by ordinary backpropagation.
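
A minimal REINFORCE sketch (an assumption-laden illustration using PyTorch, not the paper's exact fine-tuning loss): fertility sampling is discrete, so the gradient flows through log-prob times reward rather than plain backprop.

```python
import torch

def reinforce_loss(fert_logits, reward_fn, baseline=0.0):
    # fert_logits: per-source-position logits over fertility values
    dist = torch.distributions.Categorical(logits=fert_logits)
    ferts = dist.sample()      # one fertility per source position
    reward = reward_fn(ferts)  # e.g. teacher log-likelihood of the decode
    # minimizing this raises the probability of high-reward fertilities
    return -(reward - baseline) * dist.log_prob(ferts).sum()
```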

Word-level knowledge distillation (Teacher)
(equation screenshot: word-level distillation loss)
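
For reference, the standard word-level distillation loss (Kim & Rush, 2016), which I believe is what the screenshot showed; the paper fine-tunes with a reverse-KL variant of this, which is more mode-seeking:

```latex
% Word-level distillation: q is the autoregressive teacher, p the NAT
% student; the student matches the teacher position by position.
\mathcal{L}_{KD}
  = -\sum_{t=1}^{T} \sum_{k=1}^{|V|}
      q\big(y_t = k \mid \hat{y}_{<t}, x\big)\,
      \log p\big(y_t = k \mid x\big)
```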

External fertility inference model
(equation screenshot: objective term using the external fertility inference model)
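
The deterministic external inference reads fertility targets off a hard word alignment (e.g. from fast_align, an IBM Model 2 implementation); a minimal sketch with names and example of my own:

```python
from collections import Counter

def fertilities_from_alignment(alignment, src_len):
    # alignment: (src_idx, tgt_idx) pairs; the fertility of source word i
    # is the number of target words aligned to it
    counts = Counter(s for s, _ in alignment)
    return [counts.get(i, 0) for i in range(src_len)]

# e.g. alignment [(0, 0), (1, 1), (1, 2), (2, 3)] over 3 source words -> [1, 2, 1]
print(fertilities_from_alignment([(0, 0), (1, 1), (1, 2), (2, 3)], 3))
```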

Todo

  • (3.4) Read up on average decoding and noisy parallel decoding
