What you get is what you see: A visual markup decompiler

https://arxiv.org/abs/1609.04938

# 1. Abstract
- the model is trained end-to-end
- the model combines a convolutional network with a recurrent network
- existing OCR-based systems reach about 25% accuracy, while the paper's model reaches about 75%

# 2. Introduction
- OCR requires joint processing of image and text data
- **WYGIWYS** is a simple extension of the attention-based encoder-decoder model
- The paper introduces the [IM2LATEX-100k dataset](https://zenodo.org/record/56198)

# 3. Problem: image-to-markup generation
- the authors define the image-to-markup problem as converting a rendered source image to target presentational markup

# 4. Model
![image](https://user-images.githubusercontent.com/2807595/42231624-fffce59c-7f26-11e8-9052-8d85dcb55007.png)

## Convolutional Network
- The convolutional network does not use a fully connected layer
  - this preserves the locality of the CNN features, which visual attention relies on
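
To illustrate why skipping the fully connected layer matters, here is a minimal sketch that only tracks shapes. The layer list `LAYERS` and the 32x128 input size are hypothetical and are not taken from the paper; the point is that a stack of convolution and pooling layers still ends in a 2-D grid of cells that attention can look at individually.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of one convolution or pooling layer along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def feature_grid_shape(height, width, layers):
    """Apply each (kernel, stride, pad) layer to both axes and return the grid size."""
    for kernel, stride, pad in layers:
        height = conv_out(height, kernel, stride, pad)
        width = conv_out(width, kernel, stride, pad)
    return height, width


# Hypothetical stack (an assumption for illustration, not the paper's architecture).
LAYERS = [
    (3, 1, 1),  # 3x3 convolution, keeps the spatial size
    (2, 2, 0),  # 2x2 max-pooling, halves the spatial size
    (3, 1, 1),  # 3x3 convolution, keeps the spatial size
    (2, 2, 0),  # 2x2 max-pooling, halves the spatial size
]

grid_h, grid_w = feature_grid_shape(32, 128, LAYERS)
print(grid_h, grid_w)  # the grid cells are what the decoder attends over
```

A fully connected layer would collapse this grid into a single vector, and the link between a feature and its position in the image would be lost.
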
## Row Encoder
- [Show, Attend and Tell](https://arxiv.org/abs/1502.03044) shows that an image feature grid can be fed directly into the decoder
  - in OCR, however, the input contains significant relative sequential order information
  - so running an RNN over each row of the feature grid can help:
    - left-to-right order can be easily learned by the encoder
    - the RNN can use the surrounding horizontal context to refine the hidden representation
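
The row-encoder idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the paper's encoder: it uses a single-direction tanh RNN with hand-picked weights (`w_in`, `w_rec` are hypothetical), and it resets the hidden state to zero at the start of every row.

```python
import math


def rnn_step(h_prev, x, w_in, w_rec):
    """One tanh RNN step: h = tanh(w_in @ x + w_rec @ h_prev)."""
    return [
        math.tanh(
            sum(wi * xi for wi, xi in zip(w_in[j], x))
            + sum(wr * hk for wr, hk in zip(w_rec[j], h_prev))
        )
        for j in range(len(w_rec))
    ]


def encode_rows(grid, w_in, w_rec):
    """Run the RNN left to right over each row; the grid shape is preserved."""
    encoded = []
    for row in grid:
        h = [0.0] * len(w_rec)  # fresh state per row (a simplifying assumption)
        out_row = []
        for cell in row:
            h = rnn_step(h, cell, w_in, w_rec)
            out_row.append(h)
        encoded.append(out_row)
    return encoded


# Toy 2x3 grid of 2-dimensional CNN features; both rows end in the same cell.
grid = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
]
w_in = [[0.5, 0.0], [0.0, 0.5]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
encoded = encode_rows(grid, w_in, w_rec)
print(encoded[0][2], encoded[1][2])
```

The last cell is identical in both rows, yet its encoded vector differs because the cells to its left differ. That is the "surrounding horizontal context" from the notes above.
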
## Decoder
- uses an attention model (Bahdanau attention)
- uses beam search at test time
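
Both decoder ingredients can be sketched with the standard library only. The code below is a toy illustration under stated assumptions: `additive_attention` follows the usual Bahdanau-style score `v . tanh(W_s s + W_c c)` with made-up weights, and `toy_next_log_probs` is a hypothetical stand-in for the trained decoder, chosen so that greedy decoding and beam search disagree.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def additive_attention(state, cells, w_state, w_cell, v):
    """Return attention weights over the cells and the weighted context vector."""
    proj_state = matvec(w_state, state)
    scores = []
    for cell in cells:
        hidden = [math.tanh(a + b) for a, b in zip(proj_state, matvec(w_cell, cell))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    context = [
        sum(w * cell[d] for w, cell in zip(weights, cells))
        for d in range(len(cells[0]))
    ]
    return weights, context


def beam_search(next_log_probs, beam_size, max_len, eos="<eos>"):
    """Keep the beam_size best prefixes per step; return the best sequence found."""
    beams, finished = [(0.0, [])], []
    for _ in range(max_len):
        candidates = [
            (score + logp, prefix + [token])
            for score, prefix in beams
            for token, logp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[0])[1]


def toy_next_log_probs(prefix):
    """Hypothetical next-token distribution standing in for the decoder."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "y": 0.1},
    }
    probs = table.get(tuple(prefix), {"<eos>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


print(beam_search(toy_next_log_probs, beam_size=1, max_len=5))
print(beam_search(toy_next_log_probs, beam_size=2, max_len=5))
```

With `beam_size=1` the search is greedy and commits to the locally best first token; with `beam_size=2` it keeps the second-best prefix alive and finds a sequence with higher total probability.
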

# 5. Dataset
## Tokenization
- character-based models did not perform well
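
One plausible alternative to character-level targets is to treat each LaTeX command as a single token. The regular expression below is an assumption made for illustration and is not the tokenizer used in the paper; it only shows how much shorter the target sequence becomes.

```python
import re

# Hypothetical token pattern: a command such as \frac, an escaped symbol such as \{,
# or any other single non-whitespace character.
TOKEN_PATTERN = re.compile(r"\\[a-zA-Z]+|\\.|\S")


def tokenize_latex(source):
    """Split LaTeX source into command-level tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(source)


formula = r"\frac{a}{b}"
tokens = tokenize_latex(formula)
print(tokens)
print(len(formula), "characters vs", len(tokens), "tokens")
```

Shorter target sequences mean fewer decoding steps, and the model never has to spell out a command name one character at a time.
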
## Optional: Normalization
- KaTeX was modified to produce normalized input data
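
To make the motivation concrete: different LaTeX sources can render to the same image, so a canonical form gives the model one consistent target. The sketch below is a toy regex-based normalizer and only an assumption about what such a rule could look like; it is not the KaTeX-based pipeline from the paper. It rewrites `^a_b` as `_b^a` so that subscripts always come first.

```python
import re

# One script argument: a brace group without nesting, a command, or a single
# alphanumeric character. This is a deliberately limited, hypothetical grammar.
ARG = r"\{[^{}]*\}|\\[a-zA-Z]+|[A-Za-z0-9]"
SUPER_THEN_SUB = re.compile(rf"\^({ARG})_({ARG})")


def normalize_scripts(source):
    """Reorder superscript-then-subscript pairs into subscript-then-superscript."""
    return SUPER_THEN_SUB.sub(r"_\2^\1", source)


print(normalize_scripts("x^2_1"))
print(normalize_scripts("x_1^2"))
print(normalize_scripts("a^{n+1}_{i}"))
```

Inputs that are already in subscript-first order pass through unchanged, so applying the function twice gives the same result as applying it once.
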

# My Notes
- each GitHub implementation of the paper uses a different loss function
