What you get is what you see: A visual markup decompiler #19

@flrngel

https://arxiv.org/abs/1609.04938

1. Abstract

  • the model is trained end-to-end
  • the model combines a convolutional network with a recurrent network
  • a standard domain-specific LaTeX OCR system achieves around 25% accuracy, while the paper's model reproduces the exact rendered image on 75% of examples

2. Introduction

  • OCR requires joint processing of image and text data
  • WYGIWYS is a simple extension of the attention-based encoder-decoder model
  • the paper introduces the IM2LATEX-100K dataset

3. Problem: image-to-markup generation

  • the authors define the image-to-markup problem as converting a rendered source image to target presentational markup

4. Model

Convolutional Network

  • the convolutional network does not use a fully connected layer
    • this preserves the locality of the CNN features so that visual attention can be applied to them
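
To make the locality point concrete, here is a minimal pure-Python sketch (not the paper's network). The helper names, the 2x2 filter, and the toy 6x6 image are all assumptions chosen for illustration. It shows that stacking convolution and pooling without a fully connected layer leaves a 2D grid whose cells still correspond to image regions, which is what attention needs.

```python
def conv2d_valid(image, kernel):
    """Valid 2D cross-correlation; the output is still a 2D grid."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(grid, size=2):
    """Non-overlapping max pooling (stride equals the window size)."""
    return [
        [
            max(grid[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(grid[0]) - size + 1, size)
        ]
        for r in range(0, len(grid) - size + 1, size)
    ]


# Toy input (assumption): a 6x6 image with one bright pixel near the top right.
image = [[0.0] * 6 for _ in range(6)]
image[1][4] = 1.0
kernel = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical 2x2 filter

# No fully connected layer: the result is a 2x2 grid, not a flat vector.
features = max_pool(conv2d_valid(image, kernel))
for row in features:
    print(row)
```

Only the top-right cell of `features` is active, so the grid still says where in the image the evidence came from. A fully connected layer would mix all positions into one vector and lose that.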

Row Encoder

  • Show, Attend and Tell shows that an image feature grid can be fed directly into the decoder
    • in OCR, however, the source contains significant relative sequential order information
    • so running an RNN over each row of the feature grid can help in two ways
      • left-to-right order can be easily learned by the encoder
      • the RNN can use the surrounding horizontal context to refine the hidden representation
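
The row encoder idea can be sketched as below. This is a simplified illustration under stated assumptions: it uses a plain tanh (Elman) RNN, a zero initial state, a single left-to-right direction, and hand-picked weights, none of which come from these notes. The names `rnn_step` and `encode_rows` are hypothetical.

```python
import math


def rnn_step(x, h, w_xh, w_hh):
    """One Elman RNN step: h' = tanh(W_xh x + W_hh h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[k], x))
            + sum(w * hi for w, hi in zip(w_hh[k], h))
        )
        for k in range(len(h))
    ]


def encode_rows(grid, w_xh, w_hh):
    """Run the same RNN left to right over every row of a feature grid.

    grid[r][c] is a feature vector; the result keeps the H x W layout,
    but each cell now also summarizes the cells to its left.
    """
    hidden_size = len(w_hh)
    encoded = []
    for row in grid:
        h = [0.0] * hidden_size  # assumption: zero initial state per row
        new_row = []
        for x in row:
            h = rnn_step(x, h, w_xh, w_hh)
            new_row.append(h)
        encoded.append(new_row)
    return encoded


# Hypothetical weights and a 2 x 3 grid of 2-dimensional features.
w_xh = [[0.5, -0.3], [0.2, 0.4]]
w_hh = [[0.1, 0.6], [-0.4, 0.3]]
grid = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]],
]

encoded = encode_rows(grid, w_xh, w_hh)
# The last column has the same input in both rows, but different encodings,
# because each row's hidden state carries its own left context.
print(encoded[0][2])
print(encoded[1][2])
```

The two rows differ only in their first cell, yet their final cells are encoded differently. That is the horizontal context the bullets above describe.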

Decoder

  • uses an attention model (Bahdanau attention)
  • uses beam search at test time
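
Both decoder ingredients can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the paper's decoder. `additive_attention` uses the Bahdanau-style score v . tanh(W_q q + W_k k) with hypothetical weights, and `beam_search` runs over a hand-written toy distribution (`TOY_TABLE`) rather than a trained model.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def additive_attention(query, keys, w_q, w_k, v):
    """Bahdanau-style attention: score_i = v . tanh(W_q q + W_k k_i)."""
    q_proj = matvec(w_q, query)
    scores = [
        sum(vi * math.tanh(a + b)
            for vi, a, b in zip(v, q_proj, matvec(w_k, key)))
        for key in keys
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder cells.
    context = [sum(w * key[d] for w, key in zip(weights, keys))
               for d in range(len(keys[0]))]
    return weights, context


def beam_search(step_probs, beam_size, max_len, eos="<eos>"):
    """Keep the beam_size best prefixes by summed log-probability."""
    beams, finished = [(0.0, [])], []
    for _ in range(max_len):
        candidates = [
            (score + math.log(p), prefix + [tok])
            for score, prefix in beams
            for tok, p in step_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token distribution (assumption): greedy picks "a", but the best
# full sequence starts with "b".
TOY_TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 0.9, "w": 0.1},
}


def toy_step(prefix):
    return TOY_TABLE.get(tuple(prefix), {"<eos>": 1.0})


identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(
    [1.0, 0.0], [[2.0, 0.0], [-2.0, 0.0]], identity, identity, [1.0, 1.0])
greedy = beam_search(toy_step, beam_size=1, max_len=5)
beamed = beam_search(toy_step, beam_size=2, max_len=5)
print(weights, greedy, beamed)
```

With a beam of 1 the search commits to `a` and ends with probability 0.3. With a beam of 2 it keeps `b` alive and finds `b z <eos>` with probability 0.36, which is why beam search is used at test time.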

5. Dataset

Tokenization

  • character-based models did not perform as well
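
For a feel of what the tokenization choice means, here is a rough sketch comparing character-level and command-level tokenization of a LaTeX string. The regular expression is an assumption made for illustration and is not the tokenizer used to build the dataset.

```python
import re

# Assumed token rule: a control word (\frac), a control symbol (\{),
# or any single non-space character.
_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize_latex(latex):
    """Split LaTeX into command-level tokens."""
    return _TOKEN.findall(latex)


def tokenize_chars(latex):
    """Character-level baseline: every non-space character is a token."""
    return [c for c in latex if not c.isspace()]


formula = r"\frac{a}{b}"
print(tokenize_latex(formula))  # 7 tokens
print(tokenize_chars(formula))  # 11 tokens
```

The character-level sequence is longer, and the decoder has to spell out each command letter by letter, which is one plausible reason such models do worse.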

Optional: Normalization

  • a modified KaTeX parser is used to produce normalized input data
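
The point of normalization is that different markup strings can render to the same image, so mapping them to one canonical form gives the model a single target. The sketch below does not call KaTeX. It applies one hypothetical rule (write the subscript before the superscript) with a regular expression that only handles simple, non-nested script arguments. Both the rule and the function name `normalize_scripts` are assumptions.

```python
import re

# One script argument: a flat {...} group, a control word, or a single char.
_ARG = r"(\{[^{}]*\}|\\[A-Za-z]+|[^\s\\{}^_])"
_SUP_THEN_SUB = re.compile(r"\^\s*" + _ARG + r"\s*_\s*" + _ARG)


def normalize_scripts(latex):
    """Rewrite x^a_b as x_b^a so equivalent markup gets one canonical form."""
    return _SUP_THEN_SUB.sub(
        lambda m: "_" + m.group(2) + "^" + m.group(1), latex)


print(normalize_scripts("x^j_i"))             # x_i^j
print(normalize_scripts("x_i^j"))             # already canonical, unchanged
print(normalize_scripts(r"\sum^n_{i=1} a_i")) # \sum_{i=1}^n a_i
```

A real normalizer would work on the parse tree rather than on text, which is presumably why a parser such as KaTeX is used.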

My Notes

  • each GitHub implementation of the paper uses a different loss function
