Handwritten Text Recognition with TensorFlow

Code and model weights for English handwritten text recognition model trained on IAM Handwriting Database. It is more or less a TensorFlow port of Joan Puigcerver's amazing work on HTR. This framework could also be used for building similar models using other datasets. Codes for 3 architectures - BLSTM, CRNN, and STN followed by CRNN - have been provided.

Guess it's Anigo Montoya now...

Requirements

ImageMagick - for image processing
imgtxtenh - for image processing
TensorFlow v1.12.0 - for deep learning
TF bindings for Baidu's WarpCTC - for faster (GPU-based) implementation of CTC loss function
OpenCV 3 - for image processing (not required for prediction)
(Optional) TF implementation of STN (Spatial Transformer Network) - required for CRNN-STN architecture

Steps for predicting on new images

A pre-trained model with CRNN architecture (5 Conv2D blocks followed by 5 bidirectional LSTM layers, hyperparameters and architecture are same as used here) has been provided. You can use the model to get predictions on new images by following the below steps:

Place the images of handwritten text in the samples folder

Download the model weights from here, extract, and place it under the experiments directory. Ensure that the below directory structure is followed:

├── experiments
│   ├── CRNN_h128
│   │   ├── best_model
│   │   ├── checkpoint
│   │   └── summary

Enter the mains directory and run:

python predict.py -c ../configs/config.json

Steps for training model from scratch

Download the IAM dataset (you'll need to register on the website) and keep the lines partition in the /data/IAM/ directory as shown below:

├── data
│   ├── IAM
│   │   ├── lines
│   │   │   ├── a01-000u-00.png
│   │   │   ├── a01-000u-01.png
│   │   │   ├── .
│   │   │   ├── .
│   │   │   ├── .
│   │   ├── lines.txt

If required, modify the /configs/config.json file to change model architecture , image height, etc.
From the data directory, run:
```
python process_images.py -c ../configs/config.json
```
This will pre-process the images (add borders, resize, remove skew, etc.) using imgtxtenh and ImageMagick's convert.
From the data directory, run:
```
python prepare_IAM.py -c ../configs/config.json
```
This will:
- process the ground-truth labels to remove spaces within words and collapse contractions
- read each image and create TFRecords files for train, validation and test sets using Aachen's partition
Start model training by running the below command from the mains directory:
```
python main.py -c ../configs/config.json
```

Results

The error rates, achieved by the pre-trained model, on IAM validation and test sets are shown below:

Set	CER (%)
Validation	4.83
Test	7.01

Notes

Please ensure the text is written in black on white background, similar to the images placed in the samples folder
During training phase, character error rate (CER) is calculated only after every 10 steps; otherwise, training is slowed down due to TensorFlow's ctc_beam_search_decoder
Option for bucketing images according to image width (to avoid extraneous image padding) has been provided and can be toggled using the config file
Keeping images with a large width range together in a batch might produce slightly lower accuracy due to padding. A workaround is to keep batch size as 1 during inference.

Citations

Laia: A deep learning toolkit for HTR

@misc{laia2016,
  author = {Joan Puigcerver and
            Daniel Martin-Albo and
            Mauricio Villegas},
  title = {Laia: A deep learning toolkit for HTR},
  year = {2016},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/jpuigcerver/Laia}},
}

Joan Puigcerver. Are Multidimensional Recurrent Layers ReallyNecessary for Handwritten Text Recognition? Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia, Spain
U. Marti and H. Bunke. The IAM-database: An English Sentence Database for Off-line Handwriting Recognition. Int. Journal on Document Analysis and Recognition, Volume 5, pages 39 - 46, 2002.
TensorFlow implementation of Spatial Transformer Network
Mahmoud Gemy for providing the Tensorflow Project Template

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
base		base
configs		configs
data		data
data_loader		data_loader
mains		mains
models		models
samples		samples
trainers		trainers
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
sample_output.png		sample_output.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Handwritten Text Recognition with TensorFlow

Requirements

Steps for predicting on new images

Steps for training model from scratch

Results

Notes

Citations

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Handwritten Text Recognition with TensorFlow

Requirements

Steps for predicting on new images

Steps for training model from scratch

Results

Notes

Citations

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages