Our PyTorch implementation of the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
Run it in VS Code:
Author:
- Github: khoitran2003
- Email: anhkhoi246813579@gmail.com
Make sure you have installed Anaconda. If not, see the setup document here.
Clone this repository:
git clone https://github.com/khoitran2003/ViT
cd into ViT and install the dependency packages: `pip install -r requirements_cpu.txt` for CPU, or `pip install -r requirements_cuda124.txt` for CUDA 12.4.
Create two folders, train and val, in the data folder (which was created already), then copy your images, grouped by class name, into these folders.
- The train folder is used for the training process.
- The val folder is used to validate the training result after each epoch.
Structure of these folders:
sample_data/
...train/
......class_a/
.........a_image_1.jpg
.........a_image_2.jpg
......class_b/
.........b_image_1.jpg
.........b_image_2.jpg
...val/
......class_a/
.........a_image_3.jpg
.........a_image_4.jpg
......class_b/
.........b_image_3.jpg
.........b_image_4.jpg
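In a layout like this, class labels are normally inferred from the sub-folder names. A minimal sketch of that convention (a hypothetical helper mirroring torchvision's ImageFolder indexing; the repo's actual loader may differ):

```python
# Sketch: map each class sub-folder (class_a, class_b, ...) under train/ or val/
# to an integer label, the way ImageFolder-style loaders do. Hypothetical helper.
from pathlib import Path

def find_classes(root: str) -> dict[str, int]:
    """Return {class_name: label}, sorted alphabetically for a stable mapping."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(classes)}
```

Because the mapping is sorted, the same folder layout always produces the same labels, which is what lets predict.py recover class names from the training data folder later.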
We provide train.py for training the model.
usage: train.py [-h] [--model MODEL] [--num-classes NUM_CLASSES]
                [--patch-size PATCH_SIZE] [--lr LR] [--weight-decay WEIGHT_DECAY]
                [--batch-size BATCH_SIZE] [--epochs EPOCHS]
                [--image-size IMAGE_SIZE]
                [--train-folder TRAIN_FOLDER] [--valid-folder VALID_FOLDER]
                [--model-folder MODEL_FOLDER]
optional arguments:
-h, --help
show this help message and exit
--model MODEL
Type of ViT model, valid option: base, large, huge
--num-classes NUM_CLASSES
Number of classes
--patch-size PATCH_SIZE
Size of image patch
--lr LR
Learning rate
--weight-decay WEIGHT_DECAY
Weight decay of the optimizer
--batch-size BATCH_SIZE
Batch size
--epochs EPOCHS
Number of training epochs
--image-size IMAGE_SIZE
Size of input image
--train-folder TRAIN_FOLDER
Where training data is located
--valid-folder VALID_FOLDER
Where validation data is located
--model-folder MODEL_FOLDER
Folder to save trained model
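The help text above maps onto a standard argparse setup. As a rough sketch (the defaults below are assumptions for illustration; the actual train.py may use different ones):

```python
# Sketch of the argument parser implied by the help text above.
# All default values here are assumed, not taken from the repo.
import argparse

parser = argparse.ArgumentParser(description="Train a ViT classifier")
parser.add_argument("--model", default="base", choices=["base", "large", "huge"])
parser.add_argument("--num-classes", type=int, default=10)
parser.add_argument("--patch-size", type=int, default=16)
parser.add_argument("--lr", type=float, default=1e-4)
parser.add_argument("--weight-decay", type=float, default=0.0)
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--epochs", type=int, default=200)
parser.add_argument("--image-size", type=int, default=224)
parser.add_argument("--train-folder", default=None)   # None -> fall back to CIFAR-10
parser.add_argument("--valid-folder", default=None)
parser.add_argument("--model-folder", default="./output")
args = parser.parse_args([])  # the real script would call parse_args() with no list
```

Note that argparse converts the dashed flags to attributes, so `--num-classes` becomes `args.num_classes`.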
There are some important arguments for the script you should consider when running it:
- train-folder: The folder of training images. If you do not specify this argument, the script will use the CIFAR-10 dataset for training.
- valid-folder: The folder of validation images.
- num-classes: The number of classes in your problem.
- batch-size: The batch size of the dataset.
- lr: The learning rate of the Adam optimizer.
- model-folder: Where the model is saved after training.
- model: The type of model you want to train. If you want to train with the base, large, or huge model, you need to specify the patch-size argument.
Example:
Suppose you want to train a model for 200 epochs on the Oxford_pet dataset with 37 classes:
!python train.py --train-folder ${oxford_pet/train} --valid-folder ${oxford_pet/val} --num-classes 37 --patch-size 16 --image-size 224 --lr 0.0001 --epochs 200
After training successfully, your model will be saved to the model-folder defined before.
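Saving to model-folder typically follows a "keep the best checkpoint" pattern: after each epoch, the model is written out only when validation accuracy improves. A hypothetical sketch of that step (the repo's actual checkpointing code in train.py is not shown in this README):

```python
# Hypothetical per-epoch "save best" helper. save_fn stands in for the actual
# serialization call, e.g. a torch.save(model.state_dict(), path) closure.
import os

def maybe_save_best(val_acc: float, best_acc: float, model_folder: str,
                    save_fn) -> float:
    """Call save_fn(path) only when val_acc improves; return the new best."""
    if val_acc > best_acc:
        os.makedirs(model_folder, exist_ok=True)
        save_fn(os.path.join(model_folder, "best.pth"))
        return val_acc
    return best_acc
```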
We offer a script for testing a model on a new image via the command line:
python predict.py --model_path ${model_path} --image_path ${image_path} --data_path ${data_path}
where image_path is the path of your test image and data_path is the root training data folder, used to look up the class names.
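Under the hood, prediction reduces the model's output logits to a class index and a confidence. A minimal, framework-free sketch of that final step (a hypothetical helper, not predict.py's actual code):

```python
# Sketch: pick the argmax class from raw logits and report its softmax
# probability, using the max-subtraction trick for numerical stability.
import math

def predict_from_logits(logits: list[float]) -> tuple[int, float]:
    """Return (predicted class index, softmax probability of that class)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # stable softmax numerators
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return idx, exps[idx] / sum(exps)
```

The returned index would then be mapped back to a class name via the sub-folder names under data_path.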
Example:
python predict.py --model_path ./output/ViTBase_best.pth --image_path ./data/test/cat.2000.jpg --data_path ./oxford_pet/

THANK YOU FOR WATCHING!
