Official code for the paper *Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training*.
Authors: Haofei Zhang, Jiarui Duan, Mengqi Xue, Jie Song, Li Sun, Mingli Song
| Model | Method | CIFAR-10 | CIFAR-100 |
|---|---|---|---|
| CNNs | EfficientNet-B2 | 94.14 | 75.55 |
| CNNs | ResNet50 | 94.92 | 77.57 |
| CNNs | Agent-S | 94.18 | 74.62 |
| CNNs | Agent-B | 94.83 | 74.78 |
| ViTs | ViT-S | 87.32 | 61.25 |
| ViTs | ViT-S-SAM | 87.77 | 62.60 |
| ViTs | ViT-S-Sparse | 87.43 | 62.29 |
| ViTs | ViT-B | 79.24 | 53.07 |
| ViTs | ViT-B-SAM | 86.57 | 58.18 |
| ViTs | ViT-B-Sparse | 83.87 | 57.22 |
| Pre-trained ViTs | ViT-S | 95.70 | 80.91 |
| Pre-trained ViTs | ViT-B | 97.17 | 84.95 |
| Ours (Joint) | Agent-S | 94.90 | 74.06 |
| Ours (Joint) | ViT-S | 95.14 | 76.19 |
| Ours (Joint) | Agent-B | 95.06 | 76.57 |
| Ours (Joint) | ViT-B | 95.00 | 77.83 |
| Ours (Shared) | Agent-S | 93.22 | 74.06 |
| Ours (Shared) | ViT-S | 93.72 | 75.50 |
| Ours (Shared) | Agent-B | 92.66 | 74.11 |
| Ours (Shared) | ViT-B | 93.34 | 75.71 |
| Method | 5% images | 10% images | 50% images |
|---|---|---|---|
| ResNet50 | 35.43 | 50.86 | 70.05 |
| Agent-B | 35.28 | 47.46 | 68.13 |
| ViT-B | 16.60 | 28.11 | 63.40 |
| ViT-B-SAM | 16.67 | 28.66 | 64.37 |
| ViT-B-Sparse | 10.39 | 28.92 | 66.01 |
| Ours-Joint | 36.01 | 49.73 | 71.36 |
| Ours-Shared | 33.06 | 45.75 | 66.48 |
- CIFAR: download the CIFAR dataset to the folder `~/datasets/cifar` (you may specify this path in the configuration files).
- ImageNet: download the ImageNet dataset to the folder `~/datasets/ILSVRC2012` and pre-process it with this script.
- We also support other datasets such as CUB200, Sketches, Stanford Cars, and TinyImageNet.
Our code requires `cv-lib-PyTorch`. You should download this repo and check out the tag `bootstrapping_vits`.
`cv-lib-PyTorch` is an open-source repo currently maintained by me.
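A typical setup looks like the following sketch; the repository URL is a placeholder here, so substitute the actual `cv-lib-PyTorch` URL:

```shell
# Clone cv-lib-PyTorch and pin it to the tag this project expects.
# Replace <cv-lib-PyTorch-repo-url> with the actual repository URL.
git clone <cv-lib-PyTorch-repo-url> cv-lib-PyTorch
cd cv-lib-PyTorch
git checkout bootstrapping_vits
# Make the library importable for the training scripts below.
export PYTHONPATH=$(pwd)
```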
- torch>=1.10.2
- torchvision>=0.11.3
- tqdm
- timm
- tensorboard
- scipy
- PyYAML
- pandas
- numpy
In the `config` directory, we provide configurations for training, including CIFAR100 and ImageNet-10%.
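The configuration files are plain YAML. As a quick illustration of how such a file can be read with PyYAML (listed in the requirements) — note that the keys below are hypothetical examples, not the actual schema used by this repo:

```python
import yaml  # PyYAML, listed in the requirements

# Hypothetical configuration fragment; the real files under `config/`
# define the actual schema (dataset paths, model, optimizer, etc.).
cfg_text = """
dataset:
  name: cifar100
  root: ~/datasets/cifar
training:
  epochs: 300
  batch_size: 128
"""

cfg = yaml.safe_load(cfg_text)
print(cfg["dataset"]["name"])     # cifar100
print(cfg["training"]["epochs"])  # 300
```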
The following script will start training agent-small from scratch on CIFAR100.
For training with the SAM optimizer, the option `--worker` should be set to `sam_train_worker`.
```shell
export PYTHONPATH=/path/to/cv-lib-PyTorch
export CUDA_VISIBLE_DEVICES=0,1
port=9872

python dist_engine.py \
    --num-nodes 1 \
    --rank 0 \
    --master-url tcp://localhost:${port} \
    --backend nccl \
    --multiprocessing \
    --file-name-cfg cls \
    --cfg-filepath config/cifar100/cnn/agent-small.yaml \
    --log-dir run/cifar100/cnn/agent-small \
    --worker worker
```

The following script will start joint training of agent-small and ViT-small on CIFAR100.

```shell
export PYTHONPATH=/path/to/project/cv-lib-PyTorch
export CUDA_VISIBLE_DEVICES=0,1
port=9873

python dist_engine.py \
    --num-nodes 1 \
    --rank 0 \
    --master-url tcp://localhost:${port} \
    --backend nccl \
    --multiprocessing \
    --file-name-cfg joint \
    --cfg-filepath config/cifar100/joint/agent-small-vit-small.yaml \
    --log-dir run/cifar100/joint/agent-small-vit-small \
    --use-amp \
    --worker mutual_worker
```

The following script will start shared training of agent-base (ResNet-like) and ViT-base on CIFAR100.

```shell
export PYTHONPATH=/path/to/project/cv-lib-PyTorch
export CUDA_VISIBLE_DEVICES=0,1
port=9873

python dist_engine.py \
    --num-nodes 1 \
    --rank 0 \
    --master-url tcp://localhost:${port} \
    --backend nccl \
    --multiprocessing \
    --file-name-cfg shared \
    --cfg-filepath config/cifar100/shared/agent-base-res_like-vit-base.yaml \
    --log-dir run/cifar100/shared/agent-base-res_like-vit-base \
    --use-amp \
    --worker mutual_worker
```

After training, the accuracy of the final epoch is reported instead of the best one.
If you find this work useful for your research, please cite our paper:
```bibtex
@article{zhang2021bootstrapping,
  title={Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training},
  author={Zhang, Haofei and Duan, Jiarui and Xue, Mengqi and Song, Jie and Sun, Li and Song, Mingli},
  journal={arXiv preprint arXiv:2112.03552},
  year={2021}
}
```
