`pip install -r requirements.txt`
- Download the BERT-base-uncased pretrained weights from here, or see a list of BERT model weight download links here.
- Download the corresponding vocabulary here. Note that the downloaded tar also contains the TensorFlow pretrained model weights, but we only need the file `vocab.txt`.
- Put the pretrained model file, the config JSON downloaded in the first step, and the vocabulary into the `models/pytorch-bert-uncased` directory.
- Download the IMDB dataset here and put it in `data/imdb`.
- Download the bdek dataset (i.e. the Amazon reviews dataset) here and put it in `data/bdek`.
- Run `sh train_script.sh` in a shell. Open this file to see the commands for the different tasks.
- The entry point of the program is `train.py`.
- Files such as `trainers.py`, `evaluators.py`, `model.py`, and `dataset.py` define classes for the corresponding components of the program; these classes are imported into `train.py` through the `xx_factory` at the bottom of each file.
- Developers should add new classes to these files to implement new features instead of editing the existing ones.
- Several command-line arguments determine which module is chosen from each factory; see the code for details.
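The factory arrangement described above (one factory per component file, selected by a command-line argument, extended by adding classes rather than editing them) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's code: `BaseTrainer`, `DefaultTrainer`, `NewFeatureTrainer`, the `_TRAINERS` registry, `trainer_factory`, and the `--trainer` flag are all hypothetical names; check `trainers.py` and `train.py` for the real ones.

```python
import argparse
from typing import Dict, Optional, Sequence, Type


# Hypothetical base class standing in for whatever trainers.py actually defines.
class BaseTrainer:
    def train(self) -> str:
        raise NotImplementedError


class DefaultTrainer(BaseTrainer):
    def train(self) -> str:
        return "default training loop"


# A new feature is added as a new subclass instead of editing DefaultTrainer.
class NewFeatureTrainer(BaseTrainer):
    def train(self) -> str:
        return "new-feature training loop"


# Hypothetical registry at the bottom of the file: maps a CLI name to a class.
_TRAINERS: Dict[str, Type[BaseTrainer]] = {
    "default": DefaultTrainer,
    "new_feature": NewFeatureTrainer,
}


def trainer_factory(name: str) -> BaseTrainer:
    """Instantiate the trainer registered under `name`."""
    if name not in _TRAINERS:
        raise ValueError(f"unknown trainer {name!r}; choose from {sorted(_TRAINERS)}")
    return _TRAINERS[name]()


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag; the real option names are defined in train.py.
    parser.add_argument("--trainer", choices=sorted(_TRAINERS), default="default")
    return parser.parse_args(argv)
```

In a `train.py`-style entry point this would be used as `trainer_factory(parse_args().trainer).train()`. Registering a new class only requires one new entry in the registry, which keeps existing classes untouched and lets `argparse` reject unknown names through `choices`.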
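Before running `train_script.sh`, it may help to confirm that the files from the setup steps are where the instructions say. The sketch below is a hypothetical helper, not part of the repository. It checks only the paths named above (`models/pytorch-bert-uncased`, `vocab.txt`, `data/imdb`, `data/bdek`). Because the exact names of the weights and config files are not given, it does not check the weights at all and accepts any `*.json` file as the config; both are assumptions.

```python
from pathlib import Path
from typing import List

# Paths taken from the setup steps above.
MODEL_DIR = Path("models/pytorch-bert-uncased")
DATA_DIRS = [Path("data/imdb"), Path("data/bdek")]


def missing_items(root: Path = Path(".")) -> List[str]:
    """Return a description of every expected file or directory that is absent."""
    problems: List[str] = []
    model_dir = root / MODEL_DIR
    if not model_dir.is_dir():
        problems.append(f"missing directory {model_dir}")
    else:
        if not (model_dir / "vocab.txt").is_file():
            problems.append(f"missing {model_dir / 'vocab.txt'}")
        # Assumption: the config file name is unknown, so any JSON file counts.
        if not any(model_dir.glob("*.json")):
            problems.append(f"no config JSON in {model_dir}")
    for data_dir in DATA_DIRS:
        if not (root / data_dir).is_dir():
            problems.append(f"missing directory {root / data_dir}")
    return problems
```

Running `print(missing_items())` from the repository root lists anything still to download; an empty list means the expected layout is in place.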