This repository contains the necessary data and code to run our NER experiments.
Before you start, do the following:
- Get the following data files:
  - MasakhaNER: data/masakhaner/*/{train,dev,test}.txt
  - Finnish: data/turku-fin-ner/{train,dev,test}.txt
  - Hindi: data/hiner/collapsed/{train,dev,test}.json
- Put the ParaNames TSV files in a folder called paranames in the root of the repo
  - Tip: symlinks will work, too
- Run bash setup.sh, which sets up a Conda environment and attempts to install DyNet
  - NOTE: Installing DyNet may require manual intervention
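
Before running anything, it can help to confirm that the files listed above are actually in place. The following is a minimal sketch, not part of the repository: `check_data_files` is a hypothetical helper name, and it assumes that at least one MasakhaNER language subdirectory should exist and that an existing `paranames` directory (or symlink to one) is sufficient.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not shipped with this repo): report which of
# the expected data files are missing under a given repo root.

check_data_files() {
    local root="${1:-.}"
    local missing=0
    local split rel lang_dir found_lang=0

    # Finnish and Hindi files, using the paths listed in the setup steps.
    for split in train dev test; do
        for rel in \
            "data/turku-fin-ner/${split}.txt" \
            "data/hiner/collapsed/${split}.json"; do
            if [ ! -e "${root}/${rel}" ]; then
                echo "missing: ${rel}"
                missing=1
            fi
        done
    done

    # MasakhaNER has one subdirectory per language (the * in the path).
    # Assumption: at least one such subdirectory must be present.
    for lang_dir in "${root}"/data/masakhaner/*/; do
        [ -d "${lang_dir}" ] || continue
        found_lang=1
        for split in train dev test; do
            if [ ! -e "${lang_dir}${split}.txt" ]; then
                echo "missing: ${lang_dir#"${root}"/}${split}.txt"
                missing=1
            fi
        done
    done
    if [ "${found_lang}" -eq 0 ]; then
        echo "missing: data/masakhaner/*/"
        missing=1
    fi

    # -d follows symlinks, so a symlinked paranames folder passes as well.
    if [ ! -d "${root}/paranames" ]; then
        echo "missing: paranames/"
        missing=1
    fi

    return "${missing}"
}

# Usage, from the root of the repo:
#   check_data_files . && bash setup.sh
```

The function only prints the missing paths and returns a non-zero status; it does not download or create anything.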
The main workhorse is full_experiment.sh, which you run with:
bash ./full_experiment.sh "${config_file_path}" "${language}" "${should_confirm}"
where
- config_file_path is a path to the configuration file for the experiment
- language is the relevant language code for the experimental data
- should_confirm: a boolean (yes/no) for interactively confirming commands
  - if yes, an interactive fzf menu will be used to select tasks to run
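
As an illustration of the three arguments described above, here is a small wrapper sketch. `run_experiment` and the `DRY_RUN` variable are hypothetical names introduced for this example, and the config path and language code in the usage comment are placeholders rather than values taken from this repository.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical wrapper): validate the arguments, then call
# full_experiment.sh with the three positional parameters it expects.

run_experiment() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_experiment CONFIG_FILE_PATH LANGUAGE yes|no" >&2
        return 2
    fi

    local config_file_path="$1"
    local language="$2"
    local should_confirm="$3"

    # should_confirm is documented as a yes/no boolean.
    case "${should_confirm}" in
        yes|no) ;;
        *)
            echo "should_confirm must be 'yes' or 'no', got '${should_confirm}'" >&2
            return 2
            ;;
    esac

    # DRY_RUN=1 (an assumption of this sketch) prints the command instead
    # of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'bash ./full_experiment.sh "%s" "%s" "%s"\n' \
            "${config_file_path}" "${language}" "${should_confirm}"
    else
        bash ./full_experiment.sh "${config_file_path}" "${language}" "${should_confirm}"
    fi
}

# Usage (placeholder values; substitute a real config file and language code):
#   DRY_RUN=1 run_experiment path/to/config LANG_CODE yes
```

Rejecting anything other than yes or no up front avoids starting a long run with a mistyped third argument.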
- African languages: MasakhaNER
- Finnish: Turku NLP corpus
- Hindi: HiNER