# Answer Equivalence Dataset

This dataset contains human judgements about answer equivalence. It is introduced and described in [Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation](https://arxiv.org/abs/2202.07654). The data is based on SQuAD (the Stanford Question Answering Dataset) and contains 9k human judgements of answer candidates generated by ALBERT on the SQuAD train set, plus an additional 14k human judgements for answer candidates produced by BiDAF, LUKE, and XLNet on the SQuAD dev set.

## Download the data

| AE split | # AE examples | # Ratings |
|----------|---------------|-----------|
| Train    | 9,090         | 9,090     |
| Dev      | 2,734         | 4,446     |
| Test     | 5,831         | 9,724     |
| **Total** | 17,655       | 23,260    |

| Split by system        | # AE examples | # Ratings |
|------------------------|---------------|-----------|
| BiDAF dev predictions  | 5,622         | 7,522     |
| XLNet dev predictions  | 2,448         | 7,932     |
| LUKE dev predictions   | 2,240         | 4,590     |
| **Total**              | 8,565         | 14,170    |
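
If the released files are in JSON Lines format (one rating per line), loading them is a one-liner per record. The sketch below is illustrative only: the filename `ae_dev.jsonl` and the field names mentioned in the comments are hypothetical, so check the downloaded files for the actual schema.

```python
# A minimal loading sketch. The filename and the per-record fields
# (e.g. question / reference / candidate / rating) are assumptions for
# illustration, not the repository's documented schema.
import json

def load_ae_examples(path):
    """Reads one answer-equivalence rating per line from a JSONL file."""
    examples = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

ratings = load_ae_examples('ae_dev.jsonl')  # hypothetical filename
print(len(ratings), ratings[0] if ratings else None)
```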

## BERT Matching (BEM) model

The BEM model from the paper, fine-tuned on this dataset, is available on [TF Hub](https://tfhub.dev/google/answer_equivalence/bem/1).

An accompanying colab demonstrates how to use it.
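
For orientation, here is a minimal sketch of scoring a (question, reference, candidate) triple with the TF Hub model. The input packing, the `input_ids`/`segment_ids` names, the 512-token padding, and the two-class output are assumptions based on the paper's BERT-based setup; the colab above is the authoritative example.

```python
# A minimal sketch, not the official colab. The TF Hub signature assumed
# here (a dict of 'input_ids'/'segment_ids' padded to 512 tokens, returning
# 2-class logits) and the bert-base-uncased vocabulary are assumptions.
import tensorflow as tf
import tensorflow_hub as hub
from transformers import BertTokenizer

bem = hub.load('https://tfhub.dev/google/answer_equivalence/bem/1')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def bem_score(question: str, reference: str, candidate: str) -> float:
    """Returns the (assumed) probability that `candidate` is equivalent
    to `reference` as an answer to `question`."""
    # BEM reads the candidate, reference and question as a single
    # BERT-style input; here they are packed as a sequence pair.
    enc = tokenizer(candidate, reference + ' ' + question,
                    padding='max_length', truncation=True,
                    max_length=512, return_tensors='tf')
    inputs = {
        'input_ids': enc['input_ids'],
        'segment_ids': enc['token_type_ids'],  # assumed input name
    }
    logits = bem(inputs)  # assumed shape [1, 2]
    return float(tf.nn.softmax(logits, axis=-1)[0, 1])

print(bem_score('Who wrote Hamlet?', 'William Shakespeare', 'Shakespeare'))
```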

## How to cite AE?

```bibtex
@article{bulian-etal-2022-tomayto,
  author    = {Jannis Bulian and
               Christian Buck and
               Wojciech Gajewski and
               Benjamin B{\"o}rschinger and
               Tal Schuster},
  title     = {Tomayto, Tomahto. Beyond Token-level Answer Equivalence
               for Question Answering Evaluation},
  journal   = {CoRR},
  volume    = {abs/2202.07654},
  year      = {2022},
  ee        = {http://arxiv.org/abs/2202.07654},
}
```

## Disclaimer

This is not an official Google product.

## Contact information

For help or issues, please submit a GitHub issue or contact the authors by email.
