This dataset is introduced and described in the paper [Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation](http://arxiv.org/abs/2202.07654).
| AE Split | # AE Examples | # Ratings |
|---|---|---|
| Train | 9,090 | 9,090 |
| Dev | 2,734 | 4,446 |
| Test | 5,831 | 9,724 |
| Total | 17,655 | 23,260 |
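As a quick sanity check, the per-split counts in the table above can be summed to confirm the totals row (a minimal sketch; the numbers are copied directly from the table):

```python
# Per-split (examples, ratings) counts copied from the AE split table.
splits = {
    "Train": (9_090, 9_090),
    "Dev": (2_734, 4_446),
    "Test": (5_831, 9_724),
}

total_examples = sum(e for e, _ in splits.values())
total_ratings = sum(r for _, r in splits.values())

print(total_examples)  # 17655, matching the "Total" row
print(total_ratings)   # 23260
```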
| Split by system | # AE Examples | # Ratings |
|---|---|---|
| BiDAF dev predictions | 5,622 | 7,522 |
| XLNet dev predictions | 2,448 | 7,932 |
| Luke dev predictions | 2,240 | 4,590 |
| Total | 8,565 | 14,170 |
The BEM model from the paper, fine-tuned on this dataset, is available on TF Hub. This Colab notebook demonstrates how to use it.
```
@article{bulian-etal-2022-tomayto,
  author  = {Jannis Bulian and
             Christian Buck and
             Wojciech Gajewski and
             Benjamin B{\"o}rschinger and
             Tal Schuster},
  title   = {Tomayto, Tomahto. Beyond Token-level Answer Equivalence
             for Question Answering Evaluation},
  journal = {CoRR},
  volume  = {abs/2202.07654},
  year    = {2022},
  ee      = {http://arxiv.org/abs/2202.07654},
}
```
This is not an official Google product.
For help or issues, please submit a GitHub issue or contact the authors by email.