Allow evaluation on concatenation of multiple test sets#42
Allow evaluation on concatenation of multiple test sets#42mjpost merged 8 commits intomjpost:masterfrom
Conversation
ArgumentParser deletes all newlines are multiple spaces by default. The script is now called "sacrebleu", not "sacreBLEU" and most users should have it installed in PATH, so "./" is not needed.
… sets. The list was included twice in the "sacrebleu -h" output. The list is getting quite long, so when printed on single line it is not much readable anyway. I think a separate --list option is better (it prints 1 test set per line). However, if anyone (@mjpost) wants to print the list within sacreble -h, I suggest to use the ArgumentParser's epilog. I know that `choices=DATASETS.keys()` does also validation in addition to documentation, but this functionality is already included (again, with a nicer multi-line listing of available test sets).
`args.langpair is None or args.langpair not in DATASETS[args.test_set])` this was unnecessarily duplicated. Also, I think the new structure is more readable.
SacreBLEU is not versioned in sockeye_contrib anymore. While `sacrebleu` with no arguments still prints all the test set names, it prints also the error message `sacreBLEU: I need either a predefined test set (-t) or a list of references' and ends with non-zero exit code. Using `sacrebleu --list` seems a tiny bit more user friendly.
use e.g. "-t wmt16,wmt17,wmt18"
|
This looks very nice. Can you merge on master so I can see the changes better here? |
|
Anything else I should do? I tried to separate each PR into several commits which are easier to review and understand their motivation based on the commit log. The later PRs are based on the earlier ones, so you can either merge the oldest PRs first or you can merge just the last PR (#44) which includes all the commits, if you consider everything OK. |
|
Sorry, just returned from lots of traveling, trying to make my way through things. I'd prefer to do these one by one and remerge on master after each one, if you don't mind. Thanks very much for your patience! I will commit to looking thoroughly at one of these each night this week. |
| # which do not currently support multiple references, so the example is hypothetical. | ||
| if args.test_set is None: | ||
| concat_ref_files = [args.refs] | ||
| else: |
There was a problem hiding this comment.
This isn't right—we do have multiple references for some test sets, see wmt17/tworefs.
There was a problem hiding this comment.
Is there anything to say here? I think this is the only blocker on this one.
There was a problem hiding this comment.
You are right, I checked that -t wmt16/tworefs,wmt17/tworefs -l en-fi works as expected and deleted the misleading comment.
(I had thought that multiple refs can be only specified as multiple files on the command line or a single file with tabs, but now I see that it is supported also for the internal test sets.)
`-t wmt16/tworefs,wmt17/tworefs -l en-fi` works as expected.
This PR is just about the last commit (2c0e788), the previous commits are part of #41 (and should be discussed there).
I used bash for the test to keep consistency with the existing test.sh and also because I can test few other things (e.g. #31) more easily than with pytest.
I think @cfedermann is considering to rewrite all the tests into Python.