Allow evaluation on concatenation of multiple test sets by martinpopel · Pull Request #42 · mjpost/sacrebleu

martinpopel · 2019-08-16T01:33:51Z

This PR is just about the last commit (2c0e788), the previous commits are part of #41 (and should be discussed there).

I used bash for the test to stay consistent with the existing test.sh, and also because I can test a few other things (e.g. #31) more easily than with pytest.
I think @cfedermann is considering rewriting all the tests in Python.

ArgumentParser deletes all newlines and multiple spaces by default. The script is now called "sacrebleu", not "sacreBLEU", and most users should have it installed in PATH, so "./" is not needed.
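The whitespace behaviour described above can be reproduced with a small sketch. By default `argparse` re-wraps the description, collapsing newlines and runs of spaces; passing `formatter_class=argparse.RawDescriptionHelpFormatter` keeps the text as written. The description string and the `build_parser` helper are made up for illustration and are not the actual sacrebleu help text or code.

```python
import argparse

# Hypothetical description text, used only to show the formatting difference.
DESCRIPTION = """sacrebleu: example description (hypothetical text).

Quick usage:
    sacrebleu -t wmt17 -l en-de < output.detok.txt
"""


def build_parser(raw: bool) -> argparse.ArgumentParser:
    """Build a parser that keeps the description verbatim (raw=True)
    or lets argparse re-wrap it (raw=False, the default behaviour)."""
    formatter = argparse.RawDescriptionHelpFormatter if raw else argparse.HelpFormatter
    return argparse.ArgumentParser(
        prog="sacrebleu",
        description=DESCRIPTION,
        formatter_class=formatter,
    )


default_help = build_parser(raw=False).format_help()
raw_help = build_parser(raw=True).format_help()
```

Printing `default_help` and `raw_help` side by side shows the indented usage line surviving only in the raw variant.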

@mjpost

… sets. The list was included twice in the "sacrebleu -h" output. The list is getting quite long, so when printed on a single line it is not very readable anyway. I think a separate --list option is better (it prints one test set per line). However, if anyone (@mjpost) wants to print the list within sacrebleu -h, I suggest using ArgumentParser's epilog. I know that `choices=DATASETS.keys()` also does validation in addition to documentation, but this functionality is already included (again, with a nicer multi-line listing of available test sets).
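A minimal sketch of the two options discussed here, a `--list` flag and an epilog with one test set per line, might look like the following. The `DATASETS` dictionary is a toy stand-in and `build_parser` and `list_test_sets` are hypothetical helpers, not the code from this PR.

```python
import argparse

# Toy stand-in for sacrebleu's DATASETS dictionary (assumption for illustration).
DATASETS = {"wmt16": {}, "wmt17": {}, "wmt18": {}}


def list_test_sets() -> str:
    """Return the available test sets, one per line."""
    return "\n".join(sorted(DATASETS))


def build_parser() -> argparse.ArgumentParser:
    """Parser with a --list flag and a multi-line epilog listing the test sets."""
    epilog = "Available test sets:\n" + "\n".join(f"  {name}" for name in sorted(DATASETS))
    parser = argparse.ArgumentParser(
        prog="sacrebleu",
        # The raw formatter keeps the newlines in the epilog.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=epilog,
    )
    parser.add_argument("-t", "--test-set", default=None,
                        help="name of the test set to evaluate on")
    parser.add_argument("--list", action="store_true",
                        help="print the available test sets, one per line")
    return parser


args = build_parser().parse_args(["--list"])
if args.list:
    print(list_test_sets())
```

Dropping `choices=` means validation of `-t` has to happen after parsing, which is where a multi-line error listing can be produced.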

The check `args.langpair is None or args.langpair not in DATASETS[args.test_set])` was unnecessarily duplicated. Also, I think the new structure is more readable.

SacreBLEU is not versioned in sockeye_contrib anymore. While `sacrebleu` with no arguments still prints all the test set names, it also prints the error message `sacreBLEU: I need either a predefined test set (-t) or a list of references` and exits with a non-zero exit code. Using `sacrebleu --list` seems a tiny bit more user-friendly.

use e.g. "-t wmt16,wmt17,wmt18"
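The idea behind a comma-separated `-t` value can be sketched as follows: split the value, validate each test set and the language pair, and concatenate the data in the given order. `TOY_DATASETS` and `concat_references` are hypothetical names with in-memory toy data; the real script works on downloaded files and this is not its implementation.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in: test set name -> language pair -> reference lines.
TOY_DATASETS: Dict[str, Dict[str, List[str]]] = {
    "wmt16": {"en-de": ["ref a", "ref b"]},
    "wmt17": {"en-de": ["ref c"]},
}


def concat_references(test_sets: str, langpair: str) -> List[str]:
    """Split a comma-separated -t value and concatenate the references in order."""
    refs: List[str] = []
    for name in test_sets.split(","):
        if name not in TOY_DATASETS:
            raise ValueError(f"unknown test set: {name}")
        if langpair not in TOY_DATASETS[name]:
            raise ValueError(f"no language pair {langpair} in test set {name}")
        refs.extend(TOY_DATASETS[name][langpair])
    return refs


combined = concat_references("wmt16,wmt17", "en-de")
```

Validating every name before scoring keeps a typo in one test set from silently shortening the concatenated evaluation set.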

mjpost · 2019-09-01T10:55:26Z

This looks very nice. Can you merge on master so I can see the changes better here?

martinpopel · 2019-09-03T23:15:35Z

Anything else I should do?

I tried to split each PR into several commits, so that they are easier to review and their motivation can be understood from the commit log. The later PRs are based on the earlier ones, so you can either merge the oldest PRs first or merge just the last PR (#44), which includes all the commits, if you consider everything OK.

mjpost · 2019-09-04T01:13:48Z

Sorry, just returned from lots of traveling, trying to make my way through things. I'd prefer to do these one by one and remerge on master after each one, if you don't mind. Thanks very much for your patience! I will commit to looking thoroughly at one of these each night this week.

mjpost · 2019-09-04T01:15:48Z

sacrebleu.py

+    # which do not currently support multiple references, so the example is hypothetical.
+    if args.test_set is None:
+        concat_ref_files = [args.refs]
    else:


This isn't right—we do have multiple references for some test sets, see wmt17/tworefs.

Is there anything to say here? I think this is the only blocker on this one.

You are right. I checked that `-t wmt16/tworefs,wmt17/tworefs -l en-fi` works as expected and deleted the misleading comment.
(I had thought that multiple refs could only be specified as multiple files on the command line or as a single file with tabs, but now I see that they are also supported for the internal test sets.)
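To illustrate the two ways of supplying multiple references mentioned above, here is a small sketch that turns tab-separated lines into one stream per reference, which is the same layout you get from reading several reference files. The function name and the list-of-streams layout are assumptions for illustration, not code from sacrebleu.py.

```python
from typing import List


def refs_from_tabbed_lines(lines: List[str]) -> List[List[str]]:
    """Split tab-separated lines into reference streams.

    result[i][j] is reference i for segment j.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]
    num_refs = len(rows[0]) if rows else 0
    if any(len(row) != num_refs for row in rows):
        raise ValueError("every line must contain the same number of tab-separated references")
    # Transpose: rows are segments, columns are references.
    return [list(stream) for stream in zip(*rows)]


# One file with two tab-separated references per segment ...
tabbed = ["the cat\ta cat\n", "a dog\tthe dog\n"]
# ... is equivalent to two files with one reference per line.
separate_files = [["the cat", "a dog"], ["a cat", "the dog"]]

streams = refs_from_tabbed_lines(tabbed)
```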

`-t wmt16/tworefs,wmt17/tworefs -l en-fi` works as expected.

martinpopel added 6 commits August 15, 2019 09:41

print the help description with proper formatting

40da744

fix code duplication, nicer formatting of test set listing

fc24380

prevent code duplication

8844d64

allow evaluation on concatenation of multiple test sets

2c0e788

Merge branch 'master' into testset-concat

0fb5375

mjpost reviewed Sep 4, 2019

delete a misleading comment

2373620

mjpost merged commit 6146c8e into mjpost:master Sep 4, 2019

martinpopel deleted the testset-concat branch September 4, 2019 18:34

mjpost merged 8 commits into mjpost:master from martinpopel:testset-concat
