Some fixes staged for 1.5#133
Conversation
`--short` is now working correctly with TER
Since the previous commits also updated the default value of 'floor' smoothing to 0.1 from 0.0, the users using the compat API will already be experiencing a behavior change in their future sentence BLEU scores. This commit also prioritizes the behavioral integrity between the API and the CLI (#98) by changing the default smoothing method of 'sentence_bleu()' to 'exp' similar to CLI. Because of this difference, the scores between the API and the sacreBLEU utility with -sl option passed, were never comparable. The users that want to preserve comparability with their previously computed sentence BLEU scores, should therefore call this method with smooth_method='floor' and smooth_value=0.0 in the future.
Use GitHub links for IWSLT test/dev sets. Also, perform an MD5 check if dataset.py is launched from CLI i.e. 'python sacrebleu/dataset.py'
martinpopel
left a comment
There was a problem hiding this comment.
Great. Thanks. Just a minor question/suggestion on moving __repr__ to BaseScorer.
mjpost
left a comment
There was a problem hiding this comment.
LGTM, pending the small comments.
| ref_streams, | ||
| smooth_value=None) -> BLEUScore: | ||
| smooth_value=0) -> BLEUScore: | ||
| """Convenience function that wraps corpus_bleu(). |
There was a problem hiding this comment.
I think we should factor out the 0 here into a global constant, e.g., SMOOTH_VALUE_DEFAULT.
There was a problem hiding this comment.
The defaults for BLEU are already defined in BLEU.SMOOTH_DEFAULTS. The reason it is hard-coded to be 0 here was to preserve backward compatibility with the raw_corpus_bleu of older sacreBLEU versions. But since we're breaking API here, I can revert this to be None so that it defaults to the new internal default of 0.1 ?
There was a problem hiding this comment.
The alternative is to remove this "convenience" function completely and expect people to use corpus_bleu with arguments that they'd like to use.
There was a problem hiding this comment.
So raw_corpus_bleu hard-codes a smooth_value of floor. Since the API allows the value to be changed, I would suggest that instead of smooth_value=0, we use smooth_value=BLEU.SMOOTH_DEFAULTS["floor"]. That way the value is explicit, instead of potentially getting lost here. What do you think of this?
There was a problem hiding this comment.
I'd like to keep the convenience function. I think Sockeye might be using it.
There was a problem hiding this comment.
smooth_value=BLEU.SMOOTH_DEFAULTS["floor"] is equivalent (although more explicit) to the unmodified version of the signature which was smooth_value=None (a None fetches the default from that very same dictionary internally in Bleu class)
The question here is whether we want sacreBLEU >= 1.5 to give the same sentence bleu scores as sacreBLEU < 1.5 when using this function, or not. I think we don't care as of now since we already introduced other behavior differences as well.
So I'll make this BLEU.SMOOTH_DEFAULTS["floor"] to be more readable.
The reason why I suggested this function is that, it again falls back to floor although all other methods for sentence-bleu are now synchronized to use exp smoothing. It's kind of weird the choices here =)
|
I think I'm happy with merging this, subject to the few stray comments. Does one of you want to handle this, or do you prefer I do it? I think we should do a squash merge per our earlier conversation, but take care to create a tidy commit message. |
|
I don't know how to merge the other variable-bleu PR into this. If you want, we can merge this one first, and then rebase the other PR, add the Changelog entry and merge afterwards? |
|
Sure, that's fine. It doesn't particularly matter to me. |
Hello,
I always end up doing this in a sub-optimal way. This one includes stuff that should normally be created as different PRs probably. Maybe we can merge without squashing this time?