Skip to content

Some fixes staged for 1.5#133

Merged
ozancaglayan merged 18 commits into
masterfrom
fixes-2021
Jan 18, 2021
Merged

Some fixes staged for 1.5#133
ozancaglayan merged 18 commits into
masterfrom
fixes-2021

Conversation

@ozancaglayan
Copy link
Copy Markdown
Collaborator

Hello,

I always end up doing this in a sub-optimal way. This one includes stuff that should normally be created as different PRs probably. Maybe we can merge without squashing this time?

`--short` is now working correctly with TER
Previously, the default settings of `floor`-smoothing was acting as if
there were no smoothing applied (#98). This commit changes the default to 0.1
as in the original paper (cited from the code) (#129).
Since the previous commits also updated the default value of 'floor'
smoothing to 0.1 from 0.0, the users using the compat API will already
be experiencing a behavior change in their future sentence BLEU scores.

This commit also prioritizes the behavioral integrity between the API and the
CLI (#98) by changing the default smoothing method of 'sentence_bleu()' to
'exp' similar to CLI. Because of this difference, the scores between the
API and the sacreBLEU utility with -sl option passed, were never
comparable.

The users that want to preserve comparability with their previously
computed sentence BLEU scores, should therefore call this method
with smooth_method='floor' and smooth_value=0.0 in the future.
Use GitHub links for IWSLT test/dev sets. Also, perform an MD5 check
if dataset.py is launched from CLI i.e. 'python sacrebleu/dataset.py'
Copy link
Copy Markdown
Collaborator

@martinpopel martinpopel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks. Just a minor question/suggestion on moving __repr__ to BaseScorer.

Comment thread sacrebleu/metrics/bleu.py
Copy link
Copy Markdown
Owner

@mjpost mjpost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending the small comments.

Comment thread CHANGELOG.md
Comment thread CHANGELOG.md
Comment thread sacrebleu/__init__.py
Comment thread sacrebleu/compat.py
ref_streams,
smooth_value=None) -> BLEUScore:
smooth_value=0) -> BLEUScore:
"""Convenience function that wraps corpus_bleu().
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should factor out the 0 here into a global constant, e.g., SMOOTH_VALUE_DEFAULT.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults for BLEU are already defined in BLEU.SMOOTH_DEFAULTS. The reason it is hard-coded to be 0 here was to preserve backward compatibility with the raw_corpus_bleu of older sacreBLEU versions. But since we're breaking API here, I can revert this to be None so that it defaults to the new internal default of 0.1 ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to remove this "convenience" function completely and expect people to use corpus_bleu with arguments that they'd like to use.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So raw_corpus_bleu hard-codes a smooth_value of floor. Since the API allows the value to be changed, I would suggest that instead of smooth_value=0, we use smooth_value=BLEU.SMOOTH_DEFAULTS["floor"]. That way the value is explicit, instead of potentially getting lost here. What do you think of this?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the convenience function. I think Sockeye might be using it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smooth_value=BLEU.SMOOTH_DEFAULTS["floor"] is equivalent (although more explicit) to the unmodified version of the signature which was smooth_value=None (a None fetches the default from that very same dictionary internally in Bleu class)

The question here is whether we want sacreBLEU >= 1.5 to give the same sentence bleu scores as sacreBLEU < 1.5 when using this function, or not. I think we don't care as of now since we already introduced other behavior differences as well.

So I'll make this BLEU.SMOOTH_DEFAULTS["floor"] to be more readable.

The reason why I suggested this function is that, it again falls back to floor although all other methods for sentence-bleu are now synchronized to use exp smoothing. It's kind of weird the choices here =)

Comment thread sacrebleu/metrics/bleu.py
@mjpost
Copy link
Copy Markdown
Owner

mjpost commented Jan 14, 2021

I think I'm happy with merging this, subject to the few stray comments. Does one of you want to handle this, or do you prefer I do it? I think we should do a squash merge per our earlier conversation, but take care to create a tidy commit message.

@ozancaglayan
Copy link
Copy Markdown
Collaborator Author

I don't know how to merge the other variable-bleu PR into this. If you want, we can merge this one first, and then rebase the other PR, add the Changelog entry and merge afterwards?

@mjpost
Copy link
Copy Markdown
Owner

mjpost commented Jan 15, 2021

Sure, that's fine. It doesn't particularly matter to me.

@ozancaglayan ozancaglayan merged commit 907e9fb into master Jan 18, 2021
@ozancaglayan ozancaglayan deleted the fixes-2021 branch February 18, 2021 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants