The current implementation of BLEU in SacreBLEU doesn't support a variable number of references, i.e. different segments having different numbers of references. Some NLG datasets (e.g., E2E, WebNLG) do have a variable number of references per segment, and it would be great if SacreBLEU worked on them.
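To make the problem concrete, here is a minimal sketch of the data layout, assuming references are passed as parallel streams (one list per reference index, each as long as the system output). The helper `to_reference_streams` and the toy sentences are hypothetical and not part of SacreBLEU; the `None` placeholders only mark the gaps a scorer would have to skip.

```python
from itertools import zip_longest

# Hypothetical toy data: segment 0 has two references, segment 1 has three.
refs_per_segment = [
    ["the cat sat", "a cat was sitting"],
    ["it is cheap", "prices are low", "it is inexpensive"],
]


def to_reference_streams(refs_per_segment, fill=None):
    """Transpose per-segment references into parallel streams.

    Stream k holds the k-th reference of every segment. Segments with
    fewer than k+1 references get `fill` in that position.
    """
    return [list(stream) for stream in zip_longest(*refs_per_segment, fillvalue=fill)]


streams = to_reference_streams(refs_per_segment)

# Every stream has one entry per segment, but the last stream contains a
# gap for segment 0, which only has two references.
for k, stream in enumerate(streams):
    print(k, stream)
```

With a fixed number of references every stream is fully populated; with a variable number, some positions are necessarily empty, and that is the case I would like to see handled.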
Is this restriction a design decision, or would you be open to a PR that adds support for a variable number of references?