
Small fix for SacreBLEUScore and the mteval-v13a tokenizer #1778

Merged
merged 1 commit into Lightning-AI:master from sacrebleu_fix on May 13, 2023

Conversation

@RistoAle97 (Contributor) commented May 12, 2023

The behaviour of the mteval-v13a tokenizer (the one used by WMT) now matches the original SacreBLEU implementation.
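
As a rough illustration (not part of the PR), the fix can be sanity-checked by comparing the torchmetrics metric against the reference sacrebleu package on the same inputs; the toy sentences below are made up for the example:

# Sanity-check sketch (not from the PR): compare torchmetrics against the
# reference sacrebleu package on a couple of toy sentences.
import sacrebleu
from torchmetrics.text import SacreBLEUScore

preds = ["the cat sat on the mat .", "hello there general kenobi"]
targets = [["the cat is sitting on the mat ."], ["hello there general kenobi !"]]

# Reference implementation; "13a" (mteval-v13a) is sacrebleu's default tokenizer.
ref_streams = [[t[0] for t in targets]]  # sacrebleu expects one list per reference set
reference = sacrebleu.corpus_bleu(preds, ref_streams)

# Torchmetrics implementation with the same tokenizer (scores are in [0, 1]).
tm = SacreBLEUScore(tokenize="13a")
tm_score = (tm(preds, targets) * 100).item()

print(f"sacrebleu:    {reference.score:.5f}")
print(f"torchmetrics: {tm_score:.5f}")  # the two should agree after this fix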

What does this PR do?

Fixes #1776

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


📚 Documentation preview 📚: https://torchmetrics--1778.org.readthedocs.build/en/1778/

The behaviour is the same as the original implementation now
@RistoAle97 (Contributor, Author) commented:

Some more insight into the effect of the change: I compared the HF version with the torchmetrics one (before and after applying my small change). Here are the results on the newstest2013 en-es test set:

# translations is a sequence of generated Spanish sentences; targets is a sequence of sequences of reference Spanish sentences
> hf_score = hf_scb.compute(predictions=translations, references=targets)
> hf_bleu_score = "{0:.5f}".format(hf_score["score"])
> print(f"HF implementation\nBLEU score: {hf_bleu_score}\npreds_len: {hf_score['sys_len']}\ntarget_len: {hf_score['ref_len']}")
... HF implementation
... BLEU score: 30.73243
... preds_len: 67090
... target_len: 70528

> scb_tm = SacreBLEUScore()  # current implementation
> tm_score = "{0:.5f}".format(scb_tm(translations, targets) * 100)
> print(f"Torchmetrics implementation\nBLEU score: {tm_score}\npreds_len: {scb_tm.preds_len}\ntarget_len: {scb_tm.target_len}")
... Torchmetrics implementation
... BLEU score: 30.67770
... preds_len: 67028.0
... target_len: 70468.0

> scb_tm_new = SacreBLEUScoreTest()  # my implementation with the applied change
> tm_score_new = "{0:.5f}".format(scb_tm_new(translations, targets) * 100)
> print(f"Torchmetrics implementation\nBLEU score: {tm_score_new}\npreds_len: {scb_tm_new.preds_len}\ntarget_len: {scb_tm_new.target_len}")
... Torchmetrics new version
... BLEU score: 30.73244
... preds_len: 67090.0
... target_len: 70528.0
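
The transcript above relies on a few objects that are not defined in the comment. A minimal sketch of the assumed setup, where hf_scb is Hugging Face's evaluate wrapper around sacrebleu and SacreBLEUScoreTest stands for a locally patched copy of SacreBLEUScore with this PR's change applied:

# Assumed setup for the transcript above (not shown in the original comment).
import evaluate
from torchmetrics.text import SacreBLEUScore

hf_scb = evaluate.load("sacrebleu")  # Hugging Face wrapper around the reference sacrebleu package

# translations: list[str] with one generated Spanish sentence per source sentence
# targets:      list[list[str]] with the reference translation(s) for each sentence
# SacreBLEUScoreTest: hypothetical name for a local copy of SacreBLEUScore with the tokenizer fix applied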

I also compared the three versions on the WMT14 en-fr test set:

... Huggingface SacreBLEU: {'score': 33.17529924795069, 'counts': [48246, 29265, 19045, 12569], 'totals': [76495, 73492, 70489, 67487], 'precisions': [63.070788940453625, 39.82066075219071, 27.018400034047865, 18.624327648287817], 'bp': 0.9894540029829183, 'sys_len': 76495, 'ref_len': 77306}

... Torchmetrics SacreBLEU:
... score: 33.1378173828125
... preds_len: 76434.0
... target_len: 77247.0

... Torchmetrics SacreBLEU new version:
... score: 33.1753044128418
... preds_len: 76495.0
... target_len: 77306.0

While the BLEU score is still slightly different, the discrepancy between the original version (used through HuggingFace) and the new torchmetrics implementation is smaller, and the computed lengths for both predictions and references now match.
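
The matching lengths matter because SacreBLEU's brevity penalty is computed from exactly these corpus-level lengths (BP = 1 if sys_len > ref_len, otherwise exp(1 - ref_len / sys_len)). A quick check, not part of the PR, using the WMT14 en-fr numbers above reproduces the bp value reported by the HuggingFace output:

# Brevity penalty from the corpus-level lengths reported above (WMT14 en-fr).
import math

sys_len, ref_len = 76495, 77306
bp = 1.0 if sys_len > ref_len else math.exp(1 - ref_len / sys_len)
print(f"{bp:.10f}")  # 0.9894540030, matching the 'bp' in the Hugging Face result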

@stancld stancld enabled auto-merge (squash) May 13, 2023 08:42
codecov bot commented May 13, 2023

Codecov Report

Merging #1778 (50d97ba) into master (17c0e9f) will decrease coverage by 42%.
The diff coverage is 100%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1778     +/-   ##
========================================
- Coverage      87%     46%    -42%     
========================================
  Files         253     253             
  Lines       14164   14164             
========================================
- Hits        12387    6458   -5929     
- Misses       1777    7706   +5929     

@mergify mergify bot added the ready label May 13, 2023
@stancld stancld merged commit 2d35650 into Lightning-AI:master May 13, 2023
@RistoAle97 RistoAle97 deleted the sacrebleu_fix branch September 4, 2023 12:18
Development

Successfully merging this pull request may close these issues.

#1776: The BLEU score computed by the SacreBLEUScore metric is not the same as the original implementation