# Doc-Prism (mBART-50)

This README describes how to use Doc-Prism, an extension of the original Prism metric that can be used for document-level evaluation.

Unlike the original implementation, which used a multilingual MT model, we use mBART-50, a multilingual language model that is pre-trained at the document level, to score the MT outputs.

## Installation

This codebase is an implementation of the Prism metric using the Hugging Face Transformers library. For a detailed presentation of the Prism metric, including usage examples and instructions, see the original documentation.
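The exact dependency list lives in the repository; as a minimal sketch (not the official requirements), the example below assumes only PyTorch, Transformers (with SentencePiece for the mBART-50 tokenizer), and sacreBLEU are needed:

```
pip install torch transformers sentencepiece sacrebleu
```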

## Get some files to score

```
sacrebleu -t wmt21 -l en-de --echo src | head -n 20 > src.en
sacrebleu -t wmt21 -l en-de --echo ref | head -n 20 > ref.de
sacrebleu -t wmt21 -l en-de --echo ref | head -n 20 > hyp.de  # put your system output here
```

To evaluate at the document level we need to know where the document boundaries are in the test set, so that we only use valid context. This is passed in as a file where each line contains a document ID.

For WMT test sets this can be obtained via sacreBLEU:

```
sacrebleu -t wmt21 -l en-de --echo docid | head -n 20 > docids.ende
```
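To see how these IDs delimit documents, the short sketch below (illustrative only; the ID strings in the comment are hypothetical) groups consecutive segments that share the same document ID:

```python
from itertools import groupby

# Each line of docids.ende holds the document ID of the corresponding segment,
# e.g. ["doc1", "doc1", "doc2", ...] (hypothetical IDs).
doc_ids = [x.strip() for x in open('docids.ende', 'rt').readlines()]

# Consecutive segments with the same ID belong to the same document,
# so only those segments can serve as valid context for each other.
for doc, group in groupby(enumerate(doc_ids), key=lambda t: t[1]):
    print(doc, [i for i, _ in group])  # segment indices belonging to one document
```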

## Python usage

To use Doc-Prism from Python, simply add `doc=True` when calling the score function.

```python
from prism import MBARTPrism
from add_context import add_context

# load data files
doc_ids = [x.strip() for x in open('docids.ende', 'rt').readlines()]
hyp = [x.strip() for x in open('hyp.de', 'rt').readlines()]
ref = [x.strip() for x in open('ref.de', 'rt').readlines()]

# load prism model
model_path = "facebook/mbart-large-50"
prism = MBARTPrism(checkpoint=model_path, src_lang="en", tgt_lang="de")

# add contexts to reference and hypothesis texts
hyp = add_context(orig_txt=hyp, context=ref, doc_ids=doc_ids, sep_token=prism.encoder.tokenizer.sep_token)
ref = add_context(orig_txt=ref, context=ref, doc_ids=doc_ids, sep_token=prism.encoder.tokenizer.sep_token)

seg_score = prism.score(cand=hyp, ref=ref, doc=True)
```
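The call above yields segment-level scores. Assuming `score` returns one value per segment (a sketch of one possible aggregation, not part of the official API), a single system-level number can be obtained by averaging:

```python
# Average the per-segment scores into one system-level score
# (assumes seg_score is a list of floats, one per segment).
sys_score = sum(seg_score) / len(seg_score)
print(f"Doc-Prism system-level score: {sys_score:.4f}")
```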

## Reproduce

To reproduce the Doc-Prism results from the paper, run the `score_doc-metrics.py` script with the flags `--model prism` and `--doc`. This requires the mt-metrics-eval toolkit and its data, which can be installed as follows:

```
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
alias mtme='python3 -m mt_metrics_eval.mtme'
mtme --download  # Puts ~1G of data into $HOME/.mt-metrics-eval.
```

To obtain system-level scores of Doc-Prism (mBART-50) for the WMT21 test set, run:

```
python score_doc-metrics.py --campaign wmt21.news --lp en-de --model prism --doc --level sys
```

## Paper

If you use the code in your work, please cite Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric:

```bibtex
@inproceedings{easy_doc_mt,
    title = "Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric",
    author = "Vernikos, Giorgos and Thompson, Brian and Mathur, Prashant and Federico, Marcello",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://statmt.org/wmt22/pdf/2022.wmt-1.6.pdf",
}
```