
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

This repository contains the data we collected to assess the impact of document-level context on human perception of machine translation quality.

We briefly outline the contents of each file below. Please see our paper for more detailed information:

@inproceedings{laeubli2018parity,
  author = "L{\"a}ubli, Samuel and Sennrich, Rico and Volk, Martin",
  title = "Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  year = "2018",
  address = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1808.07048"
}

participants.csv

Metadata on all participants: professional translators recruited from ProZ, who produced the ratings in ratings.csv.

documents.csv

55 full articles randomly sampled from the WMT 2017 Chinese–English test set, considering only the 123 articles originally written in Chinese. The Chinese source texts come from WMT; the human translations (Reference-HT; the human column) and machine translations (Combo-6; the mt column) were obtained from data released by Microsoft.

Documents E-1 to E-55 and I-1 to I-55 are the same 55 articles; in each set, a different random subset of 5 articles was converted to control items (spam).
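As a quick orientation, here is a minimal sketch of loading and inspecting the file with pandas. The human and mt column names are stated above; the document ID column name ("id") is an assumption, so verify it against the actual CSV header. The same pattern applies to sentences.csv.

# Minimal sketch for inspecting documents.csv (assumes pandas).
# "human" and "mt" are the column names stated above; the ID column
# name ("id") is an assumption -- verify against the actual header.
import pandas as pd

docs = pd.read_csv("documents.csv")
print(docs.columns.tolist())  # confirm the real column names first

# Compare human (Reference-HT) and machine (Combo-6) translations
for _, row in docs.head(3).iterrows():
    print(row.get("id", "?"))
    print("HT:", str(row["human"])[:80])
    print("MT:", str(row["mt"])[:80])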

sentences.csv

2 × 120 sentences randomly sampled from the WMT 2017 Chinese–English test set, again considering only the 123 articles originally written in Chinese. As in documents.csv, the Chinese source texts come from WMT, and the human (Reference-HT; the human column) and machine translations (Combo-6; the mt column) were obtained from data released by Microsoft.

Sentences U-1 to U-120 overlap with the full documents in documents.csv. In each set, 16 randomly chosen sentences were converted to control items (spam).

ratings.csv

Ratings produced by the participants listed in participants.csv. In our paper, we excluded ratings for sentences U-1 to U-120 from the analysis because they overlap with the full documents (see above).
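The sketch below shows one way to reproduce that exclusion, assuming pandas and a hypothetical item column holding the rated item's ID; the actual header may differ. The same filtering applies to ratings.with-spam.csv.

# Minimal sketch of the exclusion described above (assumes pandas).
# The "item" column name is hypothetical -- inspect ratings.csv to see
# how rated items are actually identified.
import pandas as pd

ratings = pd.read_csv("ratings.csv")

# Drop ratings for sentences U-1 to U-120, which overlap with the full
# documents and were excluded from the analysis in the paper.
overlap = ratings["item"].astype(str).str.startswith("U-")
analysis = ratings[~overlap]
print(f"kept {len(analysis)} of {len(ratings)} ratings")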

ratings.with-spam.csv

The same as ratings.csv, but with control items (spam) included.
