FLORES-200 and NLLB Professionally Translated Datasets: NLLB-Seed, NLLB-MD, and Toxicity-200

⚠️ This repository is no longer being updated ⚠️

Newer versions of the FLORES and NLLB-Seed datasets managed by the Open Language Data Initiative are available here:

Quick access to the original READMEs:

Citation

If you use any of this data in your work, please cite:

@article{nllb2022,
  author    = {NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang},
  title     = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  year      = {2022}
}

Changelog

2022-06-30: Released FLORES-200, NLLB-Seed, NLLB-MD, and Toxicity-200
2021-06-04: Released FLORES-101

Licenses

FLORES-200: CC-BY-SA 4.0
NLLB-Seed: CC-BY-SA 4.0
NLLB-MD: CC-BY-NC 4.0
Toxicity-200: CC-BY-SA 4.0

