This repository has been archived by the owner on May 4, 2021. It is now read-only.

Initial public release

@achimr achimr released this 30 Oct 17:09
· 50 commits to master since this release

Initial public release of baseline parallel data collection pipeline.

The pipeline is documented in the README and in the documents linked from it.

Phase 1 of the pipeline is an alpha release; Phase 2 is in beta.

Index files for the 2015_32 CommonCrawl, covering the language pairs en↔it, en↔fr, en↔de, en↔es, en↔pt, en↔nl and en↔ru, are attached to this release as compressed files. These index files are licensed under a Creative Commons Attribution 4.0 International License.