Nexdata-AI/100000-Groups-Chinese-Uighur-Parallel-Corpus-Data

100000-Groups-Chinese-Uighur-Parallel-Corpus-Data

Description

This dataset contains 100,000 Chinese-Uighur parallel translation pairs, stored as TXT documents. Translation fluency and fidelity are above 80%. The data has been cleaned, desensitized, and quality-inspected, and can serve as a base corpus for text data analysis and for fields such as machine translation. For more details, please refer to the link: https://www.nexdata.ai/datasets/nlu/149?source=Github
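As a rough illustration of how a TXT parallel corpus like this might be read for machine translation work, here is a minimal Python sketch. The in-file layout is not documented here, so the sketch assumes one pair per line with the Chinese sentence and its Uighur translation separated by a tab; the helper names `parse_pairs` and `load_pairs` are hypothetical, and the separator may need adjusting to match the delivered files.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Turn raw lines into (Chinese, Uighur) pairs.

    Assumption: each line is "<Chinese><sep><Uighur>". Blank lines and lines
    that do not split into exactly two non-empty fields are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(sep)
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        zh, ug = parts[0].strip(), parts[1].strip()
        if zh and ug:
            pairs.append((zh, ug))
    return pairs


def load_pairs(path: str, sep: str = "\t") -> List[Tuple[str, str]]:
    """Read a UTF-8 text file and parse it with parse_pairs."""
    with open(path, encoding="utf-8") as f:
        return parse_pairs(f, sep)


if __name__ == "__main__":
    # Illustrative lines only, not taken from the dataset.
    sample = ["你好\tياخشىمۇسىز\n", "\n", "a line without a separator\n"]
    for zh, ug in parse_pairs(sample):
        print(zh, "|", ug)
```

Keeping the parsing separate from the file reading makes it easy to check the assumed layout on a few lines before loading a full file.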

Specifications

Storage format

TXT

Data content

Chinese-Uighur Parallel Corpus Data

Data size

100,000 pairs of Chinese-Uighur parallel corpus data

Language

Chinese, Uighur

Application scenario

Machine translation

Licensing Information

Commercial License