sfb833-a3/cow2conllx

cow-extract

This small utility extracts tokens from the (ill-formed) COW XML format and stores them as CoNLL-X.

Currently it extracts:

  • Tokens.
  • Quality estimations (bdc/bpc) as features.
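For downstream processing, the extracted features can be read back from the FEATS column of the output. The following is a minimal sketch, not part of cow-extract: it relies on the standard CoNLL-X layout (ten tab-separated columns, FEATS entries separated by `|`), but the `key:value` encoding of the `bdc`/`bpc` features and the sample line are assumptions made for illustration. Check the actual output and adjust `kv_sep` if it differs.

```python
# Column names of the CoNLL-X format, in order.
CONLLX_COLUMNS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)


def parse_feats(feats, kv_sep=":"):
    """Split a FEATS column into a dict; '_' means 'no features'.

    kv_sep is an ASSUMPTION about how keys and values are joined.
    """
    if not feats or feats == "_":
        return {}
    result = {}
    for item in feats.split("|"):
        key, sep, value = item.partition(kv_sep)
        # Features without a separator are kept as bare flags.
        result[key] = value if sep else None
    return result


def parse_token_line(line, kv_sep=":"):
    """Parse one CoNLL-X token line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONLLX_COLUMNS):
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    token = dict(zip(CONLLX_COLUMNS, fields))
    token["feats"] = parse_feats(token["feats"], kv_sep)
    return token


# HYPOTHETICAL sample line; the real feature values may look different.
example = "1\tHaus\t_\t_\t_\tbdc:a|bpc:b\t_\t_\t_\t_"
token = parse_token_line(example)
print(token["form"], token["feats"])
```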

Usage

cow-extract reads from a file or stdin and writes to a file or stdout. For example, using files:

cow-extract somecorpus.xml somecorpus.conll

Using stdin/stdout is especially useful in combination with gzip:

zcat somecorpus.xml.gz | cow-extract | gzip - > somecorpus.conll.gz
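The compressed output can then be consumed without unpacking it to disk first. This is a small sketch under stated assumptions: the helper names `read_sentences` and `read_conllx_gz` are hypothetical, the file is assumed to be UTF-8, and sentences are assumed to be separated by blank lines, as is usual for CoNLL-X.

```python
import gzip


def read_sentences(lines):
    """Yield sentences as lists of column lists.

    A blank line ends the current sentence (CoNLL-X convention).
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(line.split("\t"))
    # Emit the last sentence if the input lacks a trailing blank line.
    if sentence:
        yield sentence


def read_conllx_gz(path):
    """Stream sentences from a gzip-compressed CoNLL-X file.

    UTF-8 encoding is an ASSUMPTION about the corpus.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_sentences(handle)
```

For example, `for sentence in read_conllx_gz("somecorpus.conll.gz"): ...` iterates over the file produced by the pipeline above, one sentence at a time.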

