NLPA

@moncho-mendez moncho-mendez released this 31 Jul 10:04
· 85 commits to master since this release

NLPA is a framework designed to work in conjunction with BDP4J (https://github.com/sing-group/bdp4j). It can extract text from Twitter, YouTube comments, text files, raw email files (.eml), and WARC (Web Archive) files. The extracted text can then be preprocessed into a Dataset using task (org.bdp4j.pipe.Pipe) definitions; the framework includes more than 30 preprocessing tasks for transforming the text.
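
The idea of chaining preprocessing tasks over extracted text can be illustrated with a minimal, self-contained Java sketch. It does not use the NLPA or BDP4J API: `TextTask`, `STRIP_URLS`, `TO_LOWER`, `COLLAPSE_WS`, and `run` are hypothetical names invented for this example, and the real tasks are defined through org.bdp4j.pipe.Pipe, whose interface is not shown here.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class PipelineSketch {
    /** Hypothetical stand-in for a preprocessing task; this is NOT org.bdp4j.pipe.Pipe. */
    interface TextTask extends UnaryOperator<String> {}

    // Three illustrative tasks (assumptions, not tasks shipped with NLPA).
    static final TextTask STRIP_URLS = s -> s.replaceAll("https?://\\S+", "");
    static final TextTask TO_LOWER = s -> s.toLowerCase(Locale.ROOT);
    static final TextTask COLLAPSE_WS = s -> s.trim().replaceAll("\\s+", " ");

    /** Applies each task in order, feeding the output of one into the next. */
    static String run(String text, List<TextTask> tasks) {
        String out = text;
        for (TextTask task : tasks) {
            out = task.apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        String tweet = "Check THIS out: https://example.org  now";
        // prints "check this out: now"
        System.out.println(run(tweet, List.of(STRIP_URLS, TO_LOWER, COLLAPSE_WS)));
    }
}
```

Each task takes a string and returns a transformed string, so tasks can be reordered or swapped without touching the others. Order matters: lowercasing before URL removal would still work here, but collapsing whitespace first would leave a gap where the URL was.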