A sentence splitter wrapper for CoreNLP

About

This wrapper runs the CoreNLP toolkit's sentence splitter and returns each sentence as untokenized text, that is, with the original spacing and punctuation preserved.
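To illustrate what "untokenized" means here, the following is a minimal sketch, not the wrapper's actual code. It assumes each token comes with begin and end character offsets into the input (CoreNLP tokens carry such offsets) and rebuilds each sentence by slicing the original text. The function name `sentences_from_offsets` and the offsets in the example are hypothetical, written by hand for illustration.

```python
from typing import List, Tuple

# (begin, end) character offsets of one token in the original text.
Offsets = Tuple[int, int]


def sentences_from_offsets(text: str, sentences: List[List[Offsets]]) -> List[str]:
    """Recover untokenized sentences by slicing the original text.

    `sentences` holds one list of token offsets per sentence. Each
    sentence spans from the start of its first token to the end of
    its last token, so the original spacing inside it is kept.
    """
    result = []
    for tokens in sentences:
        if not tokens:
            continue  # skip empty sentences defensively
        start = tokens[0][0]
        end = tokens[-1][1]
        result.append(text[start:end])
    return result


# Hand-written example offsets (hypothetical, not real CoreNLP output).
text = "Dr. Smith didn't come.  He was ill."
offsets = [
    [(0, 3), (4, 9), (10, 13), (13, 16), (17, 21), (21, 22)],
    [(24, 26), (27, 30), (31, 34), (34, 35)],
]

for sentence in sentences_from_offsets(text, offsets):
    print(sentence)
```

Slicing the source string, rather than joining tokens with spaces, avoids artifacts such as "did n't" that a tokenizer would otherwise introduce.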

Before starting

Please download CoreNLP and unzip everything into a folder named stanford-corenlp-4.2.0.

If you want to work in Arabic, please also download the Arabic package and put it into the stanford-corenlp-4.2.0 folder.

The latest versions of the CoreNLP package and the Arabic package can be found on the official CoreNLP website.
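As a quick sanity check before running the scripts, a small helper like the one below can confirm that the folder exists and list the `.jar` files in it. This is only a sketch: it assumes the CoreNLP and Arabic packages are distributed as `.jar` files placed directly in the folder, and the helper name `find_corenlp_jars` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def find_corenlp_jars(folder: str = "stanford-corenlp-4.2.0") -> List[str]:
    """Return the sorted names of the .jar files found directly in `folder`.

    Raises FileNotFoundError if the folder does not exist, which usually
    means the CoreNLP archive was unzipped somewhere else.
    """
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"CoreNLP folder not found: {path}")
    return sorted(p.name for p in path.glob("*.jar"))


if __name__ == "__main__":
    try:
        for name in find_corenlp_jars():
            print(name)
    except FileNotFoundError as err:
        print(err)
```

If the Arabic package is needed, its file should appear in the printed list alongside the main CoreNLP files.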

Usage

python sentence_splitter_wrapper_for_CoreNLP_En.py

python sentence_splitter_wrapper_for_CoreNLP_Ar.py

Each script contains an example sentence. Running it prints the sentence splitting results.

Update history

This sentence splitter has gone through a few changes.

  • Danqi Chen wrote the original Python wrapper for the tokenization function in CoreNLP.
  • Chao Jiang modified the code so that the sentence splitter produces split sentences as untokenized text.
  • Wuwei Lan modified the code to make it work with the Arabic language.

Acknowledgment

This material is based in part on research sponsored by IARPA via the BETTER program (2019-19051600004).