This wrapper uses the CoreNLP toolkit to split text into sentences and returns each sentence as untokenized text.
Please download CoreNLP and unzip everything into the stanford-corenlp-4.2.0 folder.
If you want to work with Arabic, please also download the Arabic package and put it into the same stanford-corenlp-4.2.0 folder.
The latest versions of the CoreNLP package and the Arabic package can be found on their official website.
python sentence_splitter_wrapper_for_CoreNLP_En.py
python sentence_splitter_wrapper_for_CoreNLP_Ar.py
Each script contains an example sentence and prints the splitting results when run.
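For readers wondering what "untokenized" output means in practice, here is a minimal sketch of one way to recover it. It is not the code in the wrapper scripts. It assumes an annotation shaped like CoreNLP's JSON output, where each token carries `characterOffsetBegin` and `characterOffsetEnd`, and it slices the original text between the first and last token of each sentence. The `annotation` dictionary below is written by hand for illustration and was not produced by CoreNLP; `tok` and `untokenized_sentences` are hypothetical helper names.

```python
def tok(word, begin, end):
    """Build a token dict with only the fields this sketch needs (hypothetical helper)."""
    return {"word": word, "characterOffsetBegin": begin, "characterOffsetEnd": end}


def untokenized_sentences(text, annotation):
    """Return each sentence as a verbatim slice of the original text.

    Assumes `annotation` follows the layout of CoreNLP's JSON output:
    a "sentences" list whose items hold a "tokens" list with character offsets.
    """
    sentences = []
    for sentence in annotation["sentences"]:
        tokens = sentence["tokens"]
        if not tokens:
            continue  # skip empty sentences defensively
        start = tokens[0]["characterOffsetBegin"]
        end = tokens[-1]["characterOffsetEnd"]
        sentences.append(text[start:end])
    return sentences


text = "Dr. Smith didn't come. He was ill."

# Hand-written stand-in for a CoreNLP annotation (assumption, not real output).
annotation = {
    "sentences": [
        {"tokens": [tok("Dr.", 0, 3), tok("Smith", 4, 9), tok("did", 10, 13),
                    tok("n't", 13, 16), tok("come", 17, 21), tok(".", 21, 22)]},
        {"tokens": [tok("He", 23, 25), tok("was", 26, 29),
                    tok("ill", 30, 33), tok(".", 33, 34)]},
    ]
}

result = untokenized_sentences(text, annotation)
for s in result:
    print(s)
```

Slicing the original string, rather than joining token strings, keeps the original spacing and contractions ("didn't" instead of "did n't"). One caveat: CoreNLP runs on Java, so its offsets count UTF-16 code units, and they can differ from Python string indices for characters outside the Basic Multilingual Plane.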
This sentence splitter has gone through a few changes.
- Danqi Chen wrote the original Python wrapper for the tokenization function in CoreNLP.
- Chao Jiang modified the code so that the sentence splitter produces split sentences with untokenized text.
- Wuwei Lan modified the code to make it work with the Arabic language.
This material is based in part on research sponsored by IARPA via the BETTER program (2019-19051600004).