This is a sentence splitter for English and Arabic based on the Stanford CoreNLP package.

chaojiang06/CoreNLP_sentence_splitter

A sentence splitter wrapper for CoreNLP

About

This wrapper returns the sentence splitting result from the CoreNLP toolkit as untokenized text, so each sentence is a span of the original input rather than a sequence of tokens.
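
One way to obtain untokenized sentences is to slice the original text using the character offsets that CoreNLP attaches to each token. The sketch below is not the code in this repository. It assumes an annotation shaped like CoreNLP's JSON output, where each sentence has a `tokens` list and each token has `characterOffsetBegin` and `characterOffsetEnd`. The function name `untokenized_sentences`, the helper `tok`, and the sample annotation are hypothetical and hand-written for illustration.

```python
from typing import Any, Dict, List


def untokenized_sentences(text: str, annotation: Dict[str, Any]) -> List[str]:
    """Slice the original text from each sentence's first token to its last token."""
    sentences = []
    for sentence in annotation.get("sentences", []):
        tokens = sentence.get("tokens", [])
        if not tokens:
            continue  # skip sentences without tokens
        start = tokens[0]["characterOffsetBegin"]
        end = tokens[-1]["characterOffsetEnd"]
        sentences.append(text[start:end])
    return sentences


def tok(begin: int, end: int) -> Dict[str, int]:
    """Hypothetical helper that builds a token dict holding only the offsets."""
    return {"characterOffsetBegin": begin, "characterOffsetEnd": end}


text = "Dr. Smith isn't here. He left at 5 p.m."

# Hand-written annotation that mimics the JSON layout; not real CoreNLP output.
annotation = {
    "sentences": [
        {"tokens": [tok(0, 3), tok(4, 9), tok(10, 12), tok(12, 15), tok(16, 20), tok(20, 21)]},
        {"tokens": [tok(22, 24), tok(25, 29), tok(30, 32), tok(33, 34), tok(35, 39)]},
    ]
}

for sentence in untokenized_sentences(text, annotation):
    print(sentence)
```

Because the sentences are slices of the input, contractions such as "isn't" and abbreviations such as "p.m." keep their original form even though the tokenizer may split or normalize them.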

Before starting

Please download CoreNLP and unzip everything into the stanford-corenlp-4.2.0 folder.

If you want to work with Arabic, please also download the Arabic package and put it into the stanford-corenlp-4.2.0 folder.

The latest versions of the CoreNLP package and the Arabic package can be found on their official website.
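
Before running the scripts, it can help to confirm that the folder is laid out as described. The following is a minimal sketch and not part of this repository. The function name `check_corenlp_folder` is hypothetical, and the check assumes that the Arabic package is a jar file with "arabic" in its file name.

```python
from pathlib import Path
from typing import Dict, List


def check_corenlp_folder(folder: str = "stanford-corenlp-4.2.0") -> Dict[str, object]:
    """Report whether the folder exists and which jar files it contains."""
    root = Path(folder)
    jars: List[str] = sorted(p.name for p in root.glob("*.jar")) if root.is_dir() else []
    return {
        "folder_exists": root.is_dir(),
        "jar_count": len(jars),
        # Assumption: the Arabic models jar has "arabic" in its file name.
        "arabic_jars": [name for name in jars if "arabic" in name.lower()],
    }


if __name__ == "__main__":
    print(check_corenlp_folder())
```

If `folder_exists` is false or `jar_count` is zero, the download was probably unzipped somewhere else. An empty `arabic_jars` list only matters when you plan to run the Arabic script.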

Usage

python sentence_splitter_wrapper_for_CoreNLP_En.py

python sentence_splitter_wrapper_for_CoreNLP_Ar.py

Each script contains an example sentence and prints the splitting results when run.

Update history

This sentence splitter has gone through a few changes.

  • Danqi Chen wrote the original Python wrapper for the tokenization function in CoreNLP.
  • Chao Jiang modified the code so that the sentence splitter produces split sentences with untokenized text.
  • Wuwei Lan modified the code to make it work with Arabic.

Acknowledgment

This material is based in part on research sponsored by IARPA via the BETTER program (2019-19051600004).
