Add Pre-trained Models

General Domain

Description

The general models are trained on OntoNotes 5.0. The following shows the distribution of each genre in our training data.

Broadcast conversations: 10,826 sentences, 171,120 tokens.
Broadcast news: 10,349 sentences, 206,057 tokens.
News magazines: 6,672 sentences, 163,627 tokens.
Newswires: 34,492 sentences, 876,399 tokens.
Religious texts: 21,419 sentences, 296,437 tokens.
Telephone conversations: 8,969 sentences, 85,463 tokens.
Web-texts: 12,452 sentences, 284,975 tokens.

Without Maven

Download the following model files.
Dictionary.
Part-of-speech tagging.
Dependency parsing.
Semantic role labeling.

Add all jar files to your classpath.

export LIB=SOME_PATH/clearnlp

export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
export MODEL_POS=$LIB/clearnlp-general-en-pos-1.1.jar
export MODEL_DEP=$LIB/clearnlp-general-en-dep-1.2.jar
export MODEL_SRL=$LIB/clearnlp-general-en-srl-1.1.jar
export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL

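# CLEARNLP_LIB is not defined in this snippet; it is assumed to be set already
# to the ClearNLP library jars (for example, during installation).
export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.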
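
A mistyped path or version number in one of the exports above only shows up later, when a model fails to load. The following is a minimal sketch of a sanity check, assuming a Bourne-style shell; `check_classpath` is a hypothetical helper, not part of ClearNLP. It splits a colon-separated classpath and reports every entry that does not exist on disk.

```shell
# Hypothetical helper (not part of ClearNLP): report classpath entries
# that do not exist on disk. Returns non-zero if any entry is missing.
check_classpath() {
  local status=0
  local entry
  local IFS=':'          # split the argument on colons
  for entry in $1; do
    # Skip empty entries and the current-directory entry "."
    [ -z "$entry" ] && continue
    [ "$entry" = "." ] && continue
    if [ ! -e "$entry" ]; then
      echo "missing: $entry" >&2
      status=1
    fi
  done
  return $status
}
```

After running the exports, `check_classpath "$MODEL_LIB"` prints one `missing:` line per jar that is not found and returns a non-zero status. Entries containing wildcard characters are expanded by the shell, so the check is intended for plain jar paths like the ones above.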

With Maven

Add the following dependencies to your pom.xml.

<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-dictionary</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-general-en-pos</artifactId>
  <version>1.1</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-general-en-dep</artifactId>
  <version>1.2</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-general-en-srl</artifactId>
  <version>1.1</version>
</dependency>

Medical Domain

Description

The medical models are trained on corpora collected by the MiPACQ, SHARP, and THYME projects. The following shows the distribution of each genre in our training data.

MiPACQ: Clinical questions: 1,600 sentences, 30,138 tokens.
MiPACQ: Medpedia articles: 2,796 sentences, 49,922 tokens.
MiPACQ: clinical notes: 8,001 sentences, 107,191 tokens.
MiPACQ: pathological notes: 1,225 sentences, 21,581 tokens.
SHARP: Seattle Group Health clinical notes: 5,020 sentences, 61,124 tokens.
SHARP: Seattle Group Health pathological notes: 2,294 sentences, 34,384 tokens.
SHARP clinical notes: 6,787 sentences, 94,205 tokens.
SHARP stratified: 4,312 sentences, 43,023 tokens.
SHARP stratified SGH: 13,432 sentences, 139,266 tokens.
TEMPREL clinical notes: 18,927 sentences, 255,604 tokens.
TEMPREL pathological notes: 4,400 sentences, 80,064 tokens.

Without Maven

Download the following model files.
Dictionary.
Part-of-speech tagging.
Dependency parsing.
Semantic role labeling.

Add all jar files to your classpath.

export LIB=SOME_PATH/clearnlp

export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
export MODEL_POS=$LIB/clearnlp-medical-en-pos-1.0.jar
export MODEL_DEP=$LIB/clearnlp-medical-en-dep-1.0.jar
export MODEL_SRL=$LIB/clearnlp-medical-en-srl-1.0.jar
export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL

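# CLEARNLP_LIB is not defined in this snippet; it is assumed to be set already
# to the ClearNLP library jars (for example, during installation).
export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.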
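
The general and medical exports differ only in the domain name and the version numbers of the jar files. As a rough sketch, assuming the jar naming pattern shown on this page, the model classpath could be built by one function; `model_lib` and its argument order are hypothetical, not part of ClearNLP.

```shell
# Hypothetical helper (not part of ClearNLP): print the model classpath for
# one domain. Arguments: LIB_DIR DOMAIN POS_VERSION DEP_VERSION SRL_VERSION
# The dictionary jar (version 1.0) is shared by both domains.
model_lib() {
  local lib="$1"
  local domain="$2"
  local pos="$3"
  local dep="$4"
  local srl="$5"
  printf '%s:%s:%s:%s\n' \
    "$lib/clearnlp-dictionary-1.0.jar" \
    "$lib/clearnlp-$domain-en-pos-$pos.jar" \
    "$lib/clearnlp-$domain-en-dep-$dep.jar" \
    "$lib/clearnlp-$domain-en-srl-$srl.jar"
}
```

With this in place, `export MODEL_LIB=$(model_lib "$LIB" medical 1.0 1.0 1.0)` selects the medical models, and `model_lib "$LIB" general 1.1 1.2 1.1` yields the general-domain list with the versions given earlier on this page.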

With Maven

Add the following dependencies to your pom.xml.

<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-dictionary</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-medical-en-pos</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-medical-en-dep</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.clearnlp</groupId>
  <artifactId>clearnlp-medical-en-srl</artifactId>
  <version>1.0</version>
</dependency>
