Add Pre-trained Models
jdchoi77 edited this page Nov 7, 2014
The general models are trained on OntoNotes 5.0. The following list shows the size of each genre in our training data.
- Broadcast conversations: 10,826 sentences, 171,120 tokens.
- Broadcast news: 10,349 sentences, 206,057 tokens.
- News magazines: 6,672 sentences, 163,627 tokens.
- Newswires: 34,492 sentences, 876,399 tokens.
- Religious texts: 21,419 sentences, 296,437 tokens.
- Telephone conversations: 8,969 sentences, 85,463 tokens.
- Web-texts: 12,452 sentences, 284,975 tokens.
- Download the following model files.
  - Dictionary.
  - Part-of-speech tagging.
  - Dependency parsing.
  - Semantic role labeling.
- Add all jar files to your classpath.
    export LIB=SOME_PATH/clearnlp
    export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
    export MODEL_POS=$LIB/clearnlp-general-en-pos-1.1.jar
    export MODEL_DEP=$LIB/clearnlp-general-en-dep-1.2.jar
    export MODEL_SRL=$LIB/clearnlp-general-en-srl-1.1.jar
    export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL
    export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.
- Add the following dependencies to your pom.xml.

    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-dictionary</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-general-en-pos</artifactId>
        <version>1.1</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-general-en-dep</artifactId>
        <version>1.2</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-general-en-srl</artifactId>
        <version>1.1</version>
    </dependency>
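
With the manual classpath setup above, a missing or misnamed jar usually shows up only when a model fails to load. The sketch below is one way to catch that earlier. It assumes a POSIX shell, and `check_classpath` is a hypothetical helper written for this page, not part of ClearNLP. It splits a colon-separated classpath and reports every entry that does not exist on disk.

```shell
# Hypothetical helper (not part of ClearNLP): report classpath entries
# that do not exist on disk. Exit status is 0 if all entries exist, 1 otherwise.
# The body runs in a subshell, so the IFS and globbing changes stay local.
check_classpath() (
    IFS=:        # split the argument on colons
    set -f       # do not expand wildcards in entries
    status=0
    for entry in $1; do
        # Skip empty fields such as the one produced by "a::b".
        [ -n "$entry" ] || continue
        if [ ! -e "$entry" ]; then
            echo "missing: $entry"
            status=1
        fi
    done
    exit "$status"
)
```

After running the export lines, `check_classpath "$CLASSPATH"` prints one `missing:` line per entry that cannot be found and returns a non-zero status if there is at least one.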
The medical models are trained on corpora collected by the MiPACQ, SHARP, and THYME projects. The following list shows the size of each genre in our training data.
- MiPACQ: clinical questions: 1,600 sentences, 30,138 tokens.
- MiPACQ: Medpedia articles: 2,796 sentences, 49,922 tokens.
- MiPACQ: clinical notes: 8,001 sentences, 107,191 tokens.
- MiPACQ: pathological notes: 1,225 sentences, 21,581 tokens.
- SHARP: Seattle Group Health clinical notes: 5,020 sentences, 61,124 tokens.
- SHARP: Seattle Group Health pathological notes: 2,294 sentences, 34,384 tokens.
- SHARP clinical notes: 6,787 sentences, 94,205 tokens.
- SHARP stratified: 4,312 sentences, 43,023 tokens.
- SHARP stratified SGH: 13,432 sentences, 139,266 tokens.
- TEMPREL clinical notes: 18,927 sentences, 255,604 tokens.
- TEMPREL pathological notes: 4,400 sentences, 80,064 tokens.
- Download the following model files.
  - Dictionary.
  - Part-of-speech tagging.
  - Dependency parsing.
  - Semantic role labeling.
- Add all jar files to your classpath.
    export LIB=SOME_PATH/clearnlp
    export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
    export MODEL_POS=$LIB/clearnlp-medical-en-pos-1.0.jar
    export MODEL_DEP=$LIB/clearnlp-medical-en-dep-1.0.jar
    export MODEL_SRL=$LIB/clearnlp-medical-en-srl-1.0.jar
    export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL
    export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.
- Add the following dependencies to your pom.xml.

    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-dictionary</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-medical-en-pos</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-medical-en-dep</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.clearnlp</groupId>
        <artifactId>clearnlp-medical-en-srl</artifactId>
        <version>1.0</version>
    </dependency>
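
The general and medical classpath setups on this page differ only in the domain name and the version numbers of the three model jars; the dictionary jar is shared. If you switch between the two, a small wrapper can build the model classpath for either domain. This is a sketch, assuming a POSIX shell. `model_classpath` is a hypothetical helper, not part of ClearNLP, and the jar names and versions are copied from the export lines on this page.

```shell
# Hypothetical helper (not part of ClearNLP): print the model classpath
# for one domain. Arguments: <jar directory> <general|medical>.
model_classpath() {
    lib="$1"
    domain="$2"
    # Versions as listed on this page for each set of models.
    case "$domain" in
        general) pos=1.1; dep=1.2; srl=1.1 ;;
        medical) pos=1.0; dep=1.0; srl=1.0 ;;
        *) echo "unknown domain: $domain" >&2; return 1 ;;
    esac
    # The dictionary jar is the same for both domains.
    printf '%s:%s:%s:%s\n' \
        "$lib/clearnlp-dictionary-1.0.jar" \
        "$lib/clearnlp-$domain-en-pos-$pos.jar" \
        "$lib/clearnlp-$domain-en-dep-$dep.jar" \
        "$lib/clearnlp-$domain-en-srl-$srl.jar"
}
```

For example, `export MODEL_LIB=$(model_classpath "$LIB" medical)` would stand in for the four `MODEL_*` assignments and the `MODEL_LIB` line of the medical setup.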