Add Pre-trained Models

jdchoi77 edited this page Nov 7, 2014 · 3 revisions

General Domain

Description

The general models are trained on OntoNotes 5.0. The following shows the distribution of each genre in our training data.

  • Broadcast conversations: 10,826 sentences, 171,120 tokens.
  • Broadcast news: 10,349 sentences, 206,057 tokens.
  • News magazines: 6,672 sentences, 163,627 tokens.
  • Newswires: 34,492 sentences, 876,399 tokens.
  • Religious texts: 21,419 sentences, 296,437 tokens.
  • Telephone conversations: 8,969 sentences, 85,463 tokens.
  • Web-texts: 12,452 sentences, 284,975 tokens.

Without maven

  • Download the following model files.
      • Dictionary.
      • Part-of-speech tagging.
      • Dependency parsing.
      • Semantic role labeling.
  • Add all jar files to your classpath.
    export LIB=SOME_PATH/clearnlp
    
    export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
    export MODEL_POS=$LIB/clearnlp-general-en-pos-1.1.jar
    export MODEL_DEP=$LIB/clearnlp-general-en-dep-1.2.jar
    export MODEL_SRL=$LIB/clearnlp-general-en-srl-1.1.jar
    export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL
    
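    # CLEARNLP_LIB is not set in this snippet; it is assumed to already list the core ClearNLP jar files.
    export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.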
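
A mistyped path in one of the exports above is easy to miss, because Java only reports a missing jar when a class fails to load. As a minimal sketch, the helper below walks a colon-separated classpath and reports entries that do not exist on disk. The function name `check_classpath` is hypothetical and not part of ClearNLP, and a POSIX-style shell is assumed.

```shell
# Hypothetical helper (not part of ClearNLP): report classpath entries
# that do not exist on disk. Exit status is 0 if every entry exists, 1 otherwise.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
check_classpath() (
    missing=0
    IFS=:      # split the argument on colons
    set -f     # check entries literally; Java's dir/* wildcard is not handled here
    for entry in $1; do
        case $entry in
            ''|.) continue ;;   # skip empty entries and the current directory
        esac
        if [ ! -e "$entry" ]; then
            echo "missing: $entry" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

After the exports, running `check_classpath "$MODEL_LIB"` prints one `missing:` line per jar file that cannot be found and returns a non-zero status if there is at least one.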

With maven

  • Add the following dependencies to your pom.xml.

    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-dictionary</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-general-en-pos</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-general-en-dep</artifactId>
      <version>1.2</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-general-en-srl</artifactId>
      <version>1.1</version>
    </dependency>
    

Medical Domain

Description

The medical models are trained on corpora collected by the MiPACQ, SHARP, and THYME projects. The following shows the distribution of each genre in our training data.

  • MiPACQ: clinical questions: 1,600 sentences, 30,138 tokens.
  • MiPACQ: Medpedia articles: 2,796 sentences, 49,922 tokens.
  • MiPACQ: clinical notes: 8,001 sentences, 107,191 tokens.
  • MiPACQ: pathological notes: 1,225 sentences, 21,581 tokens.
  • SHARP: Seattle Group Health clinical notes: 5,020 sentences, 61,124 tokens.
  • SHARP: Seattle Group Health pathological notes: 2,294 sentences, 34,384 tokens.
  • SHARP clinical notes: 6,787 sentences, 94,205 tokens.
  • SHARP stratified: 4,312 sentences, 43,023 tokens.
  • SHARP stratified SGH: 13,432 sentences, 139,266 tokens.
  • TEMPREL clinical notes: 18,927 sentences, 255,604 tokens.
  • TEMPREL pathological notes: 4,400 sentences, 80,064 tokens.

Without maven

  • Download the following model files.
      • Dictionary.
      • Part-of-speech tagging.
      • Dependency parsing.
      • Semantic role labeling.
  • Add all jar files to your classpath.
    export LIB=SOME_PATH/clearnlp
    
    export MODEL_DICT=$LIB/clearnlp-dictionary-1.0.jar
    export MODEL_POS=$LIB/clearnlp-medical-en-pos-1.0.jar
    export MODEL_DEP=$LIB/clearnlp-medical-en-dep-1.0.jar
    export MODEL_SRL=$LIB/clearnlp-medical-en-srl-1.0.jar
    export MODEL_LIB=$MODEL_DICT:$MODEL_POS:$MODEL_DEP:$MODEL_SRL
    
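    # CLEARNLP_LIB is not set in this snippet; it is assumed to already list the core ClearNLP jar files.
    export CLASSPATH=$CLEARNLP_LIB:$MODEL_LIB:.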
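
The general and medical setups differ only in the domain name and the version numbers of the jar files. As a rough sketch, the function below builds the model classpath for either domain from the file names listed on this page. The name `model_lib` and its two arguments are hypothetical and not part of ClearNLP, and the versions are hard-coded from the listings above.

```shell
# Hypothetical helper (not part of ClearNLP): print the model classpath
# for a domain, using the jar names and versions listed on this page.
model_lib() {
    domain=$1   # "general" or "medical"
    lib=$2      # directory containing the downloaded jar files
    case $domain in
        general) pos=1.1 dep=1.2 srl=1.1 ;;
        medical) pos=1.0 dep=1.0 srl=1.0 ;;
        *) echo "unknown domain: $domain" >&2; return 1 ;;
    esac
    # The dictionary jar is shared by both domains.
    printf '%s:%s:%s:%s\n' \
        "$lib/clearnlp-dictionary-1.0.jar" \
        "$lib/clearnlp-$domain-en-pos-$pos.jar" \
        "$lib/clearnlp-$domain-en-dep-$dep.jar" \
        "$lib/clearnlp-$domain-en-srl-$srl.jar"
}
```

For example, `export MODEL_LIB=$(model_lib medical "$LIB")` would stand in for the five `MODEL_*` exports above, and switching the first argument to `general` selects the general-domain jar files instead.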

With maven

  • Add the following dependencies to your pom.xml.

    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-dictionary</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-medical-en-pos</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-medical-en-dep</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.clearnlp</groupId>
      <artifactId>clearnlp-medical-en-srl</artifactId>
      <version>1.0</version>
    </dependency>