NER Task with CNN + BiLSTM + CRF (with Naver NLP Challenge dataset) with Pytorch

monologg/korean-ner-pytorch

Korean NER with PyTorch

Korean NER task with CharCNN + BiLSTM + CRF (using the Naver NLP Challenge dataset), implemented in PyTorch

Model

  • Character embedding with a CNN
  • Concatenate the word embedding with the character representation
  • Feed the concatenated features into a BiLSTM + CRF
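
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `CharCNNBiLSTMEncoder`, all layer sizes, and the max-pooling over characters are assumptions. The CRF layer is left out so that the sketch depends on `torch` only; it stops at the per-token emission scores.

```python
import torch
import torch.nn as nn


class CharCNNBiLSTMEncoder(nn.Module):
    """Hypothetical encoder: char CNN + word embedding -> BiLSTM -> emission scores."""

    def __init__(self, word_vocab_size, char_vocab_size, num_tags,
                 word_dim=300, char_dim=30, num_filters=32,
                 kernel_size=3, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, num_filters, kernel_size,
                                  padding=kernel_size // 2)
        self.bilstm = nn.LSTM(word_dim + num_filters, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.to_tags = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        batch, seq_len, word_len = char_ids.shape
        chars = self.char_emb(char_ids.reshape(batch * seq_len, word_len))
        # Conv1d expects (N, channels, length), so move char_dim to axis 1.
        conv = torch.relu(self.char_cnn(chars.transpose(1, 2)))
        # Max-pool over the character axis: one vector per word.
        char_repr = conv.max(dim=2).values.reshape(batch, seq_len, -1)
        # Concatenate the word embedding with the character representation.
        features = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        lstm_out, _ = self.bilstm(features)
        return self.to_tags(lstm_out)  # (batch, seq_len, num_tags)


# Random toy batch: 2 sentences, 7 words each, 4 characters per word.
model = CharCNNBiLSTMEncoder(word_vocab_size=100, char_vocab_size=50, num_tags=5)
word_ids = torch.randint(1, 100, (2, 7))
char_ids = torch.randint(1, 50, (2, 7, 4))
emissions = model(word_ids, char_ids)
print(emissions.shape)
```

The emission tensor is the kind of input a CRF layer consumes, for example the `CRF` module from the pytorch-crf package listed under Dependencies, which would score tag sequences during training and decode the best sequence at inference.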

Dependencies

  • python>=3.5
  • torch==1.4.0
  • seqeval==0.0.12
  • pytorch-crf==0.7.2
  • gdown==3.10.1

Data

|           | Train  | Test  |
|-----------|--------|-------|
| # of Data | 81,000 | 9,000 |

  • Naver NLP Challenge 2018 NER Dataset (Github link)
  • The original GitHub repository provides only a training set, so the test set was created by splitting that training set. (Data link)
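
A split of this size could be produced along the following lines. This is only a sketch: the README does not say how the split was made, so the shuffle, the seed, and the helper name `split_train_test` are assumptions, and the placeholder strings stand in for the real labeled sentences.

```python
import random


def split_train_test(sentences, test_size, seed=42):
    """Shuffle a copy of the data and hold out test_size items as the test set."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder data standing in for the 90,000 labeled sentences.
sentences = [f"sentence-{i}" for i in range(90000)]
train, test = split_train_test(sentences, test_size=9000)
print(len(train), len(test))
```

Fixing the seed keeps the held-out set identical across runs, which matters when comparing F1 scores between model variants.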

Pretrained Word Vectors

  • Uses 300-dimensional Korean fastText vectors.
  • Loading the original vector file takes quite a long time, so only the vectors for words in the word vocabulary were extracted.
  • The extracted vectors are downloaded automatically when you run main.py.
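
The vocabulary filtering described above might look like the sketch below. It assumes the fastText text (`.vec`) layout, that is, a `count dim` header line followed by one `word v1 ... vdim` line per word. The function name `filter_vectors` and the toy 2-dimensional data are hypothetical; the real file has 300 values per word.

```python
import io


def filter_vectors(vec_lines, vocab):
    """Keep only the vectors whose word appears in vocab."""
    lines = iter(vec_lines)
    _, dim = map(int, next(lines).split())  # header: "<count> <dim>"
    kept = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words outside the vocabulary and malformed rows.
        if word in vocab and len(values) == dim:
            kept[word] = [float(v) for v in values]
    return kept


# Tiny in-memory stand-in for the real vector file.
fake_vec_file = io.StringIO(
    "3 2\n"
    "서울 0.1 0.2\n"
    "부산 0.3 0.4\n"
    "제주 0.5 0.6\n"
)
vocab = {"서울", "제주", "대전"}
small = filter_vectors(fake_vec_file, vocab)
print(sorted(small))
```

Words in the vocabulary that have no pretrained vector (here "대전") simply do not appear in the result, so the embedding layer would need some fallback initialization for them.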

Usage

$ python3 main.py --do_train --do_eval
  • Evaluation predictions are saved in the preds directory when you pass the --write_pred option.

Results

| Model                      | Slot F1 (%) |
|----------------------------|-------------|
| CNN+BiLSTM+CRF             | 73.65       |
| CNN+BiLSTM+CRF (+fastText) | 74.57       |
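
For readers unfamiliar with the metric, slot F1 is usually computed at the entity level: a predicted entity counts as correct only if both its type and its exact span match the gold annotation. The sketch below illustrates that idea in plain Python. It assumes standard `B-`/`I-`/`O` tags, which may differ from the dataset's own tag format, and it ignores stray `I-` tags as a simplification. The repository lists seqeval as a dependency, a library that computes entity-level scores of this kind, so this is not the code behind the numbers above.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            entities.add((etype, start, i - 1))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over a list of sentences."""
    gold, pred = set(), set()
    for idx, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold |= {(idx,) + e for e in extract_entities(g)}
        pred |= {(idx,) + e for e in extract_entities(p)}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 gold entities, 3 predicted entities, 2 exact matches.
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG"]]
pred = [["B-PER", "I-PER", "O", "O"], ["B-ORG", "B-ORG"]]
score = entity_f1(gold, pred)
print(f"{100 * score:.2f}")
```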
