Korean NER with PyTorch

Korean NER task with CharCNN + BiLSTM + CRF on the Naver NLP Challenge dataset, implemented in PyTorch.

Model

Build character embeddings with a CNN
Concatenate the word embedding with the character representation
Feed the concatenated features into a BiLSTM + CRF
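
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in model.py: the class name, layer sizes, and vocabulary sizes are all assumptions, and the CRF layer is left out so the sketch depends only on torch. The module returns per-token emission scores, which a CRF layer (for example the one from the pytorch-crf package listed under Dependencies) would then score and decode.

```python
import torch
import torch.nn as nn


class CharCNNBiLSTM(nn.Module):
    """Hypothetical encoder: CharCNN + word embedding -> BiLSTM -> tag scores."""

    def __init__(self, word_vocab=100, char_vocab=50, num_tags=5,
                 word_dim=300, char_dim=30, num_filters=32, kernel_size=3,
                 hidden_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # 1-D convolution over the characters of each word
        self.char_cnn = nn.Conv1d(char_dim, num_filters, kernel_size, padding=1)
        self.lstm = nn.LSTM(word_dim + num_filters, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.to_tags = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        batch, seq_len, word_len = char_ids.shape
        chars = self.char_emb(char_ids.reshape(batch * seq_len, word_len))
        # Conv1d expects (N, channels, length)
        conv = torch.relu(self.char_cnn(chars.transpose(1, 2)))
        # Max-pool over character positions: one vector per word
        char_repr = conv.max(dim=2).values.reshape(batch, seq_len, -1)
        # Concatenate word embedding with the character representation
        features = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        lstm_out, _ = self.lstm(features)
        # Emission scores; a CRF layer would consume these
        return self.to_tags(lstm_out)  # (batch, seq_len, num_tags)


model = CharCNNBiLSTM()
word_ids = torch.randint(1, 100, (2, 7))     # 2 sentences, 7 words each
char_ids = torch.randint(1, 50, (2, 7, 6))   # 6 characters per word
emissions = model(word_ids, char_ids)
print(emissions.shape)
```

Max-pooling over character positions gives each word a fixed-size character vector regardless of its length, which is what allows it to be concatenated with the word embedding.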

Dependencies

python>=3.5
torch==1.4.0
seqeval==0.0.12
pytorch-crf==0.7.2
gdown==3.10.1

Data

	Train	Test
# of Data	81,000	9,000

Naver NLP Challenge 2018 NER dataset (GitHub link)
The original GitHub repository provides only a training set, so the test set was created by splitting it off from the training data. (Data link)
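
A split with the proportions in the table above (81,000 train, 9,000 test) could be produced along these lines. The README does not say how the split was actually made, so the shuffling, the seed, and the function name split_train_test are assumptions for illustration only.

```python
import random


def split_train_test(sentences, test_ratio=0.1, seed=42):
    """Shuffle a copy of the data and hold out the first test_ratio share."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder items standing in for the 90,000 labelled examples
sentences = ["sentence_{}".format(i) for i in range(90000)]
train, test = split_train_test(sentences)
print(len(train), len(test))
```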

Pretrained Word Vectors

Uses the 300-dimensional Korean fastText vectors.
Loading the full original vector file takes quite a long time, so only the vectors for words that appear in the word vocabulary are extracted.
The extracted vectors are downloaded automatically when you run main.py.
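
The extraction step could look something like the sketch below, assuming the vectors come in fastText's text (.vec) format, where the first line holds the word count and dimension and every following line holds a word and its values. The function name filter_vectors and the tiny 3-dimensional sample are hypothetical; the repository's own preprocessing may differ.

```python
import io


def filter_vectors(vec_file, vocab):
    """Keep only the rows of a fastText .vec text file whose word is in vocab."""
    header = vec_file.readline().split()  # first line: "<num_words> <dim>"
    dim = int(header[1])
    kept = {}
    for line in vec_file:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip out-of-vocabulary words and malformed rows
        if word in vocab and len(values) == dim:
            kept[word] = [float(v) for v in values]
    return kept, dim


# Tiny stand-in for the real 300-dimensional file (3 dimensions here)
fake_vec = io.StringIO(
    "3 3\n"
    "서울 0.1 0.2 0.3\n"
    "부산 0.4 0.5 0.6\n"
    "학교 0.7 0.8 0.9\n"
)
vocab = {"서울", "학교"}
vectors, dim = filter_vectors(fake_vec, vocab)
print(sorted(vectors), dim)
```

Reading the file line by line means the full vector table never has to fit in memory; only the rows for in-vocabulary words are kept.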

Usage

$ python3 main.py --do_train --do_eval

Evaluation predictions are saved in the preds directory when you pass the --write_pred option.

Results

	Slot F1 (%)
CNN+BiLSTM+CRF	73.65
CNN+BiLSTM+CRF (+fastText)	74.57
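
For readers unfamiliar with the metric, the sketch below shows one way an entity-level F1 can be computed, assuming "Slot F1" here means the span-level F1 that seqeval (listed under Dependencies) reports. It is a simplified re-implementation for illustration, not seqeval itself: the BIO tags such as B-PER are placeholders rather than the dataset's actual tag set, and stray I- tags without a preceding B- are simply ignored.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            entities.add((etype, start, i - 1))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = slot_f1(gold, pred)  # one of two gold entities found, no false positives
print(round(score, 4))
```

An entity counts as correct only when its type and both boundaries match, so a partially recovered span earns no credit.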

monologg/korean-ner-pytorch