034_ERNIE: Enhanced Language Representation with Informative Entities

submitted to ACL 2019

code: link in paper

Task

In existing LMs, every named entity is represented as UNK. Learning entity representations that fully exploit a KB's structural information and entity descriptions raises the following two challenges.
- (1) Structured Knowledge Encoding
  - regarding to the given text, how to effectively extract and encode its related informative facts in KGs for language representation models is an important problem;
- (2) Heterogeneous Information Fusion
  - how to design a special pre-training objective to fuse the lexical, syntactic, and knowledge information is another challenge.

For (1),
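- Run NER on the text.
- If the KG contains an entity corresponding to a detected named entity, link the two.
- Instead of a fact set such as triplets, output the graph structure in the KG together with the KG entity embeddings, and use them as input to the ERNIE model. (needs a closer read)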
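
The steps for (1) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the alias table, the entity IDs (`E1`, `E2`), the 2-dimensional embeddings, and the helper names `link_entities` and `build_model_input` are all hypothetical. A greedy longest-match dictionary lookup stands in for a real NER and entity-linking system, and attaching the entity embedding to the first token of each mention is an assumption made here for simplicity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy KG: lower-cased surface form -> entity ID.
ALIAS_TO_ENTITY: Dict[str, str] = {
    "bob dylan": "E1",
    "blowin' in the wind": "E2",
}

# Hypothetical pre-trained KG entity embeddings (2-dimensional for readability).
ENTITY_EMBEDDING: Dict[str, List[float]] = {
    "E1": [0.1, 0.3],
    "E2": [0.7, -0.2],
}


def link_entities(tokens: List[str], max_span: int = 4) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup standing in for NER + entity linking.

    Returns (start, end_exclusive, entity_id) for every linked mention.
    """
    links: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for length in range(min(max_span, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + length]).lower()
            entity_id = ALIAS_TO_ENTITY.get(surface)
            if entity_id is not None:
                links.append((i, i + length, entity_id))
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return links


def build_model_input(tokens: List[str]) -> List[Tuple[str, Optional[List[float]]]]:
    """Pair each token with an entity embedding, or None if no entity is aligned.

    Assumption: the embedding is attached to the first token of the mention.
    """
    aligned: List[Optional[List[float]]] = [None] * len(tokens)
    for start, _end, entity_id in link_entities(tokens):
        aligned[start] = ENTITY_EMBEDDING[entity_id]
    return list(zip(tokens, aligned))
```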

For (2),
- Optimize masked language modeling and next sentence prediction as pre-training tasks (same as BERT).
- Proposed: randomly mask entities in the input text and train the model to predict each masked entity from the KG. (needs a closer read)
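
The proposed entity-masking task can be sketched as below, under stated assumptions. The note only says that entities are hidden at random and then predicted from the KG, so the details here are illustrative guesses: the `[MASK_ENT]` placeholder, the `mask_prob` parameter, and the softmax over dot-product scores against a candidate entity table are all hypothetical, as are the function names.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK_ENT]"  # hypothetical placeholder for a hidden entity


def mask_entities(
    entity_ids: List[Optional[str]],
    mask_prob: float,
    rng: random.Random,
) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Hide each linked entity with probability mask_prob.

    Returns (corrupted inputs, labels). A label is None wherever the model
    is not asked to predict anything.
    """
    corrupted: List[Optional[str]] = []
    labels: List[Optional[str]] = []
    for entity_id in entity_ids:
        if entity_id is not None and rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(entity_id)
        else:
            corrupted.append(entity_id)
            labels.append(None)
    return corrupted, labels


def entity_cross_entropy(
    token_vector: List[float],
    gold_entity: str,
    entity_table: Dict[str, List[float]],
) -> float:
    """Cross-entropy of the gold entity under a softmax over dot-product scores.

    Assumption: candidates are scored by the dot product between the token's
    output vector and each candidate entity embedding.
    """
    scores = {
        eid: sum(t * e for t, e in zip(token_vector, emb))
        for eid, emb in entity_table.items()
    }
    # Log-sum-exp with max subtraction for numerical stability.
    max_score = max(scores.values())
    log_norm = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores.values())
    )
    return log_norm - scores[gold_entity]
```

For example, `mask_entities(["E1", None, "E2"], 0.15, random.Random(0))` returns a corrupted entity sequence plus the labels to predict; averaging `entity_cross_entropy` over the labeled positions would give one possible form of the loss, which would then be added to the two BERT objectives.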

written on 2019-05-29