Reminder: this repo has been refactored. For paper reproduction or backward compatibility, please check out the repro branch.
ERNIE 2.0 is a continual pre-training framework for language understanding in which pre-training tasks can be incrementally built and learned through multi-task learning. ERNIE 2.0 provides a strong foundation for nearly every NLP task: text classification, ranking, NER, machine reading comprehension, text generation, and so on.
- May.20.2021: ERNIE-Doc, ERNIE-Gram and ERNIE-ViL models are available now! ERNIE-UNIMO will be released soon.
- Dec.29.2020:
  - Pretrain and finetune ERNIE with PaddlePaddle v2.0.
  - New AMP (automatic mixed precision) feature for every demo in this repo.
  - Introducing gradient accumulation: run ERNIE-large with only 8G of memory.
- Sept.24.2020:
  - We have announced ERNIE-ViL!
    - Knowledge-enhanced joint representations for vision-language tasks.
    - Constructs three Scene Graph Prediction tasks utilizing structured knowledge.
    - State-of-the-art performance on 5 downstream tasks, 1st place on the VCR leaderboard.
- May.20.2020:
  - Try ERNIE in "dygraph", with:
    - Eager execution with paddle.fluid.dygraph.
    - Distributed training.
    - Easy deployment.
    - Learn NLP in AIStudio tutorials.
    - Backward compatibility for old-styled checkpoints.
  - ERNIE-GEN is available now!
    - The state-of-the-art pre-trained model for generation tasks, accepted by IJCAI-2020.
    - A novel span-by-span generation pre-training task.
    - An infilling generation mechanism and a noise-aware generation method.
    - Implemented with a carefully designed Multi-Flow Attention architecture.
    - You can download all models, including base/large/large-430G.
- Apr.30.2020: Released ERNIESage, a novel Graph Neural Network model that uses ERNIE as its aggregator. It is implemented with PGL.
- Mar.27.2020: Champion on 5 SemEval2020 subtasks.
- Dec.26.2019: 1st place on the GLUE leaderboard.
- Nov.6.2019: Introducing ERNIE-tiny.
- Jul.7.2019: Introducing ERNIE 2.0.
- Mar.16.2019: Introducing ERNIE 1.0.
```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel

# Downloads the pretrained model from the server; requires a network connection
model = ErnieModel.from_pretrained('ernie-1.0')
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

ids, _ = tokenizer.encode('hello world')   # returns (token ids, segment ids)
ids = P.to_tensor(np.expand_dims(ids, 0))  # insert extra `batch` dimension
pooled, encoded = model(ids)               # eager execution
print(pooled.numpy())                      # convert results to numpy
```
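The example above encodes a single sentence and adds the batch dimension with np.expand_dims. To feed several sentences of different lengths in one batch, the id lists first need to be padded to a common length. A minimal sketch in plain Python, assuming right-padding with id 0 (check the vocabulary of the model you load); the helper name pad_batch is hypothetical and not part of the ernie package:

```python
from typing import List, Tuple


def pad_batch(seqs: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id lists to the longest length in the batch.

    Returns the padded ids and a mask with 1 for real tokens and 0 for padding.
    pad_id=0 is an assumption; check the tokenizer's vocabulary.
    """
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


# Hypothetical token ids standing in for tokenizer.encode(...)[0] outputs
ids, mask = pad_batch([[1, 7, 8, 2], [1, 9, 2]])
print(ids)   # [[1, 7, 8, 2], [1, 9, 2, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The padded list can then be turned into a tensor with P.to_tensor(np.array(ids)), in the same way as the single-sentence case.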
Don't have a GPU? Try ERNIE in AIStudio! (Please choose the latest version and apply for a GPU environment.)
- ERNIE for beginners
- Semantic analysis
- Cloze test
- Knowledge distillation
- Ask ERNIE
- Loading old-styled checkpoint
This repo requires PaddlePaddle 1.7.0+; please see the PaddlePaddle installation instructions.
```shell
pip install paddle-ernie
```

or

```shell
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip install -e .
```
Model | Description | Abbreviation |
---|---|---|
ERNIE 1.0 Base for Chinese | L12H768A12 | ernie-1.0 |
ERNIE Tiny | L3H1024A16 | ernie-tiny |
ERNIE 2.0 Base for English | L12H768A12 | ernie-2.0-en |
ERNIE 2.0 Large for English | L24H1024A16 | ernie-2.0-large-en |
ERNIE Gen Base for English | L12H768A12 | ernie-gen-base-en |
ERNIE Gen Large for English | L24H1024A16 | ernie-gen-large-en |
ERNIE Gen Large 430G for English | L24H1024A16 + 430G pretraining corpus | ernie-gen-large-430g-en |
ERNIE Doc Base for Chinese | L12H768A12 | ernie-doc-base-zh |
ERNIE Doc Base for English | L12H768A12 | ernie-doc-base-en |
ERNIE Doc Large for English | L24H1024A16 | ernie-doc-large-en |
ERNIE Gram Base for Chinese | L12H768A12 | ernie-gram-zh |
ERNIE Gram Base for English | L12H768A12 | ernie-gram-en |
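The Description column uses a compact notation: L is the number of Transformer layers, H the hidden size, and A the number of attention heads, so L24H1024A16 means 24 layers, hidden size 1024 and 16 heads. The sketch below decodes the notation and derives the per-head dimension H / A, assuming standard multi-head attention where the hidden size is split evenly across heads; parse_arch is a hypothetical helper, not part of the ernie package.

```python
import re
from typing import NamedTuple


class Arch(NamedTuple):
    layers: int
    hidden: int
    heads: int

    @property
    def head_dim(self) -> int:
        # In standard multi-head attention each head gets an equal slice of the hidden vector
        return self.hidden // self.heads


_PATTERN = re.compile(r"^L(\d+)H(\d+)A(\d+)$")


def parse_arch(spec: str) -> Arch:
    """Decode an 'L<layers>H<hidden>A<heads>' string such as 'L12H768A12'."""
    m = _PATTERN.match(spec)
    if m is None:
        raise ValueError(f"not an L/H/A spec: {spec!r}")
    layers, hidden, heads = (int(g) for g in m.groups())
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return Arch(layers, hidden, heads)


for spec in ["L12H768A12", "L3H1024A16", "L24H1024A16"]:
    a = parse_arch(spec)
    print(spec, a.layers, a.hidden, a.heads, a.head_dim)
```

All three architectures in the table end up with 64-dimensional attention heads; they differ in depth and width.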
English Datasets
Download the GLUE datasets by running this script.
The --data_dir option in the following section assumes a directory tree like this:
```
data/xnli
├── dev
│   └── 1
├── test
│   └── 1
└── train
    └── 1
```
See the demo data for the MNLI task.
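Before finetuning, it can help to check that a dataset directory matches this layout. A minimal sketch using only the standard library; check_data_dir is a hypothetical helper, and it only verifies that train, dev and test exist as subdirectories holding at least one file. That rule is an assumption drawn from the tree above, not something the demo scripts are known to enforce.

```python
import tempfile
from pathlib import Path
from typing import List

SPLITS = ("train", "dev", "test")


def check_data_dir(root: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    for split in SPLITS:
        d = Path(root) / split
        if not d.is_dir():
            problems.append(f"missing directory: {d}")
        elif not any(p.is_file() for p in d.iterdir()):
            problems.append(f"no data files in: {d}")
    return problems


# Build a throwaway tree shaped like data/xnli and check it
with tempfile.TemporaryDirectory() as tmp:
    for split in SPLITS:
        (Path(tmp) / split).mkdir()
        (Path(tmp) / split / "1").write_text("example line\n")
    print(check_data_dir(tmp))  # []
```

Running check_data_dir("data/xnli") before launching a long job surfaces a misplaced split early.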
Chinese Datasets
Datasets | Description |
---|---|
XNLI | XNLI is a natural language inference dataset in 15 languages, jointly built by Facebook and New York University. We use the Chinese portion of XNLI to evaluate the language understanding ability of our model. |
ChnSentiCorp | ChnSentiCorp is a sentiment analysis dataset consisting of online-shopping reviews of hotels, notebooks and books. |
MSRA-NER | MSRA-NER (SIGHAN2006) is a dataset released by MSRA for recognizing the names of people, locations and organizations in text. |
NLPCC2016-DBQA | NLPCC2016-DBQA is a sub-task of the NLPCC-ICCPOL 2016 Shared Task hosted by NLPCC (Natural Language Processing and Chinese Computing); the task is to select documents from the candidates that answer the questions. [url: http://tcci.ccf.org.cn/conference/2016/dldoc/evagline2.pdf] |
CMRC2018 | CMRC2018 is an evaluation of Chinese extractive reading comprehension hosted by the Chinese Information Processing Society of China (CIPS-CL). |
- Try eager execution with the dygraph model:

```shell
python3 ./demo/finetune_classifier.py \
       --from_pretrained ernie-1.0 \
       --data_dir ./data/xnli
```
- Specify --use_amp to activate AMP training.
- --bsz denotes the global batch size for one optimization step; --micro_bsz denotes the maximum batch size on each GPU device. If --micro_bsz < --bsz, gradient accumulation is activated.
Distributed finetune
paddle.distributed.launch is a process manager; we use it to launch Python processes on each available GPU device.
In distributed training, max_steps is used as the stopping criterion rather than epoch, to prevent deadlocks.
You can calculate max_steps as EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH.
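As a worked example of this formula, the sketch below computes max_steps for a hypothetical run; the example count and epoch number are placeholders, not statistics of a real dataset. Rounding up is an assumption on our part, so that the last partial batch is still covered.

```python
import math


def max_steps(epochs: int, num_train_examples: int, total_batch: int) -> int:
    """EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH, rounded up to a whole step.

    total_batch is the global batch size across all devices (the --bsz value).
    """
    return math.ceil(epochs * num_train_examples / total_batch)


# Hypothetical numbers: 3 epochs over 100,000 examples with a global batch of 256
print(max_steps(3, 100_000, 256))  # 1172
```

Here 3 * 100,000 / 256 = 1171.875, which rounds up to 1172 optimization steps.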
Also note that we shard the training data according to device id to prevent overfitting.
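One common way to shard data by device id is round-robin: the process with rank r keeps every example whose index i satisfies i % world_size == r, so each device sees a disjoint slice. The sketch below illustrates that scheme in plain Python; it is an assumption for illustration, and the demo script's actual sharding logic may differ.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard(examples: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield the examples assigned to device `rank` out of `world_size` devices.

    Being a generator, the rank check runs on first iteration.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    for i, ex in enumerate(examples):
        if i % world_size == rank:
            yield ex


data = list(range(10))
for rank in range(3):
    print(rank, list(shard(data, rank, world_size=3)))
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7]
# 2 [2, 5, 8]
```

Shards can differ in size by one example, which is one reason a fixed max_steps is a safer stopping criterion than epochs when devices must stay in lockstep.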
Demo (make sure you have more than 2 GPUs; online model download does not work under paddle.distributed.launch, so you need to run single-card finetuning first to get the pretrained model, or download and extract one manually from here):

```shell
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_distributed.py \
    --data_dir data/mnli \
    --max_steps 10000 \
    --from_pretrained ernie-2.0-en
```
Many other demo Python scripts:
- Sentiment Analysis
- Semantic Similarity
- Named Entity Recognition (NER)
- Machine Reading Comprehension
- Text generation
- Text classification with the paddle.static API
Recommended hyperparameters:
Task | Batch size | Learning rate |
---|---|---|
CoLA | 32 / 64 (base) | 3e-5 |
SST-2 | 64 / 256 (base) | 2e-5 |
STS-B | 128 | 5e-5 |
QQP | 256 | 3e-5(base)/5e-5(large) |
MNLI | 256 / 512 (base) | 3e-5 |
QNLI | 256 | 2e-5 |
RTE | 16 / 4 (base) | 2e-5(base)/3e-5(large) |
MRPC | 16 / 32 (base) | 3e-5 |
WNLI | 8 | 2e-5 |
XNLI | 512 | 1e-4(base)/4e-5(large) |
CMRC2018 | 64 | 3e-5 |
DRCD | 64 | 5e-5(base)/3e-5(large) |
MSRA-NER(SIGHAN2006) | 16 | 5e-5(base)/1e-5(large) |
ChnSentiCorp | 24 | 5e-5(base)/1e-5(large) |
LCQMC | 32 | 2e-5(base)/5e-6(large) |
NLPCC2016-DBQA | 64 | 2e-5(base)/1e-5(large) |
VCR | 64 | 2e-5(base)/2e-5(large) |
see here
If --inference_model_dir is passed to finetune_classifier_dygraph.py, a deployable model will be generated at the end of finetuning and your model is ready to serve.
For details about online inference, see the C++ inference API, or start a multi-GPU inference server with a single command:

```shell
python -m propeller.tools.start_server -m /path/to/saved/inference_model -p 8881
```

and call the server just like calling a local function (Python 3 only):
```python
import numpy as np
from propeller.service.client import InferenceClient
from ernie.tokenizing_ernie import ErnieTokenizer

client = InferenceClient('tcp://localhost:8881')  # port must match the -p passed to start_server
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
ids, sids = tokenizer.encode('hello world')  # token ids and segment ids
ids = np.expand_dims(ids, 0)                 # insert extra `batch` dimension
sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```
A pre-made inference model for ernie-1.0 can be downloaded here.
It can be used for feature-based finetuning or feature extraction.
Knowledge distillation is a good way to compress and accelerate ERNIE.
For details about distillation, see here.
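As a generic illustration of the idea only, and not the recipe this repo uses, the classic soft-target loss of Hinton et al. trains a student to match the teacher's temperature-softened class distribution. A minimal, framework-free sketch with hypothetical logits; the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distill_loss(student_logits: List[float], teacher_logits: List[float],
                 temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * kl


teacher = [4.0, 1.0, -2.0]  # hypothetical teacher logits for 3 classes
print(distill_loss(teacher, teacher))              # 0.0: student matches teacher
print(distill_loss([0.0, 0.0, 0.0], teacher) > 0)  # True
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the wrong classes, which is the extra signal the student learns from.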
@article{sun2019ernie,
title={Ernie: Enhanced representation through knowledge integration},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Chen, Xuyi and Zhang, Han and Tian, Xin and Zhu, Danxiang and Tian, Hao and Wu, Hua},
journal={arXiv preprint arXiv:1904.09223},
year={2019}
}
@article{sun2019ernie20,
title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1907.12412},
year={2019}
}
@article{xiao2020ernie-gen,
title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2001.11314},
year={2020}
}
@article{yu2020ernie,
title={ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph},
author={Yu, Fei and Tang, Jiji and Yin, Weichong and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2006.16934},
year={2020}
}
@article{xiao2020ernie,
title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2010.12148},
year={2020}
}
@article{ding2020ernie,
title={ERNIE-DOC: The Retrospective Long-Document Modeling Transformer},
author={Ding, Siyu and Shang, Junyuan and Wang, Shuohuan and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15688},
year={2020}
}
@article{li2020unimo,
title={UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning},
author={Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15409},
year={2020}
}
For full reproduction of paper results, please check out the repro branch of this repo.
- ERNIE homepage
- GitHub Issues: bug reports, feature requests, install issues, usage issues, etc.
- QQ discussion group: 760439550 (ERNIE discussion group).
- QQ discussion group: 958422639 (ERNIE discussion group-v2).
- Forums: discuss implementations, research, etc.