Update XLNet Model (PaddlePaddle#93)
* fix calculation of training steps when args.max_steps > 0; print the names of the two dev datasets in the mnli task

* add support for wnli task

* add three chinese xlnet models: chinese-xlnet-base, chinese-xlnet-mid, chinese-xlnet-large

* add reference of pretrained chinese xlnet models
yingyibiao committed Mar 10, 2021
1 parent b6e5d37 commit 5a98c6b
Showing 6 changed files with 398 additions and 234 deletions.
1 change: 1 addition & 0 deletions docs/model_zoo.md
@@ -32,6 +32,7 @@ PaddleNLP provides a rich collection of model architectures, including classic RNN-based models,
| [ERNIESage](../examples/text_graph/erniesage)| ERNIESage (ERNIE SAmple aggreGatE) builds the connections between a node and its neighbor nodes as a graph, turns those node-neighbor relations into correlated samples fed into ERNIE, and uses ERNIE as the aggregator to capture the semantic relations between a node and its neighbors, ultimately strengthening the semantic representation of the nodes in the graph.|
| [GPT-2](../examples/language_model/gpt2) |[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf) |
| [ELECTRA](../examples/language_model/electra/) | [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555) |
| [XLNet](../examples/language_model/xlnet/) | [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) |
| [RoBERTa](../examples/text_classification/pretrained_models) | [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) |
| [PLATO-2](../examples/dialogue/plato-2) | Baidu's leading self-developed open-domain dialogue pretraining model [PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning](https://arxiv.org/abs/2006.16779) |
| [SentenceBERT](../examples/text_matching/sentence_transformers)| [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084) |
8 changes: 4 additions & 4 deletions docs/transformers.md
@@ -5,7 +5,7 @@

## Summary of Transformer Pretrained Models

The table below summarizes the pretrained models currently supported by PaddleNLP. Users can apply these models to tasks such as question answering, text classification, sequence labeling, and text generation. We also provide 29 sets of pretrained weights, including pretrained weights for 12 Chinese language models.
The table below summarizes the pretrained models currently supported by PaddleNLP. Users can apply these models to tasks such as question answering, text classification, sequence labeling, and text generation. We also provide 32 sets of pretrained weights, including pretrained weights for 15 Chinese language models.

| Model | Tokenizer | Supported Task | Pretrained Weight|
|---|---|---|---|
@@ -15,10 +15,10 @@
|[GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)| GPT2Tokenizer<br> GPT2ChineseTokenizer| GPT2ForGreedyGeneration| `gpt2-base-cn` <br> `gpt2-medium-en`|
|[RoBERTa](https://arxiv.org/abs/1907.11692)|RobertaTokenizer| RobertaModel<br>RobertaForQuestionAnswering<br>RobertaForSequenceClassification<br>RobertaForTokenClassification| `roberta-wwm-ext`<br> `roberta-wwm-ext-large`<br> `rbt3`<br> `rbtl3`|
|[ELECTRA](https://arxiv.org/abs/2003.10555) | ElectraTokenizer| ElectraModel<br>ElectraForSequenceClassification<br>ElectraForTokenClassification<br>|`electra-small`<br> `electra-base`<br> `electra-large`<br> `chinese-electra-small`<br> `chinese-electra-base`<br>|
|[XLNet](https://arxiv.org/abs/1906.08237)| XLNetTokenizer| XLNetModel<br> XLNetForSequenceClassification<br> XLNetForTokenClassification |`xlnet-base-cased`<br> `xlnet-large-cased`|
|[XLNet](https://arxiv.org/abs/1906.08237)| XLNetTokenizer| XLNetModel<br> XLNetForSequenceClassification<br> XLNetForTokenClassification |`xlnet-base-cased`<br> `xlnet-large-cased`<br> `chinese-xlnet-base`<br> `chinese-xlnet-mid`<br> `chinese-xlnet-large`|
|[Transformer](https://arxiv.org/abs/1706.03762) |- | TransformerModel | - |

**NOTE**: The Chinese pretrained models are `bert-base-chinese, bert-wwm-chinese, bert-wwm-ext-chinese, ernie-1.0, ernie-tiny, gpt2-base-cn, roberta-wwm-ext, roberta-wwm-ext-large, rbt3, rbtl3, chinese-electra-base, chinese-electra-small`
**NOTE**: The Chinese pretrained models are `bert-base-chinese, bert-wwm-chinese, bert-wwm-ext-chinese, ernie-1.0, ernie-tiny, gpt2-base-cn, roberta-wwm-ext, roberta-wwm-ext-large, rbt3, rbtl3, chinese-electra-base, chinese-electra-small, chinese-xlnet-base, chinese-xlnet-mid, chinese-xlnet-large`
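
The three new Chinese XLNet checkpoints are selected purely by name. The snippet below is a minimal sketch, assuming only the `from_pretrained` loading interface already used throughout these docs; nothing else about the call is specific to the Chinese weights:

```python
from paddlenlp.transformers import XLNetModel, XLNetTokenizer

# Any of the newly added names works here:
# chinese-xlnet-base / chinese-xlnet-mid / chinese-xlnet-large
tokenizer = XLNetTokenizer.from_pretrained("chinese-xlnet-base")
model = XLNetModel.from_pretrained("chinese-xlnet-base")
```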

## How to Use the Pretrained Models

@@ -73,7 +73,7 @@ for input_ids, token_type_ids, labels in train_dataloader:
Users can switch among the different models in the table to handle the same type of task. For example, for the text classification task in [How to Use the Pretrained Models](#预训练模型使用方法), users can replace `BertForSequenceClassification` with `ErnieForSequenceClassification` to find a pretrained model that better fits the task.
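
A minimal sketch of the swap described above, using checkpoint names listed in the NOTE earlier on this page (shown for illustration only):

```python
from paddlenlp.transformers import (BertForSequenceClassification, BertTokenizer,
                                    ErnieForSequenceClassification, ErnieTokenizer)

# Same sequence-classification task, different backbone:
# only the class and the checkpoint name change.
model = BertForSequenceClassification.from_pretrained("bert-base-chinese")
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

model = ErnieForSequenceClassification.from_pretrained("ernie-1.0")
tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")
```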

## References:
- Some of the Chinese pretrained models come from: https://github.com/ymcui/Chinese-BERT-wwm
- Some of the Chinese pretrained models come from: https://github.com/ymcui/Chinese-BERT-wwm, https://github.com/ymcui/Chinese-XLNet, https://huggingface.co/clue/xlnet_chinese_large
- Sun, Yu, et al. "Ernie: Enhanced representation through knowledge integration." arXiv preprint arXiv:1904.09223 (2019).
- Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
- Cui, Yiming, et al. "Pre-training with whole word masking for chinese bert." arXiv preprint arXiv:1906.08101 (2019).
21 changes: 11 additions & 10 deletions examples/language_model/xlnet/README.md
@@ -19,7 +19,7 @@
```shell
pip install paddlenlp\>=2.0.0rc
```

* Install SentencePiece
```shell
pip install sentencepiece
@@ -63,13 +63,14 @@ python -m paddle.distributed.launch ./run_glue.py \

After fine-tuning `xlnet-base-cased` on the GLUE benchmark tasks, the results on the dev sets are as follows:

| Task  | Metric                       | Result             |
|:-----:|:----------------------------:|:------------------:|
| SST-2 | Accuracy                     | 94.266             |
| QNLI  | Accuracy                     | 91.708             |
| CoLA  | Matthew's corr               | 50.264             |
| MRPC  | F1/Accuracy                  | 91.071/87.745      |
| STS-B | Pearson/Spearman corr        | 86.243/85.973      |
| QQP   | Accuracy/F1                  | 90.838/87.644      |
| MNLI  | Matched acc/Mismatched acc   | 87.468/86.859      |
| RTE   | Accuracy                     | 70.036             |
| WNLI  | Accuracy                     | 56.338             |
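
With WNLI registered in `run_glue.py`, it can be fine-tuned with the same launcher as the other GLUE tasks. The command below is only a sketch: the flag names are assumptions mirroring the `args.*` fields visible in the diff, the hyperparameter values are placeholders, and the script's full argument list should be checked with `--help`.

```shell
python -m paddle.distributed.launch ./run_glue.py \
    --task_name wnli \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --save_steps 500
```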
19 changes: 16 additions & 3 deletions examples/language_model/xlnet/run_glue.py
@@ -16,6 +16,7 @@
import os
import random
import time
from math import ceil
from functools import partial

import numpy as np
@@ -41,6 +42,7 @@
"mnli": Accuracy,
"qnli": Accuracy,
"rte": Accuracy,
"wnli": Accuracy,
}


@@ -151,6 +153,7 @@ def do_train(args):
paddle.distributed.init_parallel_env()

set_seed(args)
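    # final_res is a module-level string that evaluate() fills in with the
    # latest dev metrics; it is printed once training finishes.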
global final_res

args.task_name = args.task_name.lower()
metric_class = METRIC_CLASSES[args.task_name]
@@ -223,8 +226,12 @@ def do_train(args):
if paddle.distributed.get_world_size() > 1:
model = paddle.DataParallel(model)

num_training_steps = args.max_steps if args.max_steps > 0 else (
len(train_data_loader) * args.num_train_epochs)
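    # When a step budget is given, derive how many epochs are needed to reach
    # it (rounding up); otherwise run the configured number of epochs.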
if args.max_steps > 0:
num_training_steps = args.max_steps
num_train_epochs = ceil(num_training_steps / len(train_data_loader))
else:
num_training_steps = len(train_data_loader) * args.num_train_epochs
num_train_epochs = args.num_train_epochs

warmup = args.warmup_steps if args.warmup_steps > 0 else args.warmup_proportion
lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps,
@@ -255,7 +262,7 @@ def do_train(args):
global_step = 0
tic_train = time.time()
model.train()
for epoch in range(args.num_train_epochs):
for epoch in range(num_train_epochs):
for step, batch in enumerate(train_data_loader):
global_step += 1
input_ids, token_type_ids, attention_mask, labels = batch
@@ -277,9 +284,14 @@ def do_train(args):
if global_step % args.save_steps == 0 or global_step == num_training_steps:
tic_eval = time.time()
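                    # MNLI ships two dev sets; evaluate on both and keep the
                    # matched/mismatched results so they can be reported together.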
if args.task_name == "mnli":
print("matched ", end="")
evaluate(model, loss_fct, metric, dev_data_loader_matched)
final_res1 = "matched " + final_res
print("mismatched ", end="")
evaluate(model, loss_fct, metric,
dev_data_loader_mismatched)
final_res2 = "mismatched " + final_res
final_res = final_res1 + "\r\n" + final_res2
print("eval done total : %s s" % (time.time() - tic_eval))
else:
evaluate(model, loss_fct, metric, dev_data_loader)
@@ -297,6 +309,7 @@ def do_train(args):
tokenizer.save_pretrained(output_dir)
if global_step == num_training_steps:
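                        # Step budget exhausted: print the stored dev result(s)
                        # and stop training early.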
print(final_res)
exit(0)
tic_train += time.time() - tic_eval

