[DOC] Add ernie-1.0-base-zh-cw for clue benchmark. #3248

Merged (2 commits) on Sep 15, 2022
43 changes: 41 additions & 2 deletions examples/benchmark/clue/README.md
@@ -70,7 +70,7 @@
</tr> <tr>
<td rowspan=3 align=center> 24L1024H </td>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Large-zh-CW</span>
<span style="font-size:18px">ERNIE 1.0-Large-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px"><b>79.03</b></span>
@@ -222,7 +222,7 @@
</td>
</tr>
<tr>
<td rowspan=8 align=center> 12L768H </td>
<td rowspan=9 align=center> 12L768H </td>
<td style="text-align:center">
<span style="font-size:18px">
<a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -264,6 +264,44 @@
<span style="font-size:18px"><b>77.88</b></span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.47</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.07</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">57.86</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">59.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.41</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">79.58</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">89.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.42</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">72.88/90.78</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">84.68</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.98</span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE-Gram-zh</span>
@@ -1196,6 +1234,7 @@ AFQMC (semantic similarity), TNEWS (text classification), IFLYTEK (long text classification
| ERNIE 2.0-Large-zh | 1e-5,32 | 3e-5,64 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 3e-5,32 | 1e-5,64 | 2e-5,24 | 2e-5,24 | 3e-5,32 |
| HFL/RoBERTa-wwm-ext-large | 1e-5,32 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 1e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,32 | 1e-5,24 | 2e-5,24 |
| ERNIE 3.0-Base-zh | 3e-5,16 | 3e-5,32 | 5e-5,32 | 3e-5,32 | 2e-5,64 | 2e-5,16 | 2e-5,32 | 2e-5,24 | 3e-5,24 | 3e-5,32 |
| ERNIE 1.0-Base-zh-cw | 2e-5,16 | 3e-5,32 | 5e-5,16 | 2e-5,16 | 3e-5,32 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 2e-5,32 | 3e-5,24 |
| ERNIE-Gram-zh | 1e-5,16 | 5e-5,16 | 5e-5,16 | 2e-5,32 | 2e-5,64 | 3e-5,16 | 3e-5,64 | 3e-5,32 | 2e-5,24 | 2e-5,24 |
| ERNIE 2.0-Base-zh | 3e-5,64 | 3e-5,64 | 5e-5,16 | 5e-5,64 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 3e-5,32 |
| Langboat/Mengzi-Bert-Base | 3e-5,32 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,8 | 1e-5,16 | 3e-5,24 | 3e-5,24 | 2e-5,32 |
8 changes: 4 additions & 4 deletions model_zoo/ernie-1.0/README.md
@@ -484,24 +484,24 @@ python3 -u -m paddle.distributed.launch \

We have released two models, base and large; both achieve good pretraining results.

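- **ERNIE 1.0-Base-zh-CW** model:
- **ERNIE 1.0-Base-zh-cw** model:
- Trained on 400 GB of CLUE and WuDao corpora with batch_size 1024 for 4 million steps, this model reaches results comparable to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded directly. The best hyperparameters were selected by grid search on the CLUE benchmark: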

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc | Acc
ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.07 | 57.86 | 59.91 | 83.41 | 79.58 | 89.91 | <b>83.42</b> | 72.88/90.78 | <b>84.68</b> | 76.98 |
ERNIE 2.0-Base-zh | 12L768H | 74.95 | 76.25 | 58.53 | 61.72 | 83.07 | 78.81 | 84.21 | 82.77 | 68.22/88.71 | 82.78 | 73.19
ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68
-
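- **ERNIE 1.0-Large-zh-CW** model:
- **ERNIE 1.0-Large-zh-cw** model:

- Besides the base model, we also trained and released a large model. It uses the same vocabulary as ernie-1.0, hence the name `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4 million steps, dropping the SOP task and keeping only the MLM loss: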

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 1.0-Large-zh-cw | 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 3.0-Xbase-zh| 20L1024H | 78.71 | 76.85 | 59.89 | 62.41 | 84.76 | 82.51 | 89.80 | 84.47 | 75.49/92.67 | 86.36 | 84.59
RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26

12 changes: 6 additions & 6 deletions model_zoo/ernie-1.0/pretraining_introduction.md
@@ -24,8 +24,8 @@ PaddleNLP is committed to open-source pretraining work, using the open-source Chinese corpora CLUE and WuDao
- [3.4 Training data pipeline configuration](#data_pipe)
- [3.5 Monitoring and evaluation](#观察评估)
- [4. Training results](#release_models)
- [4.1 ERNIE 1.0-Base-zh-CW model](#ernie-1.0-base-zh-cw)
- [4.2 ERNIE 1.0-Large-zh-CW model](#ernie-1.0-large-zh-cw)
- [4.1 ERNIE 1.0-Base-zh-cw model](#ernie-1.0-base-zh-cw)
- [4.2 ERNIE 1.0-Large-zh-cw model](#ernie-1.0-large-zh-cw)
* [5. References](#references)

The overall workflow is shown in the figure below:
@@ -577,28 +577,28 @@ python3 -u -m paddle.distributed.launch \

<a name="ernie-1.0-base-zh-cw"></a>

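### 4.1 ERNIE 1.0-Base-zh-CW model
### 4.1 ERNIE 1.0-Base-zh-cw model

Trained on 400 GB of CLUE and WuDao corpora with batch_size 1024 for 4 million steps, this model reaches results comparable to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded directly. The best hyperparameters were selected by grid search on the CLUE benchmark: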

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1| Acc| Acc
ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.07 | 57.86 | 59.91 | <b>83.41</b> | 79.58 | 89.91 | 83.42 | 72.88/90.78 | <b>84.68</b> | 76.98 |
ERNIE 2.0-Base-zh | 12L768H | 74.32 | 75.65 | 58.25 | 61.64 | 82.62 | 78.71 | 81.91 | 82.33 | 66.08/87.46 | 82.78 | 73.19
ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68


<a name="ernie-1.0-large-zh-cw"> </a>

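### 4.2 ERNIE 1.0-Large-zh-CW model
### 4.2 ERNIE 1.0-Large-zh-cw model

Besides the base model, we also trained a large model, named `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4 million steps, dropping the SOP task and keeping only the MLM loss. The best hyperparameters were selected by grid search on the CLUE benchmark: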

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 1.0-Large-zh-cw| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 3.0-Xbase-zh| 20L1024H | 78.39 | 76.16 | 59.55 | 61.87 | 84.40 | 81.73 | 88.82 | 83.60 | 75.99/93.00 | 86.78 | 84.98
RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26
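
The CLUE AVG column appears to be the unweighted mean of the ten task scores, with CMRC contributing its exact-match value rather than F1. That reading is an inference from the numbers in the table above, not something the README states. A minimal Python check for the ERNIE 1.0-Large-zh-cw row, under that assumption:

```python
# Scores for ERNIE 1.0-Large-zh-cw, copied from the table above.
# Assumption: CMRC is represented by its exact-match score (74.22), not F1.
scores = {
    "AFQMC": 75.97,
    "TNEWS": 59.65,
    "IFLYTEK": 62.91,
    "CMNLI": 85.09,
    "OCNLI": 81.73,
    "CLUE WSC2020": 93.09,
    "CSL": 84.53,
    "CMRC (exact match)": 74.22,
    "CHID": 88.57,
    "C3": 84.54,
}

# Unweighted mean over the ten tasks, rounded to two decimals like the table.
clue_avg = round(sum(scores.values()) / len(scores), 2)
print(clue_avg)
```

The result can be compared against the bold CLUE AVG entry in the same row; the same check works for the other rows by swapping in their scores.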

7 changes: 5 additions & 2 deletions model_zoo/ernie-1.0/run_pretrain.py
@@ -541,8 +541,11 @@ def do_train(args):
ctx_manager = contextlib.nullcontext() if sys.version_info >= (
3, 7) else contextlib.suppress()

if worker_num > 1 and (args.use_recompute
or args.accumulate_steps > 1):
if worker_num > 1 and (args.use_recompute or
((step + 1) % args.accumulate_steps != 0)):
# Gradient accumulation only: use no_sync while (step + 1) % args.accumulate_steps != 0.
# Recompute only: use no_sync everywhere.
# Recompute + gradient accumulation: use no_sync everywhere.
ctx_manager = model.no_sync()
else:
ctx_manager = contextlib.nullcontext() if sys.version_info >= (
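The changed condition in the `run_pretrain.py` hunk above can be read as a small decision rule: with more than one worker, automatic gradient synchronization is skipped on every step when recompute is enabled, and otherwise on every micro-step except the last one of each accumulation group. The sketch below isolates that rule in plain Python. The function names and `FakeParallelModel` are hypothetical stand-ins introduced for illustration; in the real script the model is a data-parallel wrapper that exposes `no_sync()`.

```python
import contextlib


def should_skip_grad_sync(step, accumulate_steps, use_recompute, worker_num):
    """Return True when this step should run under model.no_sync()."""
    if worker_num <= 1:
        return False  # single worker: nothing to synchronize
    if use_recompute:
        return True  # recompute: skip automatic sync on every step
    # Gradient accumulation: sync only on the last micro-step of each group.
    return (step + 1) % accumulate_steps != 0


def grad_sync_context(model, step, accumulate_steps, use_recompute, worker_num):
    """Pick the context manager to wrap the forward/backward pass in."""
    if should_skip_grad_sync(step, accumulate_steps, use_recompute, worker_num):
        return model.no_sync()
    return contextlib.nullcontext()


class FakeParallelModel:
    """Hypothetical stand-in for a data-parallel model with no_sync()."""

    def __init__(self):
        self.skipped_steps = 0

    @contextlib.contextmanager
    def no_sync(self):
        self.skipped_steps += 1
        yield


model = FakeParallelModel()
flags = []
for step in range(8):
    flags.append(should_skip_grad_sync(step, 4, False, 2))
    with grad_sync_context(model, step, 4, False, 2):
        pass  # forward and backward would run here

print(flags)
print(model.skipped_steps)
```

With `accumulate_steps=4` and two workers, synchronization is allowed only on steps 3 and 7, so gradients are exchanged once per accumulation group instead of on every micro-step.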
42 changes: 40 additions & 2 deletions model_zoo/ernie-3.0/README.md
@@ -139,7 +139,7 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
<tr>
<td rowspan=3 align=center> 24L1024H </td>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Large-CW</span>
<span style="font-size:18px">ERNIE 1.0-Large-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px"><b>79.03</b></span>
@@ -291,7 +291,7 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
</td>
</tr>
<tr>
<td rowspan=8 align=center> 12L768H </td>
<td rowspan=9 align=center> 12L768H </td>
<td style="text-align:center">
<span style="font-size:18px">
<a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -333,6 +333,44 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
<span style="font-size:18px"><b>77.88</b></span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.47</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.07</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">57.86</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">59.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.41</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">79.58</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">89.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.42</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">72.88/90.78</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">84.68</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.98</span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE-Gram-zh</span>