[IPU] add bert-base model for ipu #1793

gglin001 · 2022-03-18T06:04:47Z

PR types

Others

PR changes

Models

Description

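This PR adds support for running the BERT-base model on IPU, covering two tasks: pretraining and SQuAD.
The model is built as a static graph and trained in half precision. The final performance and accuracy results are as follows: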

Task	Metric	Result
Phase1	MLM Loss	1.6064
Phase1	NSP Loss	0.0272
Phase1	MLM Acc	0.6689
Phase1	NSP Acc	0.9897
Phase1	Throughput	11700
Phase2	MLM Loss	1.5029
Phase2	NSP Loss	0.02444
Phase2	MLM Acc	0.68555
Phase2	NSP Acc	0.99121
Phase2	Throughput	3470
SQuAD	EM	79.9053
SQuAD	F1	87.6396

All of the code is placed under examples/language_model/bert/static_ipu/. See README.md for details on each file.

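Main changes:

To achieve the best training performance on IPU, the model graph is built with a custom modeling.py.
Some custom IPU operators were added for building the model (mainly for performance reasons).
Because the model inputs need a remask operation, loading the dataset in a single thread makes data loading slow and hurts end-to-end throughput. The dataset part therefore uses a custom dataloader that performs the remask in multiple processes; see dataset_ipu.py.
load_tf_ckpt.py maps the BERT pretrained weights released by Google: https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip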
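
The multi-process remask idea described above can be illustrated with a minimal sketch. It is not the code in dataset_ipu.py: the function names `remask_sample` and `remask_dataset`, the 80/10/10 replacement rule, and the token ID constants (taken here from the uncased BERT vocabulary) are all assumptions made for illustration.

```python
import random
from multiprocessing import Pool

# Assumed IDs for the uncased BERT vocabulary; adjust to the real tokenizer.
MASK_ID = 103
VOCAB_SIZE = 30522
SPECIAL_IDS = frozenset({0, 101, 102})  # [PAD], [CLS], [SEP]


def remask_sample(args):
    """Re-apply random masking to one token sequence (hypothetical helper).

    Returns (masked_tokens, labels); labels is -1 at unmasked positions.
    """
    tokens, seed, mask_prob = args
    rng = random.Random(seed)  # a per-sample seed keeps every worker reproducible
    masked = list(tokens)
    labels = [-1] * len(tokens)
    # Special tokens are never selected for masking.
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL_IDS]
    num_to_mask = max(1, round(len(candidates) * mask_prob)) if candidates else 0
    for i in rng.sample(candidates, num_to_mask):
        labels[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token
    return masked, labels


def remask_dataset(samples, num_workers=4, base_seed=0, mask_prob=0.15):
    """Remask every sample, spreading the work over worker processes."""
    jobs = [(tokens, base_seed + i, mask_prob) for i, tokens in enumerate(samples)]
    if num_workers <= 1:
        # Serial fallback, useful for debugging.
        return [remask_sample(job) for job in jobs]
    with Pool(num_workers) as pool:
        return pool.map(remask_sample, jobs, chunksize=64)
```

Calling `remask_dataset(samples, num_workers=8)` with a different `base_seed` per epoch would give each epoch a fresh masking pattern while keeping the masking work off the main loading thread. On platforms that start workers with spawn, the call should sit under an `if __name__ == "__main__":` guard.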

This PR is mainly meant to evaluate whether the code is suitable for the paddlenlp repository. Reviewers, please share your feedback 😊

ZeyuChen

LGTM, thanks for the contribution from Graphcore. @guoshengCS FYI

ZeyuChen · 2022-03-18T07:54:52Z

@guoshengCS We can add this example as a PaddleNLP 2.3 feature.

gglin001 · 2022-04-01T09:01:47Z

Are there any suggestions for this PR, or a plan to merge it? @guoshengCS

guoshengCS

LGTM

guoshengCS · 2022-04-02T10:19:34Z

Great work, thanks for the contribution, and sorry for the delay~ @gglin001

gglin001 · 2022-04-02T10:27:59Z

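> Great work, thanks for the contribution, and sorry for the delay~ @gglin001

Thanks~~

Next we will test on the new SDK, so some further updates may follow. We would appreciate your continued review when they arrive 😉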

* add bert-base model for ipu
* use hdf5 dataset
* add enable_engine_caching param
* use HF dataset for squad task
* update readme

gglin001 added 2 commits March 18, 2022 10:52

add bert-base model for ipu

f5d1362

Merge branch 'develop' into add_bert_base_on_ipu

c9c08c7

ZeyuChen previously approved these changes Mar 18, 2022

ZeyuChen assigned ZeyuChen and guoshengCS Mar 18, 2022

ZeyuChen added this to the PaddleNLP v2.3 milestone Mar 18, 2022

ZeyuChen added the contributions label Mar 18, 2022

gglin001 and others added 6 commits March 21, 2022 09:40

Merge branch 'develop' into add_bert_base_on_ipu

4e6b9fb

Merge branch 'develop' into add_bert_base_on_ipu

93f5943

Merge branch 'develop' into add_bert_base_on_ipu

0a411b2

Merge branch 'develop' into add_bert_base_on_ipu

36b9436

use hdf5 dataset

d674e7e

Merge branch 'develop' into add_bert_base_on_ipu

cf13496

gglin001 dismissed ZeyuChen’s stale review via cf13496 March 23, 2022 08:15

gglin001 and others added 10 commits March 24, 2022 11:04

add enable_engine_caching param

4182808

Merge branch 'develop' into add_bert_base_on_ipu

2cb8ebe

Merge branch 'develop' into add_bert_base_on_ipu

5c5e724

Merge branch 'develop' into add_bert_base_on_ipu

bb3b6c5

Merge branch 'develop' into add_bert_base_on_ipu

27fd372

use HF dataset for squad task

df717ff

Merge branch 'develop' into add_bert_base_on_ipu

c0a5ffb

update readme

a72e376

Merge branch 'develop' into add_bert_base_on_ipu

f67998c

Merge branch 'develop' into add_bert_base_on_ipu

695ee3d

gglin001 added 2 commits April 1, 2022 17:01

Merge branch 'develop' into add_bert_base_on_ipu

5235a5d

Merge branch 'develop' into add_bert_base_on_ipu

51b0910

guoshengCS approved these changes Apr 2, 2022

guoshengCS merged commit 2000ea2 into PaddlePaddle:develop Apr 2, 2022

gglin001 deleted the add_bert_base_on_ipu branch April 2, 2022 10:28

ZeyuChen pushed a commit to ZeyuChen/PaddleNLP that referenced this pull request Apr 17, 2022

[IPU] add bert-base model for ipu (PaddlePaddle#1793)

8fd9abc

* add bert-base model for ipu
* use hdf5 dataset
* add enable_engine_caching param
* use HF dataset for squad task
* update readme

guoshengCS mentioned this pull request Apr 29, 2022

PaddleNLP v2.3rc Release Note Candidate #2031

Closed
