[Unittest] Add unittest for RoBERTa, ALBERT and ERNIE #2972
Conversation
Also, some of the newly added test files look empty; is that because they are not finished yet?
```
@@ -406,6 +407,8 @@ def __init__(
        bpe_merges = [tuple(merge.split()) for merge in bpe_data]
        self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
        self.cache = {}
        self.add_prefix_space = add_prefix_space
```
This attribute doesn't appear to be used anywhere. Compared with HF, check whether an override of the `prepare_for_tokenization` method is missing here.
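For reference, this is roughly how HF's GPT-2 tokenizer consumes `add_prefix_space` via that hook (adapted from huggingface/transformers; whether PaddleNLP's base tokenizer exposes the same `prepare_for_tokenization` hook with this signature is an assumption here):

```python
def prepare_for_tokenization(self, text, is_split_into_words=False, **kwargs):
    # Consume add_prefix_space: prepend a space so the first word is
    # BPE-encoded the same way as a mid-sentence word.
    add_prefix_space = kwargs.pop("add_prefix_space", self.add_prefix_space)
    if is_split_into_words or add_prefix_space:
        text = " " + text
    return (text, kwargs)
```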
Done
```
@@ -363,6 +363,7 @@ def __init__(
        merges_file,
        errors='replace',
        max_len=None,
        add_prefix_space=False,
```
Also, please move this parameter after the special tokens: first, it matches HF; second, it better preserves backward compatibility for positional arguments.
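In other words, something like the following ordering (signature pieces assembled from the diffs in this PR; the trailing `**kwargs` is an assumption):

```python
def __init__(
    self,
    vocab_file,
    merges_file,
    errors='replace',
    unk_token="<unk>",
    pad_token="<pad>",
    mask_token="<mask>",
    add_prefix_space=False,  # moved after the special tokens, matching HF
    max_len=None,
    special_tokens=None,
    **kwargs,
):
    ...
```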
Done
unk_token="<unk>", | ||
pad_token="<pad>", | ||
mask_token="<mask>", | ||
add_prefix_space=False, | ||
max_len=None, | ||
special_tokens=None, |
`max_len` and `special_tokens` don't appear to be used, and HF doesn't have them either; if they're unused, please remove them.
Done
```
    AlbertModel,
)
from tests.transformers.test_modeling_common import ids_tensor, random_attention_mask, ModelTesterMixin
from tests.testing_utils import slow
```
Per the convention we agreed on last time, use relative imports for everything under `tests`.
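Assuming this test module lives directly under `tests/transformers/` (the exact path is an assumption; a deeper nesting would need one more leading dot), the relative form would be:

```python
from .test_modeling_common import ids_tensor, random_attention_mask, ModelTesterMixin
from ..testing_utils import slow
```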
Done
```
                for t in text)))
            return self.convert_tokens_to_ids(tokens)
        else:
            return self.convert_tokens_to_ids(text)
```
This was left unchanged with a TODO before, apparently because of compatibility concerns. Keep an eye on whether any CI fails here; previously the SKEP sequence-labeling task raised errors during training and prediction, see #2063.
```
                 **tokenizer.added_tokens_encoder)
    vocab = tokenizer.get_vocab()
    # vocab = dict(tokenizer.vocab._token_to_idx,
    #              **tokenizer.added_tokens_encoder)
```
If this commented-out block is no longer needed, please delete it; the `get_vocab()` call above already does the same thing.
Done
Looking at the CI, the artifact details show only one artifact; is that expected? @zjjlivein
PR types
unittest
PR changes
Models
Description
Add unit tests for RoBERTa and ALBERT.
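As a rough illustration of the test pattern the new imports suggest (the tiny config values and the test class below are illustrative assumptions, not the PR's actual test code; the constructor keyword names follow PaddleNLP's pre-config-class API):

```python
import unittest

import paddle
from paddlenlp.transformers import AlbertModel


class AlbertModelSmokeTest(unittest.TestCase):
    def test_forward_shape(self):
        # Tiny, illustrative hyperparameters to keep the test fast.
        model = AlbertModel(
            vocab_size=99,
            embedding_size=16,
            hidden_size=32,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=37,
        )
        model.eval()
        # Random token ids in [0, vocab_size) for a batch of 2, seq_len 7.
        input_ids = paddle.randint(low=0, high=99, shape=[2, 7], dtype="int64")
        with paddle.no_grad():
            outputs = model(input_ids)
        # Older PaddleNLP versions return (sequence_output, pooled_output).
        sequence_output = outputs[0] if isinstance(outputs, tuple) else outputs
        # Hidden states should be [batch, seq_len, hidden_size].
        self.assertEqual(list(sequence_output.shape), [2, 7, 32])


if __name__ == "__main__":
    unittest.main()
```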